Python: Can I calculate percentages with groupby and pandas?
I have two questions. First, I have this DataFrame:
data = {
    'Name': ['A', 'B', 'C', 'A', 'D', 'E', 'A', 'C', 'A', 'A', 'A'],
    'Family': ['B1', 'B', 'B', 'B3', 'B', 'B', 'B', 'B1', 'B', 'B3', 'B'],
    'Region': ['North', 'South', 'East', 'West', 'South', 'East', 'West', 'North', 'East', 'West', 'South'],
    'Cod': ['1', '2', '2', '1', '5', '1', '1', '1', '2', '1', '3'],
    'Customer number': ['A111', 'A223', 'A555', 'A333', 'A333', 'A444', 'A222', 'A111', 'A222', 'A333', 'A221'],
    'Sales': [100, 134, 53, 34, 244, 789, 213, 431, 0, 55, 23],
}
I want a column that returns each row's percentage of sales within a groupby over the other columns.
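Second, if the percentage is 0% (as in the first row), I want to reuse another result according to some criterion (for example, if A222 is 0%, use the result for A221).

I think this is what you want: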
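import pandas as pd
df = pd.DataFrame(data)
# Sales summed at the most granular level
granular_sum_df = df.groupby(['Name', 'Family', 'Region', 'Cod', 'Customer number'])['Sales'].sum().reset_index()
# Sales summed per (Name, Family) group
family_sum_df = df.groupby(['Name', 'Family'])['Sales'].sum().reset_index()
# The merge suffixes the two Sales columns as Sales_x (granular) and Sales_y (group total)
final_df = granular_sum_df.merge(family_sum_df, on=['Name', 'Family'])
# Share of the group total, as a fraction between 0 and 1
final_df['Pct'] = final_df['Sales_x'] / final_df['Sales_y']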
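As a possible alternative to the merge above, here is a minimal sketch that uses `groupby(...).transform('sum')` to broadcast each group's total back onto the original rows. It assumes the same grouping keys as the answer (`Name` and `Family`) and multiplies by 100 to get a percentage rather than a fraction; that scaling is my choice, not something the answer specifies.

```python
import pandas as pd

data = {
    'Name': ['A', 'B', 'C', 'A', 'D', 'E', 'A', 'C', 'A', 'A', 'A'],
    'Family': ['B1', 'B', 'B', 'B3', 'B', 'B', 'B', 'B1', 'B', 'B3', 'B'],
    'Region': ['North', 'South', 'East', 'West', 'South', 'East', 'West', 'North', 'East', 'West', 'South'],
    'Cod': ['1', '2', '2', '1', '5', '1', '1', '1', '2', '1', '3'],
    'Customer number': ['A111', 'A223', 'A555', 'A333', 'A333', 'A444', 'A222', 'A111', 'A222', 'A333', 'A221'],
    'Sales': [100, 134, 53, 34, 244, 789, 213, 431, 0, 55, 23],
}
df = pd.DataFrame(data)

# Total sales of each (Name, Family) group, repeated on every row of that group
group_total = df.groupby(['Name', 'Family'])['Sales'].transform('sum')

# Each row's share of its group total, as a percentage
df['Pct'] = df['Sales'] / group_total * 100
```

This keeps the original row order and avoids the `Sales_x` / `Sales_y` columns. If a group's total could be zero, the division would need extra handling.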
A possible answer to the first question:
# Step 1: import pandas
import pandas as pd
df = pd.DataFrame(data)
# Step 2: display the DataFrame
df
# Step 3: calculate each row's percentage of total sales
df['percentage of sales'] = (df['Sales'] / df['Sales'].sum()) * 100
# Step 4: join the percentage column to the main DataFrame
# (df already holds this column after step 3, so the result contains it twice)
pd.concat([df, df[['percentage of sales']]], axis=1, sort=False)
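Answer to question 2: it depends on what you want to do.
To generalize the logic, here is one way.
A simple way to handle both question 1 and question 2 is to convert the DataFrame column to a NumPy array, perform the operation there, and then bring the result back into a DataFrame.
Note: I mistakenly added the column twice. Of course, there are other, simpler ways. I hope this helps.
Here is what I used: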
# Convert the percentage column to a NumPy array (a copy, so df itself is not modified)
npprices = df['percentage of sales'].to_numpy(copy=True)
npprices
# Loop through the rows and fill each zero with the value from the previous row,
# assuming the first row is not zero. Starting at index 1 keeps row 0 from
# wrapping around to the last row.
for i in range(1, len(npprices)):
    if npprices[i] == 0:
        npprices[i] = npprices[i - 1]
# Convert the array back into a DataFrame
percentage1 = pd.DataFrame({'percentage2': npprices})
# Then join this percentage column to the DataFrame
df2i = pd.concat([df, percentage1[['percentage2']]], axis=1, sort=False)
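The same "carry the previous value forward" idea can probably be written without the NumPy round trip. The sketch below masks zeros as missing and forward-fills them. The small DataFrame and its values are hypothetical, made up only to show the behaviour. It also assumes, as the loop above does, that the rows are already ordered so the fallback row comes directly before the zero row.

```python
import pandas as pd

# Hypothetical percentages; only the zero in the third row matters here
df = pd.DataFrame({'percentage of sales': [10.0, 25.0, 0.0, 65.0]})

pct = df['percentage of sales']

# Treat zeros as missing, then carry the last non-zero value forward
df['percentage2'] = pct.mask(pct == 0).ffill()
```

A zero in the very first row has nothing before it to copy, so it would stay as NaN here. To match a criterion such as "A222 falls back to A221", the frame would first need to be sorted (for example by `Customer number`) so that the fallback row sits just above.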