Python 3.x: pandas groupby sum, then reshape the df based on a condition/requirement
I have a df as shown below:
ID Type Number_of_transactions Amount
1 SuperMarket 8 2000
2 Hospital 2 500
1 Education 1 1000
1 Travel 3 600
1 Hospital 1 800
2 Education 1 600
1 SuperMarket 2 100
2 SuperMarket 4 400
3 SuperMarket 2 300
2 Hospital 3 200
3 Education 1 700
3 Education 1 100
2 Hospital 1 1000
3 Hotel 3 1500
3 Hotel 2 700
4 SuperMarket 10 900
From the above, I would like to prepare the dataframe below.
Expected output:
ID Top1 AmountTop1 Top2 Amount_Top2
1 SuperMarket 2100 Education 1000
2 Hospital 1700 Education 600
3 Hotel 2200 Education 800
4 SuperMarket 900 NaN NaN
Explanation:
Top1 = spending type on which the customer spent the most
AmountTop1 = amount spent on the Top1 type
Top2 = spending type on which the customer spent the second most
Amount_Top2 = amount spent on the Top2 type
I tried the code below:
df1 = df.groupby(['ID', 'Type']).agg({'Amount':'sum'})\
.rename(columns={'Amount':'Spend_By_Type'})\
.sort_values(by=['ID', 'Spend_By_Type'], ascending=[1, 0])\
.reset_index(drop=False)
In [104]: x = (df.groupby(['ID', 'Type'], as_index=False)['Amount'].sum()
     ...:        .sort_values(['ID', 'Amount'], ascending=[True, False])
     ...:        .assign(order=lambda x: x.groupby('ID').cumcount() + 1)
     ...:        .query('order < 3')
     ...:        .pivot_table(index='ID', columns=['order'],
     ...:                     values=['Type', 'Amount'], aggfunc='first'))
In [105]: x
Out[105]:
       Amount                 Type
order       1       2            1          2
ID
1      2100.0  1000.0  SuperMarket  Education
2      1700.0   600.0     Hospital  Education
3      2200.0   800.0        Hotel  Education
4       900.0     NaN  SuperMarket        NaN
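You can then rename the columns and reorder them as needed. This simply computes the sum per ID and Type and then sorts by the total amount.
We then create a column "order" that holds the position of each row within its ID group in the sorted df, and keep only the first two rows of each group.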
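The steps just described can also be written one at a time. The following is only a sketch of the same idea, not the answerer's exact code: it uses `pivot` instead of `pivot_table`, leaves out the unused `Number_of_transactions` column, and the names `totals`, `top2`, `wide` and `names` are made up for illustration.

```python
import pandas as pd

# Sample data from the question (Number_of_transactions omitted, it is not needed here).
df = pd.DataFrame({
    "ID": [1, 2, 1, 1, 1, 2, 1, 2, 3, 2, 3, 3, 2, 3, 3, 4],
    "Type": ["SuperMarket", "Hospital", "Education", "Travel", "Hospital",
             "Education", "SuperMarket", "SuperMarket", "SuperMarket",
             "Hospital", "Education", "Education", "Hospital", "Hotel",
             "Hotel", "SuperMarket"],
    "Amount": [2000, 500, 1000, 600, 800, 600, 100, 400, 300, 200, 700, 100,
               1000, 1500, 700, 900],
})

# Step 1: total Amount per (ID, Type).
totals = df.groupby(["ID", "Type"], as_index=False)["Amount"].sum()

# Step 2: within each ID, put the largest total first.
totals = totals.sort_values(["ID", "Amount"], ascending=[True, False])

# Step 3: number the rows inside each ID (1 = largest) and keep the top two.
totals["order"] = totals.groupby("ID").cumcount() + 1
top2 = totals[totals["order"] <= 2]

# Step 4: one row per ID, one column per (value, order) pair.
wide = top2.pivot(index="ID", columns="order", values=["Type", "Amount"])

# Step 5: flatten the two-level columns, e.g. ("Type", 1) -> "Top1".
names = {"Type": "Top", "Amount": "AmountTop"}
wide.columns = [f"{names[col]}{rank}" for col, rank in wide.columns]
wide = wide[["Top1", "AmountTop1", "Top2", "AmountTop2"]]
print(wide)
```

Customer 4 has only one spending type, so its `Top2` and `AmountTop2` cells come out as missing values.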
res = (df.groupby("ID")[["Type", "Amount"]]
.apply(lambda s: s.groupby("Type").sum().nlargest(2, "Amount"))
.reset_index("Type")
.pipe(lambda df: df.set_index(df.groupby("ID").cumcount()+1, append=True))
.unstack())
res.columns = (res.columns.map("{0[0]}{0[1]}".format)
.str.replace("Type", "Top")
.str.replace("Amount", "AmountTop"))
res = res.sort_index(axis=1, key=lambda s: s.str[-1])
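We first group by the ID column, then find the two largest per-Type sums within each group. We then move Type back into the columns and append an index level that numbers the values within each ID, and unstack that level. The rest is joining and renaming the column names and sorting them into the required order,
to get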
>>> res
Top1 AmountTop1 Top2 AmountTop2
ID
1 SuperMarket 2100.0 Education 1000.0
2 Hospital 1700.0 Education 600.0
3 Hotel 2200.0 Education 800.0
4 SuperMarket 900.0 NaN NaN
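To see what the lambda passed to `.apply` works on, it may help to run the inner step for a single customer. This is a small sketch using only the ID 3 rows from the question; the names `one_id` and `top_two` are illustrative.

```python
import pandas as pd

# The rows of customer ID 3 from the question, i.e. what one group looks like.
one_id = pd.DataFrame({
    "Type": ["SuperMarket", "Education", "Education", "Hotel", "Hotel"],
    "Amount": [300, 700, 100, 1500, 700],
})

# Sum per Type, then keep the two largest sums (largest first).
top_two = one_id.groupby("Type").sum().nlargest(2, "Amount")
print(top_two)
```

The outer `groupby("ID").apply(...)` repeats this for every customer and stacks the results under an (ID, Type) index.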
Get the initial groupby on ID and Type:
base_values = df.groupby(['ID', 'Type']).Amount.agg('sum')
ID Type
1 Education 1000
Hospital 800
SuperMarket 2100
Travel 600
2 Education 600
Hospital 1700
SuperMarket 400
3 Education 800
Hotel 2200
SuperMarket 300
4 SuperMarket 900
Name: Amount, dtype: int64
Get Top1 and Top2 by ranking:
Top1 = (base_values[base_values.groupby("ID")
.rank(ascending = False)
.eq(1)]
.rename_axis(['ID', 'Top1'])
.rename('AmountTop1')
.reset_index('Top1'))
Top2 = (base_values[base_values.groupby("ID")
.rank(ascending = False)
.eq(2)]
.rename_axis(['ID', 'Top2'])
.rename('AmountTop2')
.reset_index('Top2'))
Combine Top1 and Top2:
pd.concat([Top1, Top2], axis = 'columns')
Top1 AmountTop1 Top2 AmountTop2
ID
1 SuperMarket 2100 Education 1000.0
2 Hospital 1700 Education 600.0
3 Hotel 2200 Education 800.0
4 SuperMarket 900 NaN NaN
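One thing to watch with the ranking approach: `rank` defaults to `method="average"`, so two types with the same total inside one ID share a fractional rank, and neither `.eq(1)` nor `.eq(2)` matches them. The question's data has no such ties; the sketch below uses made-up totals to show the effect and one possible workaround, `method="first"`.

```python
import pandas as pd

# Hypothetical totals: customer 1 spent the same amount on two types.
totals = pd.Series(
    [500, 500, 200],
    index=pd.MultiIndex.from_tuples(
        [(1, "Education"), (1, "Hospital"), (1, "Travel")],
        names=["ID", "Type"],
    ),
    name="Amount",
)

# Default method="average": the tied rows both get rank 1.5.
default_rank = totals.groupby(level="ID").rank(ascending=False)

# method="first": ties are broken by order of appearance, giving 1, 2, 3.
first_rank = totals.groupby(level="ID").rank(ascending=False, method="first")

print(default_rank.tolist())
print(first_rank.tolist())
```

Whether breaking ties by order of appearance is acceptable depends on the use case; it is shown here only as one option.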
You can use:
df1 = df.groupby(['ID', 'Type'], as_index=False)['Amount'].sum()
df2 = (df1.loc[df1.groupby('ID')['Amount'].nlargest(2).reset_index(level=0).index]
.rename(columns={'Type': 'Top', 'Amount': 'AmountTop'})
)
df2['top_num'] = df2.groupby('ID').cumcount() + 1
df3 = df2.pivot(index='ID', columns='top_num').sort_index(level=[1,0], ascending=[True, False], axis=1)
df3.columns = df3.columns.map(lambda x: ''.join(map(str, x)))
df3 = df3.reset_index()
Result:
print(df3)
ID Top1 AmountTop1 Top2 AmountTop2
0 1 SuperMarket 2100.0 Education 1000.0
1 2 Hospital 1700.0 Education 600.0
2 3 Hotel 2200.0 Education 800.0
3 4 SuperMarket 900.0 NaN NaN
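But there is another SuperMarket row with ID=1 and an amount of 100.
@Danish OK, maybe you want to sum them up. I missed that point. Will fix it soon.
@Danish Edited the solution. Please take a look!
Thanks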