Conditional aggregation in python/pandas
I have a DataFrame like this:
Amount Month Type
15 201801 Sale
34 201801 Purchase
4 201801 Sale
86 201801 Purchase
23 201802 Sale
55 201802 Purchase
29 201802 Sale
...
Month TotalSales TotalSalesRun TotalPurch TotalPurchRun
201801 19 19 120 120
201802 52 71 55 175
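I want to aggregate by month so that I get:
- TotalSales: sum(Amount where Type == Sale)
- TotalSalesRun: sum(Amount where Month …

… for the totals, reshape by …, and compute the cumulative sum: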
Finally, if needed, move Month from the index back into a column:

df2 = df2.reset_index().rename_axis(None, axis=1)
print(df2)
    Month  TotalPurchase  TotalPurchaseRun  TotalSale  TotalSaleRun
0  201801            120               120         19            19
1  201802             55               175         52            71
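The code for the pivot and cumulative-sum steps that come before this last line did not survive. A minimal sketch of that pipeline, assuming pivot_table for the monthly totals, cumsum for the running totals and DataFrame.join to combine them (the intermediate names `totals` and `running`, and the suffix/prefix choices, are assumptions made to match the column names shown), applied to the sample rows from the question only:

```python
import pandas as pd

# Sample rows from the question (the "..." rows are not known and are omitted)
df = pd.DataFrame({
    "Amount": [15, 34, 4, 86, 23, 55, 29],
    "Month": [201801, 201801, 201801, 201801, 201802, 201802, 201802],
    "Type": ["Sale", "Purchase", "Sale", "Purchase", "Sale", "Purchase", "Sale"],
})

# Monthly totals: one row per Month, one column per Type (Purchase, Sale)
totals = df.pivot_table(index="Month", columns="Type", values="Amount", aggfunc="sum")

# Running totals: cumulative sum down the Month index, which pivot_table sorts
running = totals.cumsum()

# Join side by side; only the right-hand (running) columns get the "Run" suffix,
# then every column gets the "Total" prefix
df2 = totals.join(running, rsuffix="Run").add_prefix("Total")

# Sort the columns so each total sits next to its running total
df2 = df2.sort_index(axis=1)

# Last step from the answer: move Month out of the index, drop the "Type" axis name
df2 = df2.reset_index().rename_axis(None, axis=1)
print(df2)
```

Sorting the columns alphabetically happens to interleave each total with its running counterpart here; with other Type values an explicit column list would be safer.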
I'm using concat:

import pandas as pd

# Monthly sums per (Month, Type), then running sums within each Type
s1 = df.groupby(['Month', 'Type']).sum()
s2 = s1.groupby(level=1).cumsum().add_prefix('running')
# Put both side by side, move Type into the columns and flatten the names
s = pd.concat([s1, s2], axis=1).unstack()
s.columns = s.columns.map('_'.join)

Or use pivot_table:

s1 = df.pivot_table(index='Month', columns='Type', values='Amount', aggfunc='sum')
Yourdf = pd.concat([s1, s1.cumsum().add_prefix('Rolling')], sort=False, axis=1)
Yourdf
Type    Purchase  Sale  RollingPurchase  RollingSale
Month
201801       120    19              120           19
201802        55    52              175           71
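Both snippets assume `df` already exists, and both rely on the months coming out sorted (groupby and pivot_table sort their keys by default) so that cumsum runs in chronological order. A self-contained check, assuming only the sample rows from the question, that the concat and pivot_table variants produce the same running totals under different column names:

```python
import pandas as pd

# Sample rows from the question (the "..." rows are not known and are omitted)
df = pd.DataFrame({
    "Amount": [15, 34, 4, 86, 23, 55, 29],
    "Month": [201801] * 4 + [201802] * 3,
    "Type": ["Sale", "Purchase", "Sale", "Purchase", "Sale", "Purchase", "Sale"],
})

# concat variant: sums per (Month, Type), running sums within each Type
s1 = df.groupby(["Month", "Type"]).sum()
s2 = s1.groupby(level="Type").cumsum().add_prefix("running")
s = pd.concat([s1, s2], axis=1).unstack()
# Flatten (column, Type) tuples into names such as "runningAmount_Sale"
s.columns = s.columns.map("_".join)

# pivot_table variant: same numbers, labelled Purchase/Sale/Rolling*
p = df.pivot_table(index="Month", columns="Type", values="Amount", aggfunc="sum")
rolling = pd.concat([p, p.cumsum().add_prefix("Rolling")], axis=1)

print(s)
print(rolling)
```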
You can use groupby for this. Alternatively, use a condition when slicing the DataFrame, for example:
total_sales = sum(df["Amount"][df.Type == 'Sale'])
Using Python's built-in sum here will be slow. For a vectorized sum, call .sum() at the end instead:
total_sales = df["Amount"][df.Type == 'Sale'].sum()
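The same masking idea can be extended from one overall total to the per-month columns the question asks for. A minimal sketch, not taken from either answer: it assumes Series.where to zero out rows that fail the condition, then a groupby sum and cumsum per Month, using the question's column names and only its sample rows:

```python
import pandas as pd

# Sample rows from the question (the "..." rows are not known and are omitted)
df = pd.DataFrame({
    "Amount": [15, 34, 4, 86, 23, 55, 29],
    "Month": [201801] * 4 + [201802] * 3,
    "Type": ["Sale", "Purchase", "Sale", "Purchase", "Sale", "Purchase", "Sale"],
})

# Boolean masks for the two conditions
is_sale = df["Type"].eq("Sale")
is_purchase = df["Type"].eq("Purchase")

# Series.where keeps Amount where the mask is True and puts 0 elsewhere,
# so summing per Month gives sum(Amount where Type == ...) for that month
monthly = pd.DataFrame({
    "TotalSales": df["Amount"].where(is_sale, 0),
    "TotalPurch": df["Amount"].where(is_purchase, 0),
}).groupby(df["Month"]).sum()

# Running totals over the Month index, which groupby sorts by default
monthly["TotalSalesRun"] = monthly["TotalSales"].cumsum()
monthly["TotalPurchRun"] = monthly["TotalPurch"].cumsum()

# Column order of the desired output, with Month back as a column
monthly = monthly[["TotalSales", "TotalSalesRun", "TotalPurch", "TotalPurchRun"]].reset_index()
print(monthly)

# A single overall figure, vectorized with .sum() instead of the built-in sum
total_sales = df.loc[is_sale, "Amount"].sum()
print(total_sales)
```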