Python 3.x: group by and sum in pandas, and customize the df based on conditions/requirements - Fatal编程技术网

I have a df as shown below:

ID      Type              Number_of_transactions     Amount
1       SuperMarket       8                          2000
2       Hospital          2                          500
1       Education         1                          1000
1       Travel            3                          600
1       Hospital          1                          800
2       Education         1                          600
1       SuperMarket       2                          100
2       SuperMarket       4                          400
3       SuperMarket       2                          300
2       Hospital          3                          200
3       Education         1                          700
3       Education         1                          100
2       Hospital          1                          1000
3       Hotel             3                          1500
3       Hotel             2                          700
4       SuperMarket       10                         900
From the above, I want to prepare the dataframe below.

Expected output:

ID       Top1              AmountTop1        Top2          Amount_Top2
1        SuperMarket       2100              Education     1000
2        Hospital          1700              Education     600 
3        Hotel             2200              Education     800
4        SuperMarket       900               NaN           NaN
Explanation:

Top1 = Spending type where the customer spends the most
AmountTop1 = Amount spent on the Top1 type
Top2 = Spending type where the customer spends the second most
Amount_Top2 = Amount spent on the Top2 type
I tried the code below:

df1 = df.groupby(['ID', 'Type']).agg({'Amount':'sum'})\
                            .rename(columns={'Amount':'Spend_By_Type'})\
                            .sort_values(by=['ID', 'Spend_By_Type'], ascending=[1, 0])\
                            .reset_index(drop=False)
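The attempt above already yields per-type totals sorted within each ID. A minimal sketch of one way it could be finished, assuming the question's sample data (Number_of_transactions is left out because nothing uses it), keeps the first two rows per ID with groupby(...).head(2), numbers them with cumcount, and pivots. The names top2, wide and result are illustrative only.

```python
import pandas as pd

# Sample data from the question (Number_of_transactions omitted: unused here)
df = pd.DataFrame({
    "ID": [1, 2, 1, 1, 1, 2, 1, 2, 3, 2, 3, 3, 2, 3, 3, 4],
    "Type": ["SuperMarket", "Hospital", "Education", "Travel", "Hospital", "Education",
             "SuperMarket", "SuperMarket", "SuperMarket", "Hospital", "Education",
             "Education", "Hospital", "Hotel", "Hotel", "SuperMarket"],
    "Amount": [2000, 500, 1000, 600, 800, 600, 100, 400, 300, 200, 700, 100,
               1000, 1500, 700, 900],
})

# The asker's attempt: per-(ID, Type) totals, sorted by ID, then by spend descending
df1 = (df.groupby(["ID", "Type"]).agg({"Amount": "sum"})
         .rename(columns={"Amount": "Spend_By_Type"})
         .sort_values(by=["ID", "Spend_By_Type"], ascending=[True, False])
         .reset_index(drop=False))

# Keep the two largest rows per ID and number them 1, 2 within each ID
top2 = df1.groupby("ID").head(2).copy()
top2["rank"] = top2.groupby("ID").cumcount() + 1

# One row per ID; the remaining columns (Type, Spend_By_Type) become (field, rank) pairs
wide = top2.pivot(index="ID", columns="rank")

# Flatten to the requested names and put them in the requested order
wide.columns = [("Top" if field == "Type" else "AmountTop") + str(n)
                for field, n in wide.columns]
result = wide[["Top1", "AmountTop1", "Top2", "AmountTop2"]]
print(result)
```

Because head(2) runs on an already sorted frame, no separate nlargest or rank step is needed; IDs with only one type (ID 4 here) simply get NaN in the Top2 slots.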
In [104]: x = (df.groupby(['ID', 'Type'], as_index=False)['Amount'].sum()
     ...:        .sort_values(['ID', 'Amount'], ascending=[True, False])
     ...:        .assign(order=lambda x: x.groupby('ID').cumcount() + 1)
     ...:        .query('order < 3')
     ...:        .pivot_table(index='ID', columns=['order'], values=['Type', 'Amount'], aggfunc='first'))

In [105]: x
Out[105]:
       Amount               Type
order       1       2          1          2
ID
1      2100.0  1000.0  SuperMarket  Education
2      1700.0   600.0     Hospital  Education
3      2200.0   800.0        Hotel  Education
4       900.0     NaN  SuperMarket        NaN
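Then you can rename the columns and reorder them as needed. This simply computes the sums and sorts by the total amount. It then creates a column "order" holding each row's position within its ID group of the sorted df, keeps only the first two rows of each group, and pivots.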
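The rename/reorder step is left to the reader above. A minimal sketch of it, rebuilding the same pivot from the question's sample data (Number_of_transactions omitted because it is unused), flattens each (field, order) column pair through a small mapping; the names mapping is an illustrative choice, not part of the answer.

```python
import pandas as pd

# Sample data from the question (Number_of_transactions omitted: unused here)
df = pd.DataFrame({
    "ID": [1, 2, 1, 1, 1, 2, 1, 2, 3, 2, 3, 3, 2, 3, 3, 4],
    "Type": ["SuperMarket", "Hospital", "Education", "Travel", "Hospital", "Education",
             "SuperMarket", "SuperMarket", "SuperMarket", "Hospital", "Education",
             "Education", "Hospital", "Hotel", "Hotel", "SuperMarket"],
    "Amount": [2000, 500, 1000, 600, 800, 600, 100, 400, 300, 200, 700, 100,
               1000, 1500, 700, 900],
})

# The answer's pivot: columns form a MultiIndex of (field, order), e.g. ("Amount", 1)
x = (df.groupby(["ID", "Type"], as_index=False)["Amount"].sum()
       .sort_values(["ID", "Amount"], ascending=[True, False])
       .assign(order=lambda d: d.groupby("ID").cumcount() + 1)
       .query("order < 3")
       .pivot_table(index="ID", columns=["order"], values=["Type", "Amount"],
                    aggfunc="first"))

# Rename ("Type", 1) -> "Top1", ("Amount", 1) -> "AmountTop1", and so on
names = {"Type": "Top", "Amount": "AmountTop"}
x.columns = [f"{names[field]}{order}" for field, order in x.columns]

# Reorder to the layout asked for in the question
x = x[["Top1", "AmountTop1", "Top2", "AmountTop2"]]
print(x)
```

Selecting the columns by an explicit list at the end makes the final order independent of how pivot_table happened to sort them.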

res = (df.groupby("ID")[["Type", "Amount"]]
         .apply(lambda s: s.groupby("Type").sum().nlargest(2, "Amount"))
         .reset_index("Type")
         .pipe(lambda df: df.set_index(df.groupby("ID").cumcount()+1, append=True))
         .unstack())

res.columns = (res.columns.map("{0[0]}{0[1]}".format)
                  .str.replace("Type", "Top")
                  .str.replace("Amount", "AmountTop"))

res = res.sort_index(axis=1, key=lambda s: s.str[-1])
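We first group by the ID column and then take the two largest sums over Type. Then we move Type back into the columns, append an index level that numbers the values within each ID, and unstack it. The rest is joining and renaming the column names and putting them in a specific order.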

This gives:

>>> res

           Top1  AmountTop1       Top2  AmountTop2
ID
1   SuperMarket      2100.0  Education      1000.0
2      Hospital      1700.0  Education       600.0
3         Hotel      2200.0  Education       800.0
4   SuperMarket       900.0        NaN         NaN
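To see what each stage of the chained expression produces, here is a sketch that splits the same chain into named intermediate steps (step1 to step4 are illustrative names), again assuming the question's sample data without the unused Number_of_transactions column.

```python
import pandas as pd

# Sample data from the question (Number_of_transactions omitted: unused here)
df = pd.DataFrame({
    "ID": [1, 2, 1, 1, 1, 2, 1, 2, 3, 2, 3, 3, 2, 3, 3, 4],
    "Type": ["SuperMarket", "Hospital", "Education", "Travel", "Hospital", "Education",
             "SuperMarket", "SuperMarket", "SuperMarket", "Hospital", "Education",
             "Education", "Hospital", "Hotel", "Hotel", "SuperMarket"],
    "Amount": [2000, 500, 1000, 600, 800, 600, 100, 400, 300, 200, 700, 100,
               1000, 1500, 700, 900],
})

# Step 1: within each ID, the two largest per-Type sums; index is (ID, Type)
step1 = (df.groupby("ID")[["Type", "Amount"]]
           .apply(lambda s: s.groupby("Type").sum().nlargest(2, "Amount")))

# Step 2: move Type back into the columns; the index is now just ID
step2 = step1.reset_index("Type")

# Step 3: append a 1-based position within each ID as a second index level
step3 = step2.set_index(step2.groupby("ID").cumcount() + 1, append=True)

# Step 4: unstack that position level, giving (field, position) columns
step4 = step3.unstack()
print(step4.columns.tolist())
```

The unstacked columns come out as Type1, Type2, Amount1, Amount2 in tuple form, which is why the answer still needs the final rename and sort_index(axis=1, key=...) to reach Top1, AmountTop1, Top2, AmountTop2.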

Get the initial groupby on ID and Type:

base_values = df.groupby(['ID', 'Type']).Amount.agg('sum')

ID  Type       
1   Education      1000
    Hospital        800
    SuperMarket    2100
    Travel          600
2   Education       600
    Hospital       1700
    SuperMarket     400
3   Education       800
    Hotel          2200
    SuperMarket     300
4   SuperMarket     900
Name: Amount, dtype: int64
Get Top1 and Top2 by ranking:

Top1 = (base_values[base_values.groupby("ID")
                               .rank(ascending = False)
                               .eq(1)]
          .rename_axis(['ID', 'Top1'])
          .rename('AmountTop1')
          .reset_index('Top1'))


Top2 = (base_values[base_values.groupby("ID")
                               .rank(ascending = False)
                               .eq(2)]
          .rename_axis(['ID', 'Top2'])
          .rename('AmountTop2')
          .reset_index('Top2'))
Combine Top1 and Top2:

pd.concat([Top1, Top2], axis = 'columns')

           Top1  AmountTop1       Top2  AmountTop2
ID                                                
1   SuperMarket        2100  Education      1000.0
2      Hospital        1700  Education       600.0
3         Hotel        2200  Education       800.0
4   SuperMarket         900        NaN         NaN
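One thing worth checking with the ranking approach: Series.rank defaults to method="average", so if two types tie for the largest total within an ID, both get rank 1.5 and neither matches .eq(1). A sketch that assumes ties should still fill Top1 and Top2, using method="first" (ties broken by order of appearance, which here is alphabetical by Type) and folding the two near-identical blocks into one helper (top_k is a hypothetical name):

```python
import pandas as pd

# Sample data from the question (Number_of_transactions omitted: unused here)
df = pd.DataFrame({
    "ID": [1, 2, 1, 1, 1, 2, 1, 2, 3, 2, 3, 3, 2, 3, 3, 4],
    "Type": ["SuperMarket", "Hospital", "Education", "Travel", "Hospital", "Education",
             "SuperMarket", "SuperMarket", "SuperMarket", "Hospital", "Education",
             "Education", "Hospital", "Hotel", "Hotel", "SuperMarket"],
    "Amount": [2000, 500, 1000, 600, 800, 600, 100, 400, 300, 200, 700, 100,
               1000, 1500, 700, 900],
})

base_values = df.groupby(["ID", "Type"]).Amount.agg("sum")

# method="first" gives tied totals distinct ranks (1, 2, ...) instead of a
# shared averaged rank such as 1.5, so .eq(k) always finds a row for rank k
ranks = base_values.groupby("ID").rank(ascending=False, method="first")


def top_k(k):
    """Rows ranked k-th within each ID, as columns Top<k> and AmountTop<k>."""
    return (base_values[ranks.eq(k)]
            .rename_axis(["ID", f"Top{k}"])
            .rename(f"AmountTop{k}")
            .reset_index(f"Top{k}"))


result = pd.concat([top_k(1), top_k(2)], axis="columns")
print(result)
```

On the question's data there are no ties, so this produces the same table as above; the difference only shows up when two types share a total within one ID.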
You can use:

df1 = df.groupby(['ID', 'Type'], as_index=False)['Amount'].sum()
df2 = (df1.loc[df1.groupby('ID')['Amount'].nlargest(2).reset_index(level=0).index]
         .rename(columns={'Type': 'Top', 'Amount': 'AmountTop'})
      )
df2['top_num'] = df2.groupby('ID').cumcount() + 1
df3 = df2.pivot(index='ID', columns='top_num').sort_index(level=[1,0],  ascending=[True, False], axis=1)
df3.columns = df3.columns.map(lambda x: ''.join(map(str, x)))
df3 = df3.reset_index()
Result:

print(df3)


   ID         Top1  AmountTop1       Top2  AmountTop2
0   1  SuperMarket      2100.0  Education      1000.0
1   2     Hospital      1700.0  Education       600.0
2   3        Hotel      2200.0  Education       800.0
3   4  SuperMarket       900.0        NaN         NaN

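But there is another SuperMarket entry for ID=1 with an amount of 100.
@Danish OK, perhaps you want to sum those. I missed that point; I will fix it shortly.
@Danish I have edited the solution. Please take a look! Thanks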
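The nlargest-based answer fixes the number of slots at 2. A sketch of the same steps with that number as a parameter (top_n_wide is a hypothetical helper name), again assuming the question's sample data without the unused Number_of_transactions column:

```python
import pandas as pd

# Sample data from the question (Number_of_transactions omitted: unused here)
df = pd.DataFrame({
    "ID": [1, 2, 1, 1, 1, 2, 1, 2, 3, 2, 3, 3, 2, 3, 3, 4],
    "Type": ["SuperMarket", "Hospital", "Education", "Travel", "Hospital", "Education",
             "SuperMarket", "SuperMarket", "SuperMarket", "Hospital", "Education",
             "Education", "Hospital", "Hotel", "Hotel", "SuperMarket"],
    "Amount": [2000, 500, 1000, 600, 800, 600, 100, 400, 300, 200, 700, 100,
               1000, 1500, 700, 900],
})


def top_n_wide(df, n):
    """Hypothetical helper: the n largest per-Type totals per ID, one row per ID."""
    df1 = df.groupby(["ID", "Type"], as_index=False)["Amount"].sum()
    # Row labels of the n largest totals within each ID (last level of the result index)
    idx = df1.groupby("ID")["Amount"].nlargest(n).index.get_level_values(-1)
    df2 = df1.loc[idx].rename(columns={"Type": "Top", "Amount": "AmountTop"})
    df2["top_num"] = df2.groupby("ID").cumcount() + 1
    df3 = df2.pivot(index="ID", columns="top_num")
    # Order columns as Top1, AmountTop1, Top2, AmountTop2, ...
    df3 = df3.sort_index(level=[1, 0], ascending=[True, False], axis=1)
    df3.columns = [f"{name}{num}" for name, num in df3.columns]
    return df3.reset_index()


print(top_n_wide(df, 3))
```

With n=3, IDs that have fewer than three spending types get NaN in the missing slots; ID 4, for example, has only Top1 filled.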