Python: How to compute sum, mean, group count, and standard deviation with groupby on the same DataFrame?
I am trying to group this DataFrame by 'Name' and 'Site', and I want to create four new columns holding the sum, group count, mean, and standard deviation of the 'Spend' column. Here is my code so far:
import pandas as pd
df = pd.DataFrame({
    'Name': ['Harry', 'John', 'Holly', 'John', 'John', 'John', 'Holly',
             'Holly', 'Molly', 'Molly', 'Holly', 'Harry', 'Harry', 'Harry'],
    'Spend': [76, 43, 23, 43, 234, 54, 34, 12, 43, 54, 65, 23, 12, 32],
    'Site': ['Amazon', 'Ikea', 'Apple', 'Amazon', 'Apple', 'Ikea', 'Apple',
             'Apple', 'Amazon', 'Amazon', 'Ikea', 'Amazon', 'Amazon', 'Ikea'],
})
print(df)
Currently, my DataFrame looks like this:
I would like it to look like this:
How can I do this?
Thanks in advance.
Edit 10/11/18:
Code:
Before:
After:
Based on your edit, you can use agg by passing it a dictionary keyed on column names, whose values are the functions to apply to those columns:
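# String aliases need no NumPy import; 'std' is the sample standard
# deviation (ddof=1), which is what the output below shows.
df_summary = df.groupby(['Name', 'Site']).agg(
    {'Spend': ['sum', 'count'],
     'Spend2': ['mean', 'std']}
)
# Flatten the two-level column index into readable names.
df_summary.columns = ['Sum_Spend', 'CountGroupbys_Spend', 'Average_Spend2', 'Standard_Deviation_Spend2']
df_summary = df_summary.reset_index().sort_values(['Site', 'Name'])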
>>> df_summary
    Name    Site  Sum_Spend  CountGroupbys_Spend  Average_Spend2  Standard_Deviation_Spend2
0  Harry  Amazon        111                    3      370.333333                 174.081399
4   John  Amazon         43                    1      143.000000                        NaN
7  Molly  Amazon         97                    2      198.500000                  78.488853
2  Holly   Apple         69                    3      123.000000                  11.000000
5   John   Apple        234                    1     1234.000000                        NaN
1  Harry    Ikea         32                    1      632.000000                        NaN
3  Holly    Ikea         65                    1      365.000000                        NaN
6   John    Ikea         97                    2      148.500000                   7.778175
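On pandas 0.25 or later, the same per-column mapping can also be written with named aggregation, which assigns the output column names in the agg call itself instead of overwriting df_summary.columns afterwards. The following is only a sketch: the Spend2 values from the question's edit are not shown, so the Spend2 column here is a made-up placeholder (Spend multiplied by 10), and its results will not match the numbers above.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Harry', 'John', 'Holly', 'John', 'John', 'John', 'Holly',
             'Holly', 'Molly', 'Molly', 'Holly', 'Harry', 'Harry', 'Harry'],
    'Spend': [76, 43, 23, 43, 234, 54, 34, 12, 43, 54, 65, 23, 12, 32],
    'Site': ['Amazon', 'Ikea', 'Apple', 'Amazon', 'Apple', 'Ikea', 'Apple',
             'Apple', 'Amazon', 'Amazon', 'Ikea', 'Amazon', 'Amazon', 'Ikea'],
})

# ASSUMPTION: placeholder data, not the asker's real Spend2 column.
df['Spend2'] = df['Spend'] * 10

# Each keyword is an output column: name=(input column, aggregation).
df_summary = (
    df.groupby(['Name', 'Site'])
      .agg(
          Sum_Spend=('Spend', 'sum'),
          CountGroupbys_Spend=('Spend', 'count'),
          Average_Spend2=('Spend2', 'mean'),
          Standard_Deviation_Spend2=('Spend2', 'std'),
      )
      .reset_index()
      .sort_values(['Site', 'Name'])
)
print(df_summary)
```

Because the names are given next to the functions that produce them, there is no risk of the renamed list getting out of step with the aggregation order.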
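Possible duplicate.
Thanks for your answer! I forgot to add another key part. What if I only want the sum and count of Spend, and then the mean and standard deviation of a different column? How would I do that? I have added my code and before/after examples to the original question under "Edit 10/11/18". Thanks in advance, and apologies for missing this!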
# Select 'Spend' explicitly so any other columns in df are ignored;
# string aliases avoid the missing NumPy import.
df_summary = df.groupby(['Name', 'Site'])['Spend'].agg(['sum', 'count', 'mean', 'std'])
df_summary.columns = ['Sum', 'Count Groupbys', 'Average', 'Standard Deviation']
df_summary = df_summary.reset_index().sort_values(['Site', 'Name'])
>>> df_summary
    Name    Site  Sum  Count Groupbys  Average  Standard Deviation
0  Harry  Amazon  111               3     37.0           34.219877
4   John  Amazon   43               1     43.0                 NaN
7  Molly  Amazon   97               2     48.5            7.778175
2  Holly   Apple   69               3     23.0           11.000000
5   John   Apple  234               1    234.0                 NaN
1  Harry    Ikea   32               1     32.0                 NaN
3  Holly    Ikea   65               1     65.0                 NaN
6   John    Ikea   97               2     48.5            7.778175
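The NaN entries in the standard deviation column belong to groups with a single row: pandas computes the sample standard deviation (ddof=1) by default, which is undefined for one observation. If a 0 is preferable there, one option is the population standard deviation (ddof=0). The sketch below shows both side by side; the column names Std_Sample and Std_Population are illustrative, not from the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Harry', 'John', 'Holly', 'John', 'John', 'John', 'Holly',
             'Holly', 'Molly', 'Molly', 'Holly', 'Harry', 'Harry', 'Harry'],
    'Spend': [76, 43, 23, 43, 234, 54, 34, 12, 43, 54, 65, 23, 12, 32],
    'Site': ['Amazon', 'Ikea', 'Apple', 'Amazon', 'Apple', 'Ikea', 'Apple',
             'Apple', 'Amazon', 'Amazon', 'Ikea', 'Amazon', 'Amazon', 'Ikea'],
})

stats = (
    df.groupby(['Name', 'Site'])
      .agg(
          Sum=('Spend', 'sum'),
          Count=('Spend', 'count'),
          Average=('Spend', 'mean'),
          # Sample standard deviation (ddof=1): NaN for one-row groups.
          Std_Sample=('Spend', 'std'),
          # Population standard deviation (ddof=0): 0.0 for one-row groups.
          Std_Population=('Spend', lambda s: s.std(ddof=0)),
      )
      .reset_index()
      .sort_values(['Site', 'Name'])
)
print(stats)
```

Which of the two is appropriate depends on whether the rows are treated as a sample or as the whole population; the answer above uses the sample version.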