Groupby calculations and axis functions in pandas
I have a dataframe like the one shown below:
Sector Plot Year Amount Month
SE1 1 2017 10 Sep
SE1 1 2018 10 Oct
SE1 1 2019 10 Jun
SE1 1 2020 90 Feb
SE1 2 2018 50 Jan
SE1 2 2017 100 May
SE1 2 2018 30 Oct
SE2 2 2018 50 Mar
SE2 2 2019 100 Jan
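To make the example reproducible, the table above can be rebuilt roughly as follows. This is only a sketch: the dtypes are an assumption (Plot, Year and Amount as integers), since the question shows the printed frame only.

```python
import pandas as pd

# Rebuild the sample frame from the question.
# Assumption: Plot, Year and Amount are integer columns.
df = pd.DataFrame({
    'Sector': ['SE1'] * 7 + ['SE2'] * 2,
    'Plot': [1, 1, 1, 1, 2, 2, 2, 2, 2],
    'Year': [2017, 2018, 2019, 2020, 2018, 2017, 2018, 2018, 2019],
    'Amount': [10, 10, 10, 90, 50, 100, 30, 50, 100],
    'Month': ['Sep', 'Oct', 'Jun', 'Feb', 'Jan', 'May', 'Oct', 'Mar', 'Jan'],
})
print(df)
```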
From the above, I would like to produce the following:
Sector Plot Number_of_Times Mean_Amount Recent_Amount Recent_year All
SE1 1 4 30 50 2020 {'2018':50, '2017':10, '2019':10, 2020:90}
SE1 2 3 60 30 2018 {'2018':50, '2017':100, '2018':30}
SE2 2 2 75 100 2019 {'2018':50, '2019':100}
Use agg with named aggregation to build df1, then create a dictionary per group that maps each Year to its Amount, and finally join the two together:
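# Group once and reuse the GroupBy object.
g = df.groupby(['Sector','Plot'])
# Named aggregation: output column = (input column, aggregation).
# 'last' takes the last row of each group in row order.
df1 = g.agg(Number_of_Times=('Year','size'),
            Mean_Amount=('Amount','mean'),
            Recent_Amount=('Amount','last'),
            Recent_year=('Year','last'))
# Select both columns with a list; each group's (Year, Amount) rows become a dict.
s = g[['Year','Amount']].apply(lambda x: dict(x.values)).rename('All')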
Another idea, using zip and dict:
s = g.apply(lambda x: dict(zip(x['Year'], x['Amount']))).rename('All')
df2 = df1.join(s).reset_index()
print (df2)
  Sector  Plot  Number_of_Times  Mean_Amount  Recent_Amount  Recent_year  \
0    SE1     1                4           30             90         2020
1    SE1     2                3           60             30         2018
2    SE2     2                2           75            100         2019

                                        All
0  {2017: 10, 2018: 10, 2019: 10, 2020: 90}
1                     {2018: 30, 2017: 100}
2                     {2018: 50, 2019: 100}
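Note that the second row of All has only two entries. A dict cannot hold the key 2018 twice, so the later amount (30) replaces the earlier one (50). If every amount needs to be kept, one possible approach is to collect the amounts for each year into a list. Below is a minimal sketch on the SE1 / Plot 2 rows only; the names sub, plain and s_lists are made up for illustration and are not part of the answer above.

```python
import pandas as pd

# Only the SE1 / Plot 2 rows from the question, where Year 2018 appears twice.
sub = pd.DataFrame({
    'Sector': ['SE1', 'SE1', 'SE1'],
    'Plot': [2, 2, 2],
    'Year': [2018, 2017, 2018],
    'Amount': [50, 100, 30],
})

# dict() keeps one value per key, so the later 2018 row replaces the earlier one.
plain = dict(zip(sub['Year'], sub['Amount']))

# Alternative: inside each (Sector, Plot) group, gather all amounts of a year in a list.
s_lists = (sub.groupby(['Sector', 'Plot'])[['Year', 'Amount']]
              .apply(lambda x: x.groupby('Year')['Amount'].agg(list).to_dict())
              .rename('All'))

print(plain)
print(s_lists.iloc[0])
```

The inner groupby sorts by Year, so the keys come out in ascending order and the amounts within a year keep their original row order.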