Python: How to get the top 2 of each multi-index in a Pandas DataFrame (generated by pivot_table)
I would like to show the top 2 results for the first 2 levels of a 3-level-indexed DataFrame (built with pivot_table).
Question 1: how do I get only the top 2 profiles in each year-month combination? So:
- For 2015, 1: D & A
- For 2015, 2: C & B
- For 2015, 3: A & C
Bonus question: how do I get the sum of the non-top-2 profiles and call it "Other"? So:
- For 2015, 1: Other, 0, 50, 10, 60 (i.e. the sum of B & C)
- For 2015, 2: Other, 30, 0, 0, 30 (only A in this case)
- For 2015, 3: Other, 0, 0, 10, 10 (only B in this case)
I would like the result returned to me as a DataFrame.
Update: without pivoting:
In [120]: srt = df.sort_values(['year','month','profile'])
In [123]: srt[srt.groupby(['year','month'])['profile'].rank(method='min') <= 2]
Out[123]:
year month profile ranking sales
0 2015 1 A R1 70
6 2015 1 B R2 50
4 2015 2 A R1 30
1 2015 2 B R2 40
5 2015 3 A R3 20
8 2015 3 B R3 10
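Note that the snippet above ranks the profile labels themselves, so it keeps A and B in every month rather than the two best-selling profiles. A minimal sketch of selecting by sales instead is given here. The sample rows are reconstructed from the pivot output in this thread (the original row order of df is not shown, so that order is an assumption), and it assumes one row per year, month and profile; with several rows per profile you would aggregate first.

```python
import pandas as pd

# Sample rows reconstructed from the pivot output in this thread;
# the original row order of df is not shown, so this order is an assumption.
df = pd.DataFrame({
    "year": [2015] * 10,
    "month": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "profile": ["A", "B", "C", "D", "A", "B", "C", "A", "B", "C"],
    "ranking": ["R1", "R2", "R3", "R2", "R1", "R2", "R1", "R3", "R3", "R3"],
    "sales": [70, 50, 10, 90, 30, 40, 90, 20, 10, 20],
})

# Sort by sales (largest first) within each year and month,
# then keep the first two rows of every (year, month) group.
top2 = (
    df.sort_values(["year", "month", "sales"], ascending=[True, True, False])
      .groupby(["year", "month"])
      .head(2)
)

print(top2)
```

With this data the selection matches the expected profiles from the question (D & A, C & B, A & C). Ties, such as A and C in month 3, are kept in whatever order the sort leaves them.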
With pivoting: you can try resetting the index after pivoting:
In [109]: pvt = df.pivot_table(values = 'sales',
.....: index = ['year','month','profile'],
.....: columns = ['ranking'],
.....: aggfunc = 'sum',
.....: fill_value = 0,
.....: margins = True).reset_index()
In [111]: pvt
Out[111]:
ranking year month profile R1 R2 R3 All
0 2015 1 A 70 0 0 70
1 2015 1 B 0 50 0 50
2 2015 1 C 0 0 10 10
3 2015 1 D 0 90 0 90
4 2015 2 A 30 0 0 30
5 2015 2 B 0 40 0 40
6 2015 2 C 90 0 0 90
7 2015 3 A 0 0 20 20
8 2015 3 B 0 0 10 10
9 2015 3 C 0 0 20 20
10 All 190 180 60 430
Now you can use the rank() method:
In [110]: pvt[pvt.sort_values(['year','month','profile']).groupby(['year','month'])['profile'].rank(method='min') <= 2]
Out[110]:
ranking year month profile R1 R2 R3 All
0 2015 1 A 70 0 0 70
1 2015 1 B 0 50 0 50
4 2015 2 A 30 0 0 30
5 2015 2 B 0 40 0 40
7 2015 3 A 0 0 20 20
8 2015 3 B 0 0 10 10
10 All 190 180 60 430
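The rank() call above ranks by profile name, so it does not yet give the top 2 by sales or the "Other" row from the bonus question. One possible approach is sketched below; it is not the answerer's code. It builds the pivot without margins, adds the "All" column by hand, ranks on that total within each year and month, and sums the remaining rows into "Other". The sample rows are reconstructed from the pivot output in this thread, and the variable names (is_top, other, result) are illustrative.

```python
import pandas as pd

# Sample rows reconstructed from the pivot output in this thread (an assumption).
df = pd.DataFrame({
    "year": [2015] * 10,
    "month": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "profile": ["A", "B", "C", "D", "A", "B", "C", "A", "B", "C"],
    "ranking": ["R1", "R2", "R3", "R2", "R1", "R2", "R1", "R3", "R3", "R3"],
    "sales": [70, 50, 10, 90, 30, 40, 90, 20, 10, 20],
})

# Pivot without margins so no grand-total row has to be filtered out later.
pvt = df.pivot_table(values="sales", index=["year", "month", "profile"],
                     columns="ranking", aggfunc="sum", fill_value=0)
pvt["All"] = pvt.sum(axis=1)

# Rank profiles by total sales within each (year, month).
# method="first" breaks ties by row order, so exactly two rows are kept.
rank = pvt.groupby(level=["year", "month"])["All"].rank(method="first", ascending=False)
is_top = rank <= 2

top2 = pvt[is_top]

# Sum everything outside the top 2 into one "Other" row per (year, month).
other = pvt[~is_top].groupby(level=["year", "month"]).sum()
other["profile"] = "Other"
other = other.set_index("profile", append=True)

result = pd.concat([top2, other]).sort_index()
print(result)
```

Calling result.reset_index() afterwards gives the flat layout used elsewhere in this thread. Choosing method="first" is a design decision: with method="min", tied profiles share a rank and a group could end up with more than two rows.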
Your sample data has only one value for each combination of year, month and profile. Is that also the case in your real data? It would be very helpful if you could post the expected output for "Question 1" and the "Bonus question".