Python: How do I sort the final output data?
I want to group a DataFrame by two columns and then sort the aggregated results within each group.
In [167]: df
   count     job source
0      2   sales      A
1      4   sales      B
2      6   sales      C
3      3   sales      D
4      7   sales      E
5      5  market      A
6      3  market      B
7      2  market      C
8      4  market      D
9      1  market      E
Now I want to sort the count column in descending order within each group, and then keep only the top three rows, to get something like:
job     source  count
market  A       5
        D       4
        B       3
sales   E       7
        C       6
        B       4
I also want to sort this result with respect to job, so that if the total count for sales is larger, the data is printed as:
job     source  count
sales   E       7
        C       6
        B       4
market  A       5
        D       4
        B       3
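I am not able to get the top 5 jobs.

IIUC, we can do a further groupby and use nlargest(3) to get the top n values within each job.
Then we can build an ordered list that ranks the top-level (job) values by their total and use it to create a categorical column to sort on.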
import pandas as pd

# Sum count per (job, source), then keep the 3 largest sums within each job.
s = (df.groupby(['job', 'source']).agg({'count': 'sum'})
       .groupby(level=0)['count'].nlargest(3)
       .reset_index(0, drop=True).to_frame())
# See which job has the higher total and build the sort order from it.
sorter = s.groupby(level=0)['count'].sum().sort_values(ascending=False).index
# Index(['sales', 'market'], dtype='object', name='job')
s['sort'] = pd.Categorical(s.index.get_level_values(0), sorter)
# A stable sort keeps the descending count order inside each job.
df2 = s.sort_values('sort', kind='mergesort').drop('sort', axis=1)
print(df2)
               count
job    source
sales  E           7
       C           6
       B           4
market A           5
       D           4
       B           3
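The same steps can also be sketched without a categorical column, by broadcasting each job's total back onto its rows and sorting on it. This is a minimal sketch under stated assumptions, not the answer's own method: the DataFrame is rebuilt from the table in the question, and the function name `top_n_per_job` and the helper column `job_total` are hypothetical names introduced only for illustration.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "count": [2, 4, 6, 3, 7, 5, 3, 2, 4, 1],
    "job": ["sales"] * 5 + ["market"] * 5,
    "source": ["A", "B", "C", "D", "E"] * 2,
})


def top_n_per_job(frame, n=3):
    """Hypothetical helper: top-n sources per job, jobs ordered by total."""
    # Sum count for each (job, source) pair.
    agg = frame.groupby(["job", "source"], as_index=False)["count"].sum()
    # Sort by count descending, then keep the first n rows of each job.
    top = agg.sort_values("count", ascending=False).groupby("job").head(n)
    # Total of the kept rows per job, broadcast back to every row.
    top = top.assign(job_total=top.groupby("job")["count"].transform("sum"))
    # Jobs with the larger total first; the job name breaks ties between
    # equal totals so rows of different jobs never interleave.
    top = top.sort_values(
        ["job_total", "job", "count"], ascending=[False, True, False]
    )
    return top.drop(columns="job_total").set_index(["job", "source"])


result = top_n_per_job(df)
print(result)
```

Using `transform("sum")` keeps everything in one frame, at the cost of a temporary column that has to be dropped afterwards.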
You can use sort_values, as mentioned in another similar answer, and then group by job again to take the top N rows from each job, like this:
>>> df
   count     job source
0      2   sales      A
1      4   sales      B
2      6   sales      C
3      3   sales      D
4      7   sales      E
5      5  market      A
6      3  market      B
7      2  market      C
8      4  market      D
9      1  market      E
>>> agg = df.groupby(['job', 'source']).agg({'count': 'sum'})
>>> agg
               count
job    source
market A           5
       B           3
       C           2
       D           4
       E           1
sales  A           2
       B           4
       C           6
       D           3
       E           7
>>> # Sort by job and count (both descending), then keep the first 3 rows of each job.
>>> # 'job' is sorted by name here, which happens to match the order by total count.
>>> agg.reset_index().sort_values(['job', 'count'], ascending=False).set_index(['job', 'source']).groupby('job').head(3)
               count
job    source
sales  E           7
       C           6
       B           4
market A           5
       D           4
       B           3
>>>
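Sorting on the job name only gives the requested order when the names happen to sort the same way as the totals. The sketch below illustrates the difference; it assumes a variant of the question's data in which "sales" is renamed to the hypothetical label "admin", so that name order and total order disagree. The variable names are illustrative only.

```python
import pandas as pd

# Hypothetical variant of the question's data: "sales" renamed to "admin".
df = pd.DataFrame({
    "count": [2, 4, 6, 3, 7, 5, 3, 2, 4, 1],
    "job": ["admin"] * 5 + ["market"] * 5,
    "source": ["A", "B", "C", "D", "E"] * 2,
})

agg = df.groupby(["job", "source"])["count"].sum().reset_index()

# Sorting on the job name (descending) puts "market" first,
# even though "admin" has the larger total.
by_name = (
    agg.sort_values(["job", "count"], ascending=False)
    .groupby("job")
    .head(3)
)

# Rank the jobs by the total of their top-3 counts instead.
order = by_name.groupby("job")["count"].sum().sort_values(ascending=False).index
by_total = by_name.assign(
    job=pd.Categorical(by_name["job"], categories=order, ordered=True)
).sort_values(["job", "count"], ascending=[True, False])

print(by_name["job"].unique().tolist())
print(by_total["job"].astype(str).unique().tolist())
```

Making job an ordered categorical lets sort_values follow the category order rather than the alphabetical one.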
Please share what you have tried so far.
Thank you very much for your help, it does exactly what I wanted.