Python: how do I sort the final output data?


I want to group a DataFrame by two columns and then sort the aggregated results within each group.

In [167]: df

count   job source
0   2   sales   A
1   4   sales   B
2   6   sales   C
3   3   sales   D
4   7   sales   E
5   5   market  A
6   3   market  B
7   2   market  C
8   4   market  D
9   1   market  E
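
To make the example reproducible, the frame above can be rebuilt as follows. This is only a sketch: the values are copied from the printed table, and the construction itself is an assumption, since the question does not show how `df` was created.

```python
import pandas as pd

# Rebuild the example frame; the values are copied from the table above.
df = pd.DataFrame({
    "count": [2, 4, 6, 3, 7, 5, 3, 2, 4, 1],
    "job": ["sales"] * 5 + ["market"] * 5,
    "source": list("ABCDE") * 2,
})
print(df)
```
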
Now I want to sort the count column in descending order within each group and then keep only the top three rows, to get something like:

job     source  count
market  A   5
        D   4
        B   3
sales   E   7
        C   6
        B   4
I also want to sort this result with respect to job, so that if the sum of the counts for sales is larger, the data is printed as:

job     source  count
sales   E   7
        C   6
        B   4
market  A   5
        D   4
        B   3

I am unable to get the top 5 jobs.

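IIUC, we can apply a further groupby and use nlargest(3) to get the top n values per job.

We can then build an ordered list of the jobs, sorted by their totals, and use it to create a categorical column to sort on.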

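import pandas as pd

# Sum count per (job, source), then keep the 3 largest sums within each job.
# nlargest adds the group key as an extra index level, so drop that duplicate.
s = (
    df.groupby(['job', 'source']).agg({'count': 'sum'})
      .groupby(level=0)['count'].nlargest(3)
      .reset_index(level=0, drop=True)
      .to_frame()
)

# See which job has the higher total and build the sort order from it.
sorter = s.groupby(level=0)['count'].sum().sort_values(ascending=False).index
# Index(['sales', 'market'], dtype='object', name='job')

# A categorical whose categories follow `sorter` sorts in that order, not alphabetically.
s['sort'] = pd.Categorical(s.index.get_level_values(0), sorter)

# A stable sort keeps the descending count order inside each job.
df2 = s.sort_values('sort', kind='stable').drop('sort', axis=1)

print(df2)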

               count
job    source       
sales  E           7
       C           6
       B           4
market A           5
       D           4
       B           3

You can use sort_values, as mentioned in another similar answer, and then group by job again to take the top N rows from each job, like this:

>>> df
   count     job source
0      2   sales      A
1      4   sales      B
2      6   sales      C
3      3   sales      D
4      7   sales      E
5      5  market      A
6      3  market      B
7      2  market      C
8      4  market      D
9      1  market      E
>>> agg = df.groupby(['job','source']).agg({'count':'sum'})
>>> agg
               count
job    source       
market A           5
       B           3
       C           2
       D           4
       E           1
sales  A           2
       B           4
       C           6
       D           3
       E           7
>>> agg.reset_index().sort_values(['job', 'count'], ascending=False).set_index(['job', 'source']).groupby('job').head(3)
               count
job    source       
sales  E           7
       C           6
       B           4
market A           5
       D           4
       B           3
>>> 
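
One caveat about the one-liner above: `sort_values(['job', 'count'], ascending=False)` orders the jobs by name from Z to A, so `sales` comes before `market` only because of its spelling, not because its total is larger. The sketch below illustrates this with a hypothetical variant of the data (market's counts multiplied by 10, which is not part of the question) and then ranks the jobs by the sum of their top three counts instead. The names `by_name`, `by_total` and `job_rank` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "count": [2, 4, 6, 3, 7, 5, 3, 2, 4, 1],
    "job": ["sales"] * 5 + ["market"] * 5,
    "source": list("ABCDE") * 2,
})
# Hypothetical variant: scale market so that its total exceeds that of sales.
df.loc[df["job"] == "market", "count"] *= 10

agg = df.groupby(["job", "source"])["count"].sum().reset_index()

# The one-liner from above: job is sorted Z to A, so sales still comes first.
by_name = (
    agg.sort_values(["job", "count"], ascending=False)
       .set_index(["job", "source"])
       .groupby("job")
       .head(3)
)

# Rank the jobs by the sum of their top three counts (1 = largest total).
totals = by_name.groupby("job")["count"].sum()
rank = totals.rank(ascending=False, method="first")

# Sort by that rank first, then by descending count inside each job.
by_total = (
    by_name.reset_index()
           .assign(job_rank=lambda d: d["job"].map(rank))
           .sort_values(["job_rank", "count"], ascending=[True, False])
           .drop(columns="job_rank")
           .set_index(["job", "source"])
)

print(by_name.index.get_level_values("job").unique().tolist())
print(by_total.index.get_level_values("job").unique().tolist())
```

With the original data both orderings agree, which is why the one-liner produces the requested output there.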

Please share what you have tried so far.
Thank you very much for your help; it does exactly what I wanted.