Python pandas: sort the row count of each group in ascending order

python python-3.x pandas sorting

I have a DataFrame df as follows:

In [257]: df
Out[257]: 
   user_id col1       col2
0        1    A    4.00000
1        1    A   22.00000
2        1    A  112.00000
3        1    B   -0.22222
4        1    B    9.00000
5        1    C    0.00000
6        2    A   -1.00000
7        2    A   -5.00000
8        2    K        NaN
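
For anyone who wants to reproduce the question, here is a minimal sketch that rebuilds the frame printed above. The values are copied from the printout; building it from a dict of lists is only one convenient option, not necessarily how the original frame was created.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the printed output above.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 1, 1, 2, 2, 2],
    "col1": ["A", "A", "A", "B", "B", "C", "A", "A", "K"],
    "col2": [4.0, 22.0, 112.0, -0.22222, 9.0, 0.0, -1.0, -5.0, np.nan],
})
print(df)
```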

I use GroupBy.size to count the rows in each group:

In [258]: df.groupby(['user_id', 'col1'])['col2'].size()
Out[258]: 
user_id  col1
1        A       3
         B       2
         C       1
2        A       2
         K       1
Name: col2, dtype: int64

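Currently, the counts in the output above appear in descending order within each user_id. Is there a pandas-idiomatic way to get the output in ascending order?

Expected output: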

user_id  col1
1        C       1
         B       2
         A       3
2        K       1
         A       2

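You need to sort by the first index level together with the values of the Series. Here is a solution that converts the Series to a one-column DataFrame with to_frame, sorts by the first level user_id and then by col2, and finally selects col2 to get a Series back: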

s = df.groupby(['user_id', 'col1'])['col2'].size()

s = s.to_frame().sort_values(['user_id', 'col2'])['col2']
print (s)
user_id  col1
1        C       1
         B       2
         A       3
2        K       1
         A       2
Name: col2, dtype: int64

Another idea is to use groupby, although it should be slower on larger DataFrames:

s = df.groupby(['user_id', 'col1'])['col2'].size()

s = s.groupby(level=0, group_keys=False).apply(lambda x: x.sort_values())
print (s)
user_id  col1
1        C       1
         B       2
         A       3
2        K       1
         A       2
Name: col2, dtype: int64
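
A third variant, sketched here as an alternative rather than as part of the original answer, moves the index levels into ordinary columns before sorting, so it does not rely on sort_values resolving index level names. The column name n and the variable names counts and out are hypothetical, and the frame is rebuilt from the question's printout.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question's printed output.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 1, 1, 2, 2, 2],
    "col1": ["A", "A", "A", "B", "B", "C", "A", "A", "K"],
    "col2": [4.0, 22.0, 112.0, -0.22222, 9.0, 0.0, -1.0, -5.0, np.nan],
})

counts = df.groupby(["user_id", "col1"])["col2"].size()

out = (
    counts.reset_index(name="n")         # index levels become columns; counts land in "n"
    .sort_values(["user_id", "n"])       # ascending count within each user_id
    .set_index(["user_id", "col1"])["n"]  # restore the MultiIndex and return a Series
)
print(out)
```

On the sample data this should give the same ordering as the two solutions above; only the Series name differs.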


The closest I got was:

df.groupby(['user_id', 'col1'])['col2'].size().to_frame().sort_index(ascending=False)

Why not sort the DataFrame before grouping?

That doesn't work, because I don't want to sort by any existing column. I want to sort the output of GroupBy.size.

What is s here?

s = df.groupby(['user_id', 'col1'])['col2'].size()

Unfortunately, this works only on the sample data and not in general, because it sorts by the MultiIndex rather than by the counts. You can see this by changing the sample data so that index order and count order no longer coincide; the solution then fails.
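
To make that last comment concrete, here is a small sketch with made-up data (the demo frame is hypothetical, not from the question) in which the col1 labels and the group sizes are no longer in matching order. Sorting the index in descending order then stops producing ascending counts, while sorting by user_id and the count column still does.

```python
import pandas as pd

# Hypothetical data: group A has 1 row, B has 3 rows, C has 2 rows.
demo = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 1, 1],
    "col1": ["A", "B", "B", "B", "C", "C"],
    "col2": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

counts = demo.groupby(["user_id", "col1"])["col2"].size()

# Sorts by index labels only, so the counts come out in label order (C, B, A).
by_index = counts.sort_index(ascending=False)

# Sorts by user_id, then by the count itself.
by_count = counts.to_frame().sort_values(["user_id", "col2"])["col2"]

print(by_index.tolist())  # [2, 3, 1]
print(by_count.tolist())  # [1, 2, 3]
```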