Python: group by columns and sort values by the number of rows in each group, in descending order


I have a DataFrame (df) that looks like this:

 cluster    city    category    latitude    longitude   merchant
0   0   sanfran       10          39.072    -101.93253  merch2
1   0   sanfran       10          45.072    -110.93253  merch10
2   1   wichita       22          20.072    -100.93253  merch3
3   3   wichita        5          34.072    -99.93253   merch3
4   2   denver         1          40.072    -101.93253  merch1
5   1   longmont       4          30.072    -111.93253  merch2
6   1   longmont       4          30.072    -111.93253  merch2
7   3   topeka         5          20.072    -109.93253  merch1

I want to end up with this DataFrame, dfout:

   cluster  merchant  latitude   longitude      city  category
0        0   merch10    45.072  -110.93253   sanfran        10
1        0    merch2    39.072  -101.93253   sanfran        10
2        1    merch2    30.072  -111.93253  longmont         4
3        1    merch3    20.072  -100.93253   wichita        22
4        2    merch1    40.072  -101.93253    denver         1
5        3    merch1    20.072  -109.93253    topeka         5
6        3    merch3    34.072   -99.93253   wichita         5

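Basically, I want to group by cluster and, within each cluster, by merchant; count the rows for each cluster/merchant combination; and sort those counts from highest to lowest, so that the merchant with the most rows in a given cluster comes first and the remaining merchants in that cluster follow in order of their row counts.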

I can get this far with a groupby statement, but this is where I get stuck:

df.groupby(['cluster','merchant']).count().sort_values(by='city',ascending=False)

                  city  category  latitude  longitude
cluster merchant                                     
1       merch2       2         2         2          2
0       merch10      1         1         1          1
        merch2       1         1         1          1
1       merch3       1         1         1          1
2       merch1       1         1         1          1
3       merch1       1         1         1          1
        merch3       1         1         1          1
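A minimal sketch of the step missing from the attempt above, assuming pandas is installed and rebuilding df from the table in the question. sort_values(by='city', ascending=False) orders the counts across the whole table, so clusters get mixed together. Sorting on two keys, cluster ascending and count descending, keeps the clusters in order and ranks merchants by row count inside each cluster. The count column name n is an arbitrary choice for illustration.

```python
import pandas as pd

# Sample data rebuilt from the df shown in the question
df = pd.DataFrame({
    "cluster":   [0, 0, 1, 3, 2, 1, 1, 3],
    "city":      ["sanfran", "sanfran", "wichita", "wichita", "denver",
                  "longmont", "longmont", "topeka"],
    "category":  [10, 10, 22, 5, 1, 4, 4, 5],
    "latitude":  [39.072, 45.072, 20.072, 34.072, 40.072, 30.072, 30.072, 20.072],
    "longitude": [-101.93253, -110.93253, -100.93253, -99.93253,
                  -101.93253, -111.93253, -111.93253, -109.93253],
    "merchant":  ["merch2", "merch10", "merch3", "merch3", "merch1",
                  "merch2", "merch2", "merch1"],
})

counts = (df.groupby(["cluster", "merchant"])
            .size()                      # number of rows per (cluster, merchant) pair
            .reset_index(name="n")       # turn the Series into columns cluster, merchant, n
            # cluster ascending first, then row count descending within each cluster
            .sort_values(["cluster", "n"], ascending=[True, False])
            .reset_index(drop=True))
print(counts)
```

Here merch2 (2 rows) comes before merch3 (1 row) in cluster 1; clusters themselves stay in ascending order.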

Can anyone shed some light on this? How can I get from df to dfout?

Thanks

This gives you the output you want.

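You need to use .agg together with your .groupby, applying the first() function to every column except the one you need to count (the count is used later for sorting; I used the cluster column for it).

Because the cluster column is used twice (once as a .groupby key and once inside .agg), I also had to rename the aggregated column before calling .reset_index(). Otherwise, resetting the index would try to insert cluster into the DataFrame's columns while a column with that name already exists, which raises an error.

Then sort by cluster and cluster_count, passing ascending=[True, False] so that the former is sorted ascending and the latter descending. Finally, drop the cluster_count column.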
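To see why the rename is needed, here is a minimal reproduction on a hypothetical two-row frame whose index level and column are both named cluster, mimicking the grouped result before the rename. It assumes only pandas; the values are made up for illustration.

```python
import pandas as pd

# Hypothetical frame: the index level and a column share the name "cluster",
# like the groupby/agg result before the rename
counts = pd.DataFrame({"cluster": [2, 1]},
                      index=pd.Index([0, 1], name="cluster"))

failed = False
try:
    counts.reset_index()   # tries to insert the "cluster" level as a column again
except ValueError as err:
    failed = True
    print("reset_index failed:", err)

# After renaming the count column, the index level can move into the columns
fixed = counts.rename(columns={"cluster": "cluster_count"}).reset_index()
print(fixed.columns.tolist())
```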

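# Per (cluster, merchant) pair: keep the first value of each descriptive column
# and count the pair's rows through the 'cluster' column
df_out = (df.groupby(['cluster', 'merchant'])
            .agg({'latitude': 'first',
                  'longitude': 'first',
                  'city': 'first',
                  'category': 'first',
                  'cluster': 'count'})
            # rename the count so reset_index() can restore 'cluster' as a column
            .rename({'cluster': 'cluster_count'}, axis=1)
            .reset_index()
            # cluster ascending, then row count descending within each cluster
            .sort_values(['cluster', 'cluster_count'], ascending=[True, False])
            .drop('cluster_count', axis=1))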
df_out
Out[1]: 
   cluster merchant  latitude  longitude      city  category
0        0  merch10    45.072 -110.93253   sanfran        10
1        0   merch2    39.072 -101.93253   sanfran        10
2        1   merch2    30.072 -111.93253  longmont         4
3        1   merch3    20.072 -100.93253   wichita        22
4        2   merch1    40.072 -101.93253    denver         1
5        3   merch1    20.072 -109.93253    topeka         5
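An alternative sketch, not the approach above: pandas named aggregation (available since pandas 0.25) names the count column at aggregation time, so there is no clash with the cluster key and no rename step. It assumes the df from the question, rebuilt here so the block runs on its own; n_rows is an illustrative name.

```python
import pandas as pd

# Sample data rebuilt from the df shown in the question
df = pd.DataFrame({
    "cluster":   [0, 0, 1, 3, 2, 1, 1, 3],
    "city":      ["sanfran", "sanfran", "wichita", "wichita", "denver",
                  "longmont", "longmont", "topeka"],
    "category":  [10, 10, 22, 5, 1, 4, 4, 5],
    "latitude":  [39.072, 45.072, 20.072, 34.072, 40.072, 30.072, 30.072, 20.072],
    "longitude": [-101.93253, -110.93253, -100.93253, -99.93253,
                  -101.93253, -111.93253, -111.93253, -109.93253],
    "merchant":  ["merch2", "merch10", "merch3", "merch3", "merch1",
                  "merch2", "merch2", "merch1"],
})

df_out = (df.groupby(["cluster", "merchant"])
            # named aggregation: output_name=(input_column, function)
            .agg(latitude=("latitude", "first"),
                 longitude=("longitude", "first"),
                 city=("city", "first"),
                 category=("category", "first"),
                 n_rows=("city", "size"))      # rows per (cluster, merchant) pair
            .reset_index()                    # no name clash: the count is n_rows
            .sort_values(["cluster", "n_rows"], ascending=[True, False])
            .drop(columns="n_rows")
            .reset_index(drop=True))
print(df_out)
```

Apart from the row index, this yields the same rows and column order as dfout in the question.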