Python: How to select the top N of two groups and aggregate the rest of the second group into "Others" with Pandas?

python pandas

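I have a dataset containing products, prices, categories, and counties. I use this code to count the number of products per category in each county: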

df_count = df.groupby(['County','Category']).size().reset_index(name='counts')

My dataframe now looks like this:

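           County                        Category  counts
0        Blekinge           Accessories & watches      35
1        Blekinge                   Audio & video     101
2        Blekinge                        Bicycles      78
3        Blekinge        Boat parts & accessories      65
4        Blekinge                           Boats     143
..            ...                             ...     ...
657  Östergötland  Snowmobile parts & accessories       2
658  Östergötland                     Snowmobiles       5
659  Östergötland      Sports & leisure equipment     335
660  Östergötland                           Tools     102
661  Östergötland           Trucks & construction      66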
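To run the answers below without the full dataset, a small sample can be rebuilt from the rows visible in the answers' outputs. This is only a sketch: it contains ten rows, not the asker's 662, and it uses the name `df` for the already-counted frame because that is the name the answers use.

```python
import pandas as pd

# Ten-row sample reconstructed from the outputs shown in the answers;
# this is NOT the asker's full dataset.
df = pd.DataFrame({
    'County': ['Blekinge'] * 5 + ['Östergötland'] * 5,
    'Category': [
        'Accessories & watches', 'Audio & video', 'Bicycles',
        'Boat parts & accessories', 'Boats',
        'Snowmobile parts & accessories', 'Snowmobiles',
        'Sports & leisure equipment', 'Tools', 'Trucks & construction',
    ],
    'counts': [35, 101, 78, 65, 143, 2, 5, 335, 102, 66],
})

print(df)
```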

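You can use the following sequence of steps to get the final output, which I believe is fairly straightforward.

To make it easier to follow, I have added comments in the code and the output of each line.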

# Grab the 2 largest categories of each county
top_two = df.groupby('County').apply(lambda x: x.nlargest(2, 'counts')).reset_index(drop=True)

>>> top_two
         County                    Category  counts
0      Blekinge                       Boats     143
1      Blekinge               Audio & video     101
2  Östergötland  Sports & leisure equipment     335
3  Östergötland                       Tools     102

# Create a dataframe with the rest of the information
# (DataFrame.append was removed in pandas 2.0; pd.concat does the same job)
df_others = pd.concat([df, df.merge(top_two, 'inner')]).drop_duplicates(keep=False)

>>> df_others
         County                        Category  counts
0      Blekinge           Accessories & watches      35
2      Blekinge                        Bicycles      78
3      Blekinge        Boat parts & accessories      65
5  Östergötland  Snowmobile parts & accessories       2
6  Östergötland                     Snowmobiles       5
9  Östergötland           Trucks & construction      66
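The `drop_duplicates(keep=False)` step above relies on every row of `df` being unique; a row that is genuinely duplicated in `df` would be dropped as well. A possible alternative, sketched here under the assumption that the ten-row sample stands in for the real data, is an anti-join using the `indicator` parameter of `merge`. Note that `merge` resets the index, so the row labels differ from the output shown above.

```python
import pandas as pd

# Ten-row sample reconstructed from the outputs in this answer (an assumption,
# not the asker's full data).
df = pd.DataFrame({
    'County': ['Blekinge'] * 5 + ['Östergötland'] * 5,
    'Category': [
        'Accessories & watches', 'Audio & video', 'Bicycles',
        'Boat parts & accessories', 'Boats',
        'Snowmobile parts & accessories', 'Snowmobiles',
        'Sports & leisure equipment', 'Tools', 'Trucks & construction',
    ],
    'counts': [35, 101, 78, 65, 143, 2, 5, 335, 102, 66],
})

# Top 2 categories per county: sort by count, then take the first 2 rows of each group
top_two = df.sort_values('counts', ascending=False).groupby('County').head(2)

# Anti-join: a left merge with indicator=True adds a '_merge' column that says
# whether each row of df also exists in top_two
merged = df.merge(top_two, how='left', indicator=True)
df_others = merged.loc[merged['_merge'] == 'left_only'].drop(columns='_merge')

print(df_others)
```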

# Group by county, sum the counts, and assign 'Others' as the Category in the df_others dataframe
df_others = df_others.groupby('County')['counts'].sum().reset_index()
df_others['Category'] = 'Others'

>>> df_others
         County  counts Category
0      Blekinge     178   Others
1  Östergötland      73   Others

Finally, concat() the two dataframes to get the final output:

res = pd.concat([top_two,df_others]).sort_values('County').reset_index(drop=True)
>>> res
         County                    Category  counts
0      Blekinge                       Boats     143
1      Blekinge               Audio & video     101
2      Blekinge                      Others     178
3  Östergötland  Sports & leisure equipment     335
4  Östergötland                       Tools     102
5  Östergötland                      Others      73

Let me know if anything is unclear.

You can use iloc and pd.concat:

df = df.sort_values(['County', 'counts'], ascending=False)
result = (
    df.groupby('County').apply(
        lambda x: pd.concat(
            [x.iloc[:2],
             x.iloc[2:].groupby('County', as_index=False)
             .agg({'counts': 'sum'})
             .assign(Category='Others')]))
    .reset_index(drop=True)
)

Output:

         County                    Category  counts
0      Blekinge                       Boats     143
1      Blekinge               Audio & video     101
2      Blekinge                      Others     178
3  Östergötland  Sports & leisure equipment     335
4  Östergötland                       Tools     102
5  Östergötland                      Others      73
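Both answers hard-code N = 2. A minimal sketch of one way to generalize to any N follows. It uses a different technique from either answer: rank the categories within each county, relabel everything below the cut-off as 'Others', and sum. The function name `top_n_with_others` and the helper column `is_other` are hypothetical, and the sketch assumes no real category is already called 'Others'.

```python
import pandas as pd


def top_n_with_others(df_count, n=2):
    """Keep the n largest categories per county and sum the rest into 'Others'."""
    out = df_count.copy()
    # Rank within each county, largest count first; ties are broken by row order
    rank = out.groupby('County')['counts'].rank(method='first', ascending=False)
    # Keep the category name for the top n rows, relabel the rest
    out['Category'] = out['Category'].where(rank <= n, 'Others')
    # Helper flag so that 'Others' can be sorted to the end of each county
    out['is_other'] = out['Category'].eq('Others')
    return (
        out.groupby(['County', 'Category', 'is_other'], as_index=False)['counts']
        .sum()
        .sort_values(['County', 'is_other', 'counts'], ascending=[True, True, False])
        .drop(columns='is_other')
        .reset_index(drop=True)
    )


# Ten-row sample reconstructed from the outputs above (not the full dataset)
df = pd.DataFrame({
    'County': ['Blekinge'] * 5 + ['Östergötland'] * 5,
    'Category': [
        'Accessories & watches', 'Audio & video', 'Bicycles',
        'Boat parts & accessories', 'Boats',
        'Snowmobile parts & accessories', 'Snowmobiles',
        'Sports & leisure equipment', 'Tools', 'Trucks & construction',
    ],
    'counts': [35, 101, 78, 65, 143, 2, 5, 335, 102, 66],
})

res = top_n_with_others(df, n=2)
print(res)
```

On this sample, `n=2` should give the same six rows as the outputs above; a larger `n` simply leaves fewer rows to fold into 'Others'.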

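Please provide a small sample dataframe with the expected output.

This solution works well. Thank you for taking the time, @sophocles!

This solution works well. It returns the results in descending order, with Others as the last row for each county. Thank you very much.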