Python 3.x: rank apps by highest number of downloads within each genre, and keep only the top 2 apps per genre

python-3.x pandas

I am trying to get the top 2 most-downloaded apps in each genre from the dataset below. This is the dataset I am using:

import pandas as pd
df = pd.DataFrame(data={'genre_id': ['tools', 'tools', 'VIDEO_PLAYERS',
                                   'VIDEO_PLAYERS', 'PHOTOGRAPHY'],
                        'app_id':['MP3Cutter','PhotoCutter','VLC','MXPlayer','Picasa'],
                        'min_installs': [10, 100, 10, 20,1000]})
df

This is what I tried:

    df['default_rank'] = df.groupby(['genre_id']).agg(['rank'])
    df.sort_values(by='default_rank')

The output I get is as follows:

   genre_id           app_id    min_installs    default_rank
0   tools            MP3Cutter        10            1.0
2   VIDEO_PLAYERS      VLC            10            1.0
4   PHOTOGRAPHY       Picasa         1000           1.0
1   tools           PhotoCutter       100           2.0
3   VIDEO_PLAYERS     MXPlayer        20            2.0

But I would like to get something like this:

   genre_id           app_id    min_installs    default_rank
4   PHOTOGRAPHY       Picasa         1000           1.0
1   tools           PhotoCutter       100           1.0
0   tools            MP3Cutter        10            2.0
3   VIDEO_PLAYERS     MXPlayer        20            1.0
2   VIDEO_PLAYERS      VLC            10            2.0

I am new to pandas. Is it possible to do advanced data manipulation with pandas in Python, the way we do in SQL?

I believe you need to rank the apps within each genre and then sort by each group's maximum value:

s =  df.groupby('genre_id')['min_installs'].transform('max')

df['default_rank'] = (df.groupby('genre_id')['min_installs']
                        .rank(method='max', ascending=False))

df = (df.assign(m=s)
         .sort_values(by=['m', 'default_rank', 'genre_id'], 
                      ascending=[False, True, True])
         .drop('m', axis=1))
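
The choice of method in rank only matters when two apps in the same genre have the same install count. A minimal sketch of how 'max', 'min', 'dense' and 'first' differ, using a small made-up frame (tied is hypothetical data, not from the question):

```python
import pandas as pd

# Hypothetical data with a tie inside one genre (not from the question).
tied = pd.DataFrame({
    'genre_id': ['tools', 'tools', 'tools'],
    'app_id': ['A', 'B', 'C'],
    'min_installs': [100, 100, 10],
})

# Add one column per tie-breaking method so they can be compared side by side.
for method in ['max', 'min', 'dense', 'first']:
    tied[method] = (tied.groupby('genre_id')['min_installs']
                        .rank(method=method, ascending=False))

print(tied)
```

With method='max', both tied apps get rank 2.0 and nothing gets rank 1.0, so 'min' or 'dense' may be the better fit if the rank column itself is going to be shown or filtered on.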

If you need to keep only the top 2 values per group:

df = df.groupby('genre_id').head(2)

print (df)
        genre_id       app_id  min_installs  default_rank
4    PHOTOGRAPHY       Picasa          1000           1.0
1          tools  PhotoCutter           100           1.0
0          tools    MP3Cutter            10           2.0
3  VIDEO_PLAYERS     MXPlayer            20           1.0
2  VIDEO_PLAYERS          VLC            10           2.0
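
On the SQL comparison in the question: the ranking step is roughly what RANK() OVER (PARTITION BY genre_id ORDER BY min_installs DESC) does, and the top-2 filter is a WHERE clause on that rank. A minimal sketch of that reading, assuming the order of the genres themselves does not matter (top2 is a name introduced here for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'genre_id': ['tools', 'tools', 'VIDEO_PLAYERS', 'VIDEO_PLAYERS', 'PHOTOGRAPHY'],
    'app_id': ['MP3Cutter', 'PhotoCutter', 'VLC', 'MXPlayer', 'Picasa'],
    'min_installs': [10, 100, 10, 20, 1000],
})

# Roughly: RANK() OVER (PARTITION BY genre_id ORDER BY min_installs DESC).
# method='min' mirrors SQL RANK(), where tied rows share the lowest rank.
df['default_rank'] = (df.groupby('genre_id')['min_installs']
                        .rank(method='min', ascending=False))

# Roughly: WHERE default_rank <= 2 ORDER BY genre_id, default_rank.
top2 = (df[df['default_rank'] <= 2]
          .sort_values(['genre_id', 'default_rank']))

print(top2)
```

Filtering on the rank keeps every row tied at rank 2 or better, so a genre can return more than two rows when install counts tie, whereas groupby('genre_id').head(2) always keeps at most two rows per genre.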