Rank apps by number of downloads within each genre and keep only the top 2 apps per genre


I'm trying to get the top 2 most-downloaded apps in each genre from the dataset below. This is the dataset I'm using:

import pandas as pd
df = pd.DataFrame(data={'genre_id': ['tools', 'tools', 'VIDEO_PLAYERS',
                                   'VIDEO_PLAYERS', 'PHOTOGRAPHY'],
                        'app_id':['MP3Cutter','PhotoCutter','VLC','MXPlayer','Picasa'],
                        'min_installs': [10, 100, 10, 20,1000]})
df
This is what I tried:

    df['default_rank'] = df.groupby(['genre_id']).agg(['rank'])
    df.sort_values(by='default_rank')
The output I get looks like this:

   genre_id           app_id    min_installs    default_rank
0   tools            MP3Cutter        10            1.0
2   VIDEO_PLAYERS      VLC            10            1.0
4   PHOTOGRAPHY       Picasa         1000           1.0
1   tools           PhotoCutter       100           2.0
3   VIDEO_PLAYERS     MXPlayer        20            2.0
But I want to get something like this:

   genre_id           app_id    min_installs    default_rank
4   PHOTOGRAPHY       Picasa         1000           1.0
1   tools           PhotoCutter       100           1.0
0   tools            MP3Cutter        10            2.0
3   VIDEO_PLAYERS     MXPlayer        20            1.0
2   VIDEO_PLAYERS      VLC            10            2.0

I'm new to pandas. Can pandas do advanced data manipulation like what we do in SQL?

I believe you need to rank the apps within each genre in descending order and sort the genres by their per-group maximum:

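# per-genre maximum, broadcast back to every row of that genre
s = df.groupby('genre_id')['min_installs'].transform('max')

# rank within each genre, highest min_installs gets rank 1
df['default_rank'] = (df.groupby('genre_id')['min_installs']
                        .rank(method='max', ascending=False))

# order genres by their maximum, then rows by rank; drop the helper column
df = (df.assign(m=s)
         .sort_values(by=['m', 'default_rank', 'genre_id'],
                      ascending=[False, True, True])
         .drop('m', axis=1))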
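
The sample data has no tied min_installs values within a genre, so the choice of method does not change the result here. As a small illustration on made-up values (not part of the dataset above), this is how a few of the rank methods treat a tie:

```python
import pandas as pd

# Hypothetical values with a tie at the top, for illustration only.
installs = pd.Series([100, 100, 10])

# method='max': tied values share the highest rank of the tied block.
rank_max = installs.rank(method='max', ascending=False).tolist()

# method='min': tied values share the lowest rank of the tied block.
rank_min = installs.rank(method='min', ascending=False).tolist()

# method='dense': like 'min', but ranks increase by 1 between groups.
rank_dense = installs.rank(method='dense', ascending=False).tolist()

print(rank_max)    # [2.0, 2.0, 3.0]
print(rank_min)    # [1.0, 1.0, 3.0]
print(rank_dense)  # [1.0, 1.0, 2.0]
```

With method='max', two apps tied for first place both get rank 2.0, so a filter such as default_rank == 1 would return nothing for that genre.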
If you need to filter the top 2 values per group:

df = df.groupby('genre_id').head(2)

print (df)
        genre_id       app_id  min_installs  default_rank
4    PHOTOGRAPHY       Picasa          1000           1.0
1          tools  PhotoCutter           100           1.0
0          tools    MP3Cutter            10           2.0
3  VIDEO_PLAYERS     MXPlayer            20           1.0
2  VIDEO_PLAYERS          VLC            10           2.0
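
On the SQL comparison in the question: one possible way to mimic a per-group row number, roughly in the spirit of ROW_NUMBER() OVER (PARTITION BY genre_id ORDER BY min_installs DESC), is to sort and then count rows within each group. This is only a sketch on the question's dataset; the names ranked, row_number and top2 are made up for the example, and the genres come out in plain string order rather than ordered by their maximum as above.

```python
import pandas as pd

df = pd.DataFrame({
    'genre_id': ['tools', 'tools', 'VIDEO_PLAYERS', 'VIDEO_PLAYERS', 'PHOTOGRAPHY'],
    'app_id': ['MP3Cutter', 'PhotoCutter', 'VLC', 'MXPlayer', 'Picasa'],
    'min_installs': [10, 100, 10, 20, 1000],
})

# Sort so the most-installed app comes first within each genre.
ranked = df.sort_values(['genre_id', 'min_installs'], ascending=[True, False])

# Number the rows within each genre: 1 for the top app, 2 for the next, ...
ranked['row_number'] = ranked.groupby('genre_id').cumcount() + 1

# Keep only the top 2 rows per genre.
top2 = ranked[ranked['row_number'] <= 2]
print(top2)
```

Unlike rank, cumcount always gives distinct numbers inside a group, so exactly two rows per genre are kept even when values are tied.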