DataFrame filtering in Python

python pandas

I have a DataFrame with two columns, as shown below:

Index Year        Country
0     2015        US
1     2015        US
2     2015        UK
3     2015        Indonesia
4     2015        US
5     2016        India
6     2016        India
7     2016        UK

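I want to create a new DataFrame that contains, for each year, the country that occurs most often along with its count. The new DataFrame will have the following 3 columns: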

    Index      Year      Country     Count
    0          2015      US          3
    1          2016      India       2

Is there a function in pandas that can do this quickly?

First, get the count of each `Year` and `Country` pair. Then get the index of the maximum count within each year and select those rows with `loc`.
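The code for this answer is not preserved here, so the following is only a minimal sketch of the steps described. It assumes the counting is done with `groupby` and `size` and the per-year maximum is located with `idxmax`; those choices, and the variable names `counts`, `top_idx`, and `result`, are assumptions rather than the answerer's exact code.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Year": [2015, 2015, 2015, 2015, 2015, 2016, 2016, 2016],
    "Country": ["US", "US", "UK", "Indonesia", "US", "India", "India", "UK"],
})

# Count each (Year, Country) pair
counts = df.groupby(["Year", "Country"]).size().reset_index(name="Count")

# Index label of the row with the largest count within each year
# (assumed step: idxmax returns the first label if there is a tie)
top_idx = counts.groupby("Year")["Count"].idxmax()

# Select those rows with loc and renumber the index
result = counts.loc[top_idx].reset_index(drop=True)
print(result)
```

Because the maximum is taken within each year, this yields exactly one row per year, whatever the counts in other years are.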

Alternatively, combine the grouping with a custom function.
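The custom-function code is not preserved here either. One possible reading, sketched below, groups by `Year` and applies a helper that picks the most frequent country in each group. The helper name `top_country` is hypothetical, and the use of `apply` with `value_counts` is an assumption about what the answer intended.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Year": [2015, 2015, 2015, 2015, 2015, 2016, 2016, 2016],
    "Country": ["US", "US", "UK", "Indonesia", "US", "India", "India", "UK"],
})

def top_country(group):
    # Hypothetical helper: value_counts sorts by frequency, descending,
    # so the first entry is the most frequent country in this group
    counts = group["Country"].value_counts()
    return pd.Series({"Country": counts.index[0], "Count": int(counts.iloc[0])})

# Select only the Country column so the helper never sees the grouping key
result = (
    df.groupby("Year")[["Country"]]
      .apply(top_country)
      .reset_index()
      .astype({"Count": int})
)
print(result)
```

This is usually slower than the built-in aggregations on large data, since the helper runs once per group in Python.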

Here is one approach that does not use `groupby`:

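import pandas as pd
# df2 is the question's DataFrame; head(2) keeps the two most frequent pairs overall
Count = (pd.Series(list(zip(df2.Year, df2.Country))).value_counts()
           .head(2).reset_index(name='Count'))
Count[['Year', 'Country']] = Count['index'].apply(pd.Series)
Count.drop(columns='index')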


Out[266]: 
   Count  Year Country
0      3  2015      US
1      2  2016   India

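One way is to use `groupby` and `size` to count the rows in each (Year, Country) category, then sort the counts and slice by the number of distinct years. Note that this keeps the `num_year` largest counts overall, so it gives one row per year only when each year's top count is among them, as in the sample data. You can try the following: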

num_year = df['Year'].nunique()
new_df = df.groupby(['Year', 'Country']).size().rename('Count').sort_values(ascending=False).reset_index()[:num_year]

Result:

   Year   Country  Count
0  2015      US      3
1  2016   India      2