Python: Find the most frequent occurrence in a DataFrame


Suppose I have a DataFrame in which I want to associate users with countries:

>>> dfUsers[['userId', 'country', 'lat']].dropna().groupby(['userId', 'country']).agg(len).reset_index()

                 userId      country  lat
0   1479705782818706665        India    1
1   1480576924651623757        India   12
2   1480576924651623757           РФ    2
3   1480928137574356334     Malaysia   17
4   1480988896538924406        India    1
5   1481723517601846740     Malaysia    2
6   1481810347655435765    Singapore    3
7   1481818704328005112    Singapore    6
8   1482457537889441352    Singapore   18
9   1482488858703566411    Singapore    1
10  1482730123382756957        India    1
11  1483106342385227382    Singapore    2
12  1483316566673069712     Malaysia    4
13  1484507758001657608    Singapore    6
14  1484654275131873053    Singapore    1
15  1484666213119301417    Singapore    1
16  1484734631705057076     Malaysia    4

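What I want to do is associate each user with a single country. In this case it is easy to see that user 1480576924651623757 has two different countries associated with them. However, I want to associate that user with India, because the user appears in India more often than in any other country.

Is there a concise way to do this? I could always loop over 'userId' and pick the country with the largest count, but I would like to know whether this can be done without a loop.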

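It seems you need idxmax to find, for each userId group, the index of the row with the maximum value in column lat, and then select those rows with loc.

Is the lat column only used to count rows per userId and country? In that case it is just a dummy column for the groupby, and the count can be computed with size() instead.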

# df is the aggregated frame from the question: one row per (userId, country), count in lat
df = df.loc[df.groupby('userId')['lat'].idxmax()]
print(df)
                 userId    country  lat
0   1479705782818706665      India    1
1   1480576924651623757      India   12 < 12 is max, so India
3   1480928137574356334   Malaysia   17
4   1480988896538924406      India    1
5   1481723517601846740   Malaysia    2
6   1481810347655435765  Singapore    3
7   1481818704328005112  Singapore    6
8   1482457537889441352  Singapore   18
9   1482488858703566411  Singapore    1
10  1482730123382756957      India    1
11  1483106342385227382  Singapore    2
12  1483316566673069712   Malaysia    4
13  1484507758001657608  Singapore    6
14  1484654275131873053  Singapore    1
15  1484666213119301417  Singapore    1
16  1484734631705057076   Malaysia    4

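# Count rows per (userId, country); the parentheses allow the chain to span several lines
df = (dfUsers[['userId', 'country', 'lat']]
      .dropna()
      .groupby(['userId', 'country'])
      .size()
      .reset_index(name='Count'))

# For each user, keep the row with the highest count
df = df.loc[df.groupby('userId')['Count'].idxmax()]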
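
To try the two steps end to end, here is a minimal, self-contained sketch. The sample rows, the short integer user IDs and the names counts and top are made up for illustration; only the column names userId, country and lat come from the question.

```python
import pandas as pd

# Hypothetical sample data: one row per recorded position of a user
dfUsers = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2, 3],
    "country": ["India", "India", "РФ", "Malaysia", "Malaysia", "Singapore"],
    "lat": [12.9, 13.0, 55.7, 3.1, 3.2, 1.3],
})

# Step 1: count rows per (userId, country)
counts = (
    dfUsers[["userId", "country", "lat"]]
    .dropna()
    .groupby(["userId", "country"])
    .size()
    .reset_index(name="Count")
)

# Step 2: for each user, keep the row whose Count is largest
top = counts.loc[counts.groupby("userId")["Count"].idxmax()].reset_index(drop=True)
print(top)
```

Note that idxmax returns the first occurrence of the maximum, so if a user has the same count in two countries, the one that comes first in the grouped (sorted) order is kept.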