Python - How to find the largest group with pandas

python python-3.x pandas

I have a ratings DataFrame with rows of userId, movieId, rating. I want to find the user with the highest number of ratings.

Here is the code I wrote:

import pandas as pd
ratings = pd.read_csv('ratings.csv') # userId,movieId,rating
user_rating_counts = ratings[['userId','movieId']].groupby('userId')['movieId'].agg(['count'])
top_rator = user_rating_counts[user_rating_counts['count']==user_rating_counts['count'].max()]

Here is what the file looks like:

userId,movieId,rating
1,1,4.0
1,3,4.0
1,6,4.0
1,47,5.0
1,50,5.0
1,70,3.0
1,101,5.0
1,110,4.0

When I view top_rator in a Jupyter notebook, it looks like this:

       count
userId  
414     2698

I would like to get a tuple out of it, like:

(414, 2698)

How can I do that?

NB: any comments on how I could do this better/faster/shorter would be appreciated.

You can do:

sizes = df.groupby(['userId']).size()
(sizes.idxmax(), sizes.max())
#(1, 8)

Details:

Group by userId and get the size of each group:
sizes = df.groupby(['userId']).size()
#userId
#1    8
#2    1

Use idxmax and max to build the tuple for the user with the most ratings:
(sizes.idxmax(), sizes.max())
#(1, 8)
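The two steps above can be sketched end to end as follows. This is a minimal example that assumes a small made-up ratings table in place of ratings.csv; the data is illustrative only and is not taken from the real file.

```python
import pandas as pd

# Illustrative data (an assumption, standing in for ratings.csv).
ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 2, 2, 3],
    "movieId": [1, 3, 6, 47, 50, 70, 101, 110],
    "rating":  [4.0, 4.0, 4.0, 5.0, 5.0, 3.0, 5.0, 4.0],
})

# Number of rows (ratings) per user.
sizes = ratings.groupby("userId").size()

# idxmax() returns the index label (the userId) of the largest count,
# max() returns the count itself. int() converts the NumPy scalars
# to plain Python integers.
top = (int(sizes.idxmax()), int(sizes.max()))
print(top)  # (2, 4)
```

Note that idxmax reports only the first label when several users share the maximum, so this form fits the single-winner case.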

If only one user matches the max, you can simply use:
next(top_rator.max(1).items())

Explanation:
top_rator.max(1) will return:
userId
1    8
dtype: int64

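Series.items() lazily iterates over the Series, creating a tuple of index and value in a zip generator object.

next() is used to access the "next" (first) tuple in this generator.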


If multiple users match the max, use a list comprehension instead:
[(idx, val) for idx, val in top_rator.max(1).items()]
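To make the tie case concrete, here is a small sketch that rebuilds top_rator the same way the question does and then applies the list comprehension. The sample data is an assumption, chosen so that two users share the maximum.

```python
import pandas as pd

# Illustrative data (an assumption): users 1 and 2 both have 3 ratings.
ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 2, 3, 3],
    "movieId": [1, 3, 6, 47, 50, 70, 101, 110],
})

# Same shape as the question's top_rator: a one-column DataFrame
# ('count') indexed by userId, filtered to the rows with the max count.
user_rating_counts = ratings.groupby("userId")["movieId"].agg(["count"])
top_rator = user_rating_counts[
    user_rating_counts["count"] == user_rating_counts["count"].max()
]

# max(axis=1) collapses the single 'count' column into a Series;
# items() then yields (index, value) pairs, one per tied user.
pairs = [(int(idx), int(val)) for idx, val in top_rator.max(axis=1).items()]
print(pairs)  # [(1, 3), (2, 3)]
```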

Use GroupBy.size, then Series.agg with idxmax and max in a list:
tup = tuple(ratings.groupby('userId').size().agg(['idxmax','max']))
print (tup)
(1, 8)

Explanation:
First aggregate the size per group:
#changed data - multiple groups
print (df)
   userId  movieId  rating
0       1        1     4.0
1       1        3     4.0
2       1        6     4.0
3       2       47     5.0
4       2       50     5.0
5       2       70     3.0
6       2      101     5.0
7       3      110     4.0

print (df.groupby('userId').size())
userId
1    3
2    4
3    1
dtype: int64

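The output is a Series, so Series.agg is added with the list of functions idxmax and max, which return the index of the maximum and the maximum value of the Series: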
print (df.groupby('userId').size().agg(['idxmax','max']))
idxmax    2
max       4
dtype: int64

Finally, convert to a tuple:
print (tuple(df.groupby('userId').size().agg(['idxmax','max'])))
(2, 4)


If multiple groups share the same maximum size, a solution is:
print (ratings)   
   userId  movieId  rating
0       1        1     4.0
1       1        3     4.0
2       1        6     4.0
3       2       47     5.0
4       2       50     5.0
5       2       70     3.0
6       3      101     5.0
7       3      110     4.0

First aggregate the size per group; here two groups share the maximum value of 3:
user_rating_counts = ratings.groupby('userId')['movieId'].size()
print (user_rating_counts)
userId
1    3
2    3
3    2
Name: movieId, dtype: int64

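So first use boolean indexing to keep only the rows equal to the maximum, then create a DataFrame with reset_index and convert it to a list of tuples: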
tup = list(map(tuple, top_rator.reset_index().values.tolist()))
print (tup)
[(1, 3), (2, 3)]
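The filtering step is not shown in the snippet above, so here is a hedged sketch of the whole chain for tied groups. The boolean-indexing line is an assumption about what the elided step looks like, and the data mirrors the example printed earlier.

```python
import pandas as pd

# Same illustrative data as the tie example above.
ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 2, 3, 3],
    "movieId": [1, 3, 6, 47, 50, 70, 101, 110],
    "rating":  [4.0, 4.0, 4.0, 5.0, 5.0, 3.0, 5.0, 4.0],
})

# Count of ratings per user, as a Series indexed by userId.
user_rating_counts = ratings.groupby("userId")["movieId"].size()

# Boolean indexing (assumed form of the missing step):
# keep only the users whose count equals the maximum.
top_rator = user_rating_counts[user_rating_counts == user_rating_counts.max()]

# reset_index() turns the Series into a two-column DataFrame
# (userId, count); each row then becomes one tuple.
tup = list(map(tuple, top_rator.reset_index().values.tolist()))
print(tup)  # [(1, 3), (2, 3)]
```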

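This is interesting. Is there a simple way to get idxmax and max at the same time?

@yukashimahuksay - Sure, check my answer.

Can you explain to me how this gets idxmax and max in one go?

@yukashimahuksay - Sure, added a solution for multiple max values.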