Python: How do I filter for users who have read more than 4 books?


python pandas numpy


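I have a dataset of users' book ratings. I want to keep only the users who have read more than 4 books and the books that have been read by more than 4 users.

My data looks like this: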

data.head()

    UserID  Rating  ISBN13  GoodreadsID     Title   Author
0   2111961     0   1592574289  1917    The Complete Idiot's Guide to Long Distance Re...   Seetha Narayan
1   2111961     0   1580087140  1918    The Long-Distance Relationship Survival Guide:...   Chris Bell
2   2111961     0   0972114807  1919    Long Distance Relationships: The Complete Guide     Gregory Guldner
3   2111961     0   006091565X  1047974     The Dance of Anger: A Woman's Guide to Changin...   Harriet Lerner
4   2102951     0   006091565X  1047974     The Dance of Anger: A Woman's Guide to Changin...   Harriet Lerner

I tried:

data = data.groupby('UserID').filter(lambda x: len(x) >= 5)

but I'm not sure whether this actually works.

Any help would be appreciated. Thanks.
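One way to check whether the groupby/filter call from the question behaves as intended is to run it on a tiny frame and inspect the per-user row counts afterwards. The sketch below uses invented UserID and ISBN13 values (they are not from the real dataset) and assumes one row per book read.

```python
import pandas as pd

# Toy data, made up for illustration: user 1 has 5 rows, user 2 has 4, user 3 has 1.
toy = pd.DataFrame({
    "UserID": [1] * 5 + [2] * 4 + [3],
    "ISBN13": ["a", "b", "c", "d", "e", "a", "b", "c", "d", "a"],
})

# Same call as in the question: keep groups with at least 5 rows (more than 4).
kept = toy.groupby("UserID").filter(lambda x: len(x) >= 5)

# Every remaining user should have more than 4 rows.
counts = kept.groupby("UserID").size()
print(counts)
```

Note that len(x) counts rows, not distinct books. If a user can rate the same book more than once, x['ISBN13'].nunique() would probably be the safer measure.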

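import pandas as pd

# Collect the groups of users that have more than 4 rows (books).
groups = [g for _, g in df.groupby('UserID') if len(g) > 4]

# Concatenate once; pd.concat raises on an empty list, so fall back to an empty frame.
new_df = pd.concat(groups) if groups else df.iloc[0:0]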

new_df is the DataFrame containing the users who have read more than 4 books.

If ISBN13 is the book's ID, you can try:

mask1 = data.groupby('UserID')['UserID'].transform('count') > 4
mask2 = data.groupby('ISBN13')['ISBN13'].transform('count') > 4

data.loc[mask1 & mask2]

Users who have read more than 4 books:

s = df.groupby('UserID')['ISBN13'].count()
u = s[s > 4].index
df[df['UserID'].isin(u)]

Books read by more than 4 users:

s = df.groupby('ISBN13')['UserID'].count()
b = s[s > 4].index
df[df['ISBN13'].isin(b)]
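If both conditions are needed at once, the two index sets can be combined into a single boolean mask. This is a minimal sketch on made-up data. It swaps count() for nunique() on the assumption that a user might rate the same book more than once, which the question does not confirm.

```python
import pandas as pd

# Toy data, made up for illustration.
df = pd.DataFrame({
    "UserID": [1, 1, 1, 1, 1, 2, 3, 4, 5, 5],
    "ISBN13": ["a", "b", "c", "d", "e", "a", "a", "a", "a", "b"],
})

# Distinct books per user and distinct users per book.
user_counts = df.groupby("UserID")["ISBN13"].nunique()
book_counts = df.groupby("ISBN13")["UserID"].nunique()

active_users = user_counts[user_counts > 4].index
popular_books = book_counts[book_counts > 4].index

# Keep rows whose user AND whose book both pass the threshold on the full data.
both = df[df["UserID"].isin(active_users) & df["ISBN13"].isin(popular_books)]
print(both)
```

Both thresholds are evaluated on the original frame, so a user or book in the result can still appear fewer than 5 times there.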

new_df = data.groupby('UserID')['ISBN13'].count(); new_df[new_df.gt(4)]?

@QuangHoang That selects the users with more than 4 books, but how do I use it to filter the original dataset?

s = data.groupby('UserID')['ISBN13'].transform('count'); data[s > 4]

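Actually, the masks don't seem to work. I did df = data.loc[mask1 & mask2] and df.groupby('UserID')['ISBN13'].count().describe(), and the minimum is 1.

Can you provide a sample of the data? Then I can test it on my machine.

Try this, it should basically be the same:

Thanks!

A book that was read more than 4 times may end up with only one row, because its other readers may not pass the user filter of having read more than 4 books. If 5 users read the same book, but 4 of them read only 1 book each, then that book enters the new DataFrame only if the remaining user who read it has read more than 4 books.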
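If the goal is that both conditions still hold within the filtered result, one option is to repeat the two filters until no more rows are dropped. The sketch below is only an illustration under that assumption: filter_until_stable is a hypothetical helper name, the data is made up, and each row is assumed to be one distinct (user, book) pair.

```python
import itertools
import pandas as pd

def filter_until_stable(df, min_count=5):
    """Drop users and books with fewer than min_count rows, repeating until stable."""
    while not df.empty:
        user_ok = df.groupby("UserID")["UserID"].transform("count") >= min_count
        book_ok = df.groupby("ISBN13")["ISBN13"].transform("count") >= min_count
        keep = user_ok & book_ok
        if keep.all():
            break  # nothing left to drop
        df = df[keep]
    return df

# Toy data: users 1-5 each read books a-e; user 6 read a plus four books nobody else read.
grid = list(itertools.product(range(1, 6), "abcde"))
extra = [(6, b) for b in "afghi"]
toy = pd.DataFrame(grid + extra, columns=["UserID", "ISBN13"])

result = filter_until_stable(toy)
print(result.groupby("UserID").size())
```

In this toy frame a single pass keeps user 6's row for book a, because user 6 has 5 rows in the original data. A second pass removes it once the four unpopular books are gone. On real data the loop may discard far more rows than a single pass, so whether this stricter definition is wanted depends on the use case.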