Python: Remove rows from two DataFrames where a column value is not common to both


I have these two DataFrames:

Active:

Customer_ID | product_No| Rating
7           | 111       | 3.0
7           | 222       | 1.0
7           | 333       | 5.0
7           | 444       | 3.0

User:

Customer_ID | product_No| Rating
9           | 111       | 2.0
9           | 222       | 5.0
9           | 666       | 5.0
9           | 555       | 3.0

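I want to find the ratings both users gave to the products they have in common (e.g. 111, 222) and remove any products that are not common to both (e.g. 444, 333, 555, 666). The new DataFrames should therefore look like this: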

Active:

Customer_ID | product_No| Rating
7           | 111       | 3.0
7           | 222       | 1.0

User:

Customer_ID | product_No| Rating
9           | 111       | 2.0
9           | 222       | 5.0

I don't know how to do this without for loops. Can you help me?

This is the code I have so far:

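import pandas as pd
# names= must be passed as a keyword argument; the file is assumed to have no header row
ratings = pd.read_csv("ratings.csv", names=['Customer_ID', 'product_No', 'Rating'])
# Filter on the column that actually exists in the frame (Customer_ID)
active = ratings[ratings['Customer_ID'] == 7]
user = ratings[ratings['Customer_ID'] == 9]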

You can first use a set intersection to get the common product_No values, and then use the isin method to filter the original DataFrames:

common_product = set(active.product_No).intersection(user.product_No)

common_product
# {111, 222}

active[active.product_No.isin(common_product)]

#Customer_ID   product_No   Rating
#0         7          111      3.0
#1         7          222      1.0

user[user.product_No.isin(common_product)]

#Customer_ID   product_No   Rating
#0         9          111      2.0
#1         9          222      5.0
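
The two steps above can also be wrapped in a small helper. This is only a sketch, not part of the original answer: the function name keep_common is hypothetical, and the sample frames are built by hand from the values that appear in this thread.

```python
import pandas as pd

# Sample frames built by hand (values taken from this thread, for illustration).
active = pd.DataFrame({
    "Customer_ID": [7, 7, 7, 7],
    "product_No": [111, 222, 333, 444],
    "Rating": [3.0, 1.0, 5.0, 3.0],
})
user = pd.DataFrame({
    "Customer_ID": [9, 9, 9, 9],
    "product_No": [111, 222, 666, 555],
    "Rating": [2.0, 5.0, 5.0, 3.0],
})


def keep_common(left, right, key):
    """Hypothetical helper: keep only rows whose `key` value occurs in both frames."""
    common = set(left[key]).intersection(right[key])
    # isin builds a boolean mask; each frame is filtered independently.
    return left[left[key].isin(common)], right[right[key].isin(common)]


active_common, user_common = keep_common(active, user, "product_No")
print(active_common)
print(user_common)
```

Because each frame is filtered with its own boolean mask, the original row index and any duplicate rows within a frame are preserved, which is one reason to prefer this approach over a join when duplicates may occur.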

I tried this using an inner join, as follows:

import pandas as pd

df1 = pd.read_csv('a.csv')
df2 = pd.read_csv('b.csv')
print(df1)
print(df2)

# The inner join keeps only the product_No values present in both frames
df_ij = pd.merge(df1, df2, on='product_No', how='inner')
print(df_ij)

# Split the joined frame back into one frame per customer
df_list = []
for suffx in ['_x', '_y']:
    df_e = df_ij[['Customer_ID'+suffx, 'product_No', 'Rating'+suffx]].copy()
    df_e.columns = list(df1)  # restore the original column names
    df_list.append(df_e)

print(df_list[0])
print(df_list[1])
It gives the following output:

# print df1
   Customer_ID  product_No  Rating
0            7         111       3
1            7         222       1
2            7         333       5
3            7         444       3

# print df2
   Customer_ID  product_No  Rating
0            9         111       2
1            9         222       5
2            9         777       5
3            9         555       3

# print the INNER JOINed df
   Customer_ID_x  product_No  Rating_x  Customer_ID_y  Rating_y
0              7         111         3              9         2
1              7         222         1              9         5

# print the first df you want, with common 'product_No'
   Customer_ID  product_No  Rating
0            7         111       3
1            7         222       1

# print the second df you want, with common 'product_No'
   Customer_ID  product_No  Rating
0            9         111       2
1            9         222       5
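The inner join selects the rows common to both DataFrames. Because the two frames share column names, the joined DataFrame adds suffixes (_x and _y by default) to the columns that were not used as the join key, so they can be told apart. You can then extract the columns with the appropriate suffix to get the desired final result.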
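
To make the suffix mechanism concrete, here is a minimal sketch that passes explicit suffixes to pd.merge instead of relying on the default _x and _y. The suffix names _active and _user are assumptions chosen for readability, and the frames are built by hand from the output shown above rather than read from a.csv and b.csv.

```python
import pandas as pd

# Hand-built stand-ins for a.csv and b.csv (values copied from the output above).
df1 = pd.DataFrame({
    "Customer_ID": [7, 7, 7, 7],
    "product_No": [111, 222, 333, 444],
    "Rating": [3, 1, 5, 3],
})
df2 = pd.DataFrame({
    "Customer_ID": [9, 9, 9, 9],
    "product_No": [111, 222, 777, 555],
    "Rating": [2, 5, 5, 3],
})

# The join key (product_No) keeps its name; every other shared column
# gets the suffix of the side it came from.
joined = pd.merge(df1, df2, on="product_No", how="inner",
                  suffixes=("_active", "_user"))
print(list(joined.columns))
# ['Customer_ID_active', 'product_No', 'Rating_active', 'Customer_ID_user', 'Rating_user']

# Pick one side by its suffix.
active_side = joined[["Customer_ID_active", "product_No", "Rating_active"]]
print(active_side)
```

Descriptive suffixes make the later column selection easier to read, at the cost of one extra argument.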


Here is a good example of an inner join.

Use query and reference the other DataFrame:

Active.query('product_No in @User.product_No')

   Customer_ID  product_No  Rating
0            7         111     3.0
1            7         222     1.0

User.query('product_No in @Active.product_No')

   Customer_ID  product_No  Rating
0            9         111     2.0
1            9         222     5.0
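
For completeness, a self-contained version of the query approach might look like the sketch below. The frames are built by hand from the values in this thread; the variable names active_common and user_common are introduced here for illustration only.

```python
import pandas as pd

Active = pd.DataFrame({
    "Customer_ID": [7, 7, 7, 7],
    "product_No": [111, 222, 333, 444],
    "Rating": [3.0, 1.0, 5.0, 3.0],
})
User = pd.DataFrame({
    "Customer_ID": [9, 9, 9, 9],
    "product_No": [111, 222, 666, 555],
    "Rating": [2.0, 5.0, 5.0, 3.0],
})

# Inside a query string, the @ prefix refers to a Python variable in the
# calling scope, so @User.product_No is the product_No column of User.
active_common = Active.query("product_No in @User.product_No")
user_common = User.query("product_No in @Active.product_No")

print(active_common)
print(user_common)
```

Here the `in` expression plays the same role as isin in the first answer, so the result should match the set-intersection approach.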

Here is an answer to your question:

import pandas as pd
dict1={"Customer_id":[7,7,7,7],
      "Product_No":[111,222,333,444],
      "rating":[3.0,1.0,5.0,3.0]}
active=pd.DataFrame(dict1)
dict2={"Customer_id":[9,9,9,9],
      "Product_No":[111,222,666,555],
      "rating":[2.0,5.0,5.0,3.0]}
user=pd.DataFrame(dict2)
df3=pd.merge(active,user,on="Product_No",how="inner")
df3
active=df3[["Customer_id_x","Product_No","rating_x"]]
print(active)
user=df3[["Customer_id_y","Product_No","rating_y"]]
print(user)
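
Note that the two frames printed above still carry the _x and _y suffixes in their column names. If the original names are wanted, one possible way to restore them is sketched below; the helper strip_suffix is hypothetical and not part of the answer.

```python
import pandas as pd

active = pd.DataFrame({"Customer_id": [7, 7, 7, 7],
                       "Product_No": [111, 222, 333, 444],
                       "rating": [3.0, 1.0, 5.0, 3.0]})
user = pd.DataFrame({"Customer_id": [9, 9, 9, 9],
                     "Product_No": [111, 222, 666, 555],
                     "rating": [2.0, 5.0, 5.0, 3.0]})
df3 = pd.merge(active, user, on="Product_No", how="inner")


def strip_suffix(frame, suffix):
    """Hypothetical helper: remove a trailing merge suffix from column names."""
    return frame.rename(
        columns=lambda name: name[:-len(suffix)] if name.endswith(suffix) else name
    )


active_common = strip_suffix(df3[["Customer_id_x", "Product_No", "rating_x"]], "_x")
user_common = strip_suffix(df3[["Customer_id_y", "Product_No", "rating_y"]], "_y")
print(active_common)
print(user_common)
```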