Python: remove rows from two DataFrames whose column values are not common to both
I have these two DataFrames.
Active:
Customer_ID | product_No| Rating
7 | 111 | 3.0
7 | 222 | 1.0
7 | 333 | 5.0
7 | 444 | 3.0
User:
Customer_ID | product_No| Rating
9 | 111 | 2.0
9 | 222 | 5.0
9 | 666 | 5.0
9 | 555 | 3.0
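I want to find the two users' ratings for the products they have in common (e.g. 111 and 222) and drop every product that is not common to both (e.g. 444, 333, 555, 666). The new DataFrames should then look like this: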
Active:
Customer_ID | product_No| Rating
7 | 111 | 3.0
7 | 222 | 1.0
User:
Customer_ID | product_No| Rating
9 | 111 | 2.0
9 | 222 | 5.0
I don't know how to do this without a for loop. Can you help me?
This is the code I have so far:
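import pandas as pd
# The CSV has no header row, so the column names are passed explicitly
ratings = pd.read_csv("ratings.csv", names=['Customer_ID', 'product_No', 'Rating'])
# One DataFrame per customer
active = ratings[ratings['Customer_ID'] == 7]
user = ratings[ratings['Customer_ID'] == 9]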
You can first use a set intersection to get the common product_No values, and then filter the original DataFrames with the isin method:
common_product = set(active.product_No).intersection(user.product_No)
common_product
# {111, 222}
active[active.product_No.isin(common_product)]
#Customer_ID product_No Rating
#0 7 111 3.0
#1 7 222 1.0
user[user.product_No.isin(common_product)]
#Customer_ID product_No Rating
#0 9 111 2.0
#1 9 222 5.0
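For anyone who wants to run this without a ratings.csv file, here is a minimal self-contained sketch of the same set intersection plus isin idea. The sample rows are taken from the question as the answers reproduce it, and the helper name keep_common is hypothetical, not part of pandas.

```python
import pandas as pd

# Sample data mirroring the question (no CSV file needed)
active = pd.DataFrame({
    "Customer_ID": [7, 7, 7, 7],
    "product_No": [111, 222, 333, 444],
    "Rating": [3.0, 1.0, 5.0, 3.0],
})
user = pd.DataFrame({
    "Customer_ID": [9, 9, 9, 9],
    "product_No": [111, 222, 666, 555],
    "Rating": [2.0, 5.0, 5.0, 3.0],
})


def keep_common(left, right, key="product_No"):
    """Hypothetical helper: keep only rows whose key value occurs in both frames."""
    common = set(left[key]).intersection(right[key])
    # isin builds a boolean mask, so no explicit Python loop over rows is needed
    return left[left[key].isin(common)], right[right[key].isin(common)]


active_common, user_common = keep_common(active, user)
print(active_common)
print(user_common)
```

Both results keep their original index labels and column names, which makes this approach convenient when the filtered frames are used separately afterwards.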
I tried this with an inner join, as follows:
import pandas as pd
df1 = pd.read_csv('a.csv')
df2 = pd.read_csv('b.csv')
print(df1)
print(df2)
# Inner join on product_No keeps only the products present in both frames
df_ij = pd.merge(df1, df2, on='product_No', how='inner')
print(df_ij)
# Split the joined frame back into one frame per customer
df_list = []
for suffx in ['_x', '_y']:
    df_e = df_ij[['Customer_ID' + suffx, 'product_No', 'Rating' + suffx]].copy()
    df_e.columns = list(df1)  # restore the original column names
    df_list.append(df_e)
print(df_list[0])
print(df_list[1])
This gives the following output:
# print df1
Customer_ID product_No Rating
0 7 111 3
1 7 222 1
2 7 333 5
3 7 444 3
# print df2
Customer_ID product_No Rating
0 9 111 2
1 9 222 5
2 9 777 5
3 9 555 3
# print the INNER JOINed df
Customer_ID_x product_No Rating_x Customer_ID_y Rating_y
0 7 111 3 9 2
1 7 222 1 9 5
# print the first df you want, with common 'product_No'
Customer_ID product_No Rating
0 7 111 3
1 7 222 1
# print the second df you want, with common 'product_No'
Customer_ID product_No Rating
0 9 111 2
1 9 222 5
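The inner join selects the rows whose product_No occurs in both DataFrames. Because the two DataFrames share column names, the joined DataFrame adds a suffix (_x for the left frame, _y for the right) to every column that was not used as the join key, so that the columns can be told apart. You then only need to select the columns with the appropriate suffix to get the desired final result.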
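The suffix handling described above can also be made explicit. The following is only a sketch: the suffixes parameter of pd.merge is real, but the suffix strings "_active" and "_user" and the helper split_side are illustrative choices, and the sample data is the data from the question.

```python
import pandas as pd

active = pd.DataFrame({
    "Customer_ID": [7, 7, 7, 7],
    "product_No": [111, 222, 333, 444],
    "Rating": [3.0, 1.0, 5.0, 3.0],
})
user = pd.DataFrame({
    "Customer_ID": [9, 9, 9, 9],
    "product_No": [111, 222, 666, 555],
    "Rating": [2.0, 5.0, 5.0, 3.0],
})

# Name the suffixes explicitly instead of relying on the default _x / _y
joined = pd.merge(active, user, on="product_No", how="inner",
                  suffixes=("_active", "_user"))


def split_side(joined, suffix, columns):
    """Hypothetical helper: pull one side out of the join under its original column names."""
    # Map each column name in the joined frame back to its original name;
    # the join key carries no suffix, so it maps to itself
    picked = {(c + suffix if c + suffix in joined.columns else c): c
              for c in columns}
    return joined[list(picked)].rename(columns=picked)


active_common = split_side(joined, "_active", list(active.columns))
user_common = split_side(joined, "_user", list(user.columns))
print(active_common)
print(user_common)
```

One caveat with the join-based approach: if a product_No appears more than once in either frame, the inner join produces one row per matching pair, so the split frames can contain repeated rows. The sample data has no such repeats.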
Use query to refer to the other DataFrame:
Active.query('product_No in @User.product_No')
Customer_ID product_No Rating
0 7 111 3.0
1 7 222 1.0
User.query('product_No in @Active.product_No')
Customer_ID product_No Rating
0 9 111 2.0
1 9 222 5.0
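As a rough check, the query form should select the same rows as a plain isin mask. The sketch below assumes the sample data from the question; the intermediate variables active_products and user_products are introduced here only so that the @ references point at simple local names.

```python
import pandas as pd

active = pd.DataFrame({
    "Customer_ID": [7, 7, 7, 7],
    "product_No": [111, 222, 333, 444],
    "Rating": [3.0, 1.0, 5.0, 3.0],
})
user = pd.DataFrame({
    "Customer_ID": [9, 9, 9, 9],
    "product_No": [111, 222, 666, 555],
    "Rating": [2.0, 5.0, 5.0, 3.0],
})

# Inside a query string, @name refers to a Python variable in the calling scope
active_products = active["product_No"]
user_products = user["product_No"]

active_common = active.query("product_No in @user_products")
user_common = user.query("product_No in @active_products")

# The same filter written as a boolean mask with isin
active_mask = active[active["product_No"].isin(user_products)]
user_mask = user[user["product_No"].isin(active_products)]

print(active_common.equals(active_mask))
print(user_common.equals(user_mask))
```

Which form to use is mostly a matter of taste: query reads closer to the problem statement, while isin avoids parsing an expression string.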
The answer to your question is:
import pandas as pd
dict1={"Customer_id":[7,7,7,7],
"Product_No":[111,222,333,444],
"rating":[3.0,1.0,5.0,3.0]}
active=pd.DataFrame(dict1)
dict2={"Customer_id":[9,9,9,9],
"Product_No":[111,222,666,555],
"rating":[2.0,5.0,5.0,3.0]}
user=pd.DataFrame(dict2)
df3=pd.merge(active,user,on="Product_No",how="inner")
df3
active=df3[["Customer_id_x","Product_No","rating_x"]]
print(active)
user=df3[["Customer_id_y","Product_No","rating_y"]]
print(user)