Python: merge two DataFrames on the condition that a value in one DataFrame is a substring of a value in the other
Suppose I have a DataFrame retailer_info, as follows:
   price  product_name                                   url
0   5005  Intel Pentium Gold G5400 3.70 GHz Processor    https://www.theitdepot.com/details-Intel+Penti...
1   7150  Intel Core i3-9100F 3.60 GHz Processor         https://www.theitdepot.com/details-Intel+Core+...
2   8210  AMD Ryzen 3 2200G with Radeon Vega 8 Graphics  https://www.theitdepot.com/details-AMD+Ryzen+3...
3   8415  AMD Ryzen 3 3200G with Radeon Vega 8 Graphics  https://www.theitdepot.com/details-AMD+Ryzen+3...
4  10330  AMD Ryzen 5 1600 3.2 GHz Processor             https://www.theitdepot.com/details-AMD+Ryzen+5...
I have another DataFrame, cpu_info, as follows:
     Type  Part Number    Brand  Model          Rank
92   CPU   YD1600BBAEBOX  AMD    Ryzen 5 1600   93
96   CPU   YD250XBBM4KAF  AMD    Ryzen 5 2500X  97
108  CPU   YD3200C5FHBOX  AMD    Ryzen 3 3200G  109
129  CPU   YD150XBBAEBOX  AMD    Ryzen 5 1500X  130
138  CPU   YD2400C5FBBOX  AMD    Ryzen 5 2400G  139
139  CPU   YD2200C5FBBOX  AMD    Ryzen 3 2200G  140
153  CPU   YD130XBBAEBOX  AMD    Ryzen 3 1300X  154
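Now, for each value in the Series cpu_info['Model'], I need to check whether it is a substring of any value in the Series retailer_info['product_name']. If it is, I want to merge the url column from the DataFrame retailer_info into the DataFrame cpu_info.
Expected result: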
     Type  Part Number    Brand  Model          Rank  url
92   CPU   YD1600BBAEBOX  AMD    Ryzen 5 1600   93    https://www.theitdepot.com/details-AMD+Ryzen+5...
96   CPU   YD250XBBM4KAF  AMD    Ryzen 5 2500X  97    NaN
108  CPU   YD3200C5FHBOX  AMD    Ryzen 3 3200G  109   https://www.theitdepot.com/details-AMD+Ryzen+3...
129  CPU   YD150XBBAEBOX  AMD    Ryzen 5 1500X  130   NaN
138  CPU   YD2400C5FBBOX  AMD    Ryzen 5 2400G  139   NaN
139  CPU   YD2200C5FBBOX  AMD    Ryzen 3 2200G  140   https://www.theitdepot.com/details-AMD+Ryzen+3...
153  CPU   YD130XBBAEBOX  AMD    Ryzen 3 1300X  154   NaN
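pd.merge only joins on equal key values, so a substring condition cannot be passed to it directly. One workaround (a swapped-in technique, not one of the answers in this thread) is a cross join followed by a filter. A minimal sketch, assuming pandas >= 1.2 (needed for how="cross") and a trimmed copy of the sample data: only Model and Rank are kept from cpu_info, and the truncated URLs are replaced by hypothetical placeholders such as "url-1600".

```python
import pandas as pd

# Trimmed copy of the question's frames; the URLs are hypothetical placeholders
retailer_info = pd.DataFrame({
    "product_name": [
        "Intel Pentium Gold G5400 3.70 GHz Processor",
        "Intel Core i3-9100F 3.60 GHz Processor",
        "AMD Ryzen 3 2200G with Radeon Vega 8 Graphics",
        "AMD Ryzen 3 3200G with Radeon Vega 8 Graphics",
        "AMD Ryzen 5 1600 3.2 GHz Processor",
    ],
    "url": ["url-G5400", "url-9100F", "url-2200G", "url-3200G", "url-1600"],
})
cpu_info = pd.DataFrame(
    {"Model": ["Ryzen 5 1600", "Ryzen 5 2500X", "Ryzen 3 3200G", "Ryzen 5 1500X",
               "Ryzen 5 2400G", "Ryzen 3 2200G", "Ryzen 3 1300X"],
     "Rank": [93, 97, 109, 130, 139, 140, 154]},
    index=[92, 96, 108, 129, 138, 139, 153],
)

# Every (cpu row, retailer row) pair; how="cross" needs pandas >= 1.2.
# reset_index() keeps cpu_info's row labels in a column named "index".
pairs = cpu_info.reset_index().merge(retailer_info, how="cross")

# Keep only pairs where Model is a literal substring of product_name
is_sub = [m in p for m, p in zip(pairs["Model"], pairs["product_name"])]
matched = pairs.loc[is_sub].drop_duplicates(subset="index")  # first match per CPU

# Align the matched URLs back onto cpu_info by row label; no match -> NaN
result = cpu_info.assign(url=matched.set_index("index")["url"])
print(result)
```

The cross join materializes len(cpu_info) × len(retailer_info) rows, which is fine for small tables like these but grows quadratically with larger ones.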
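I realize that new_df = pd.merge(cpu, it[['product_name', 'url']], on='', how='left') only works when you want to merge on exactly matching column values. I'm not sure how to get the result I want, and I'd really appreciate any help. Thanks.
Try this. It should work: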
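def find_url(model_name):
    # Rows of retailer_info whose product_name contains model_name as a literal substring
    matches = retailer_info.loc[
        retailer_info['product_name'].str.contains(model_name, regex=False), 'url'
    ]
    # First matching URL, or None when no product matches
    return matches.iloc[0] if not matches.empty else None

cpu_info['url'] = cpu_info['Model'].apply(find_url)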
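Series.str.contains treats its pattern as a regular expression by default (regex=True). The question's sample models contain no regex metacharacters, but a model name with parentheses or "+" would silently fail to match. The names below are hypothetical, chosen only to show the difference; pandas may also emit a warning that the pattern contains match groups.

```python
import pandas as pd

# Hypothetical product and model names (not from the question's data)
names = pd.Series(["Intel Core i5-9400F (Tray) Processor",
                   "AMD Ryzen 5 1600 3.2 GHz Processor"])
model = "i5-9400F (Tray)"

# Default regex=True: "(Tray)" is a capture group, so the pattern looks for
# the text "i5-9400F Tray", which does not occur in either product name
as_regex = names.str.contains(model)

# regex=False: plain substring test, equivalent to Python's `in` operator
as_literal = names.str.contains(model, regex=False)

print(as_regex.tolist(), as_literal.tolist())
```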
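You can also add multiple conditions, for example requiring both the Brand and the Model to appear in the product name: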
# Map each product_name to its url
dicc = pd.Series(retailer_info["url"].values, index=retailer_info["product_name"]).to_dict()
cpu_info["url"] = ""
for index, row in cpu_info.iterrows():
    for key in dicc:
        # A product matches only if it contains both the brand and the model
        if row["Brand"] in key and row["Model"] in key:
            cpu_info.at[index, "url"] = dicc[key]
            break
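Both answers above loop over the models in Python. A vectorized alternative (a different technique from either answer) is to build one regex alternation of all model names, use Series.str.extract to find which model each product_name contains, and then map the URLs back by model. A sketch under the same assumptions as before: a trimmed copy of the sample data with hypothetical placeholder URLs. Note that each product is assigned at most one model here, so the semantics differ slightly from the per-model lookups.

```python
import re

import pandas as pd

# Trimmed copy of the question's frames; the URLs are hypothetical placeholders
retailer_info = pd.DataFrame({
    "product_name": [
        "Intel Pentium Gold G5400 3.70 GHz Processor",
        "Intel Core i3-9100F 3.60 GHz Processor",
        "AMD Ryzen 3 2200G with Radeon Vega 8 Graphics",
        "AMD Ryzen 3 3200G with Radeon Vega 8 Graphics",
        "AMD Ryzen 5 1600 3.2 GHz Processor",
    ],
    "url": ["url-G5400", "url-9100F", "url-2200G", "url-3200G", "url-1600"],
})
cpu_info = pd.DataFrame(
    {"Model": ["Ryzen 5 1600", "Ryzen 5 2500X", "Ryzen 3 3200G", "Ryzen 5 1500X",
               "Ryzen 5 2400G", "Ryzen 3 2200G", "Ryzen 3 1300X"],
     "Rank": [93, 97, 109, 130, 139, 140, 154]},
    index=[92, 96, 108, 129, 138, 139, 153],
)

# One alternation of all model names, escaped so they match literally;
# longer names first so a longer model wins when two match at the same position
models = sorted(cpu_info["Model"].unique(), key=len, reverse=True)
pattern = "(" + "|".join(re.escape(m) for m in models) + ")"

# For each product, the first model name found in it (NaN when none)
found = retailer_info["product_name"].str.extract(pattern, expand=False)

# Lookup table model -> url (first product per model), then map onto cpu_info
url_by_model = (retailer_info.assign(Model=found)
                .dropna(subset=["Model"])
                .drop_duplicates(subset="Model")
                .set_index("Model")["url"])
cpu_info["url"] = cpu_info["Model"].map(url_by_model)
print(cpu_info)
```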
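Thanks, it works, but I don't fully understand the expression. I get that
retailer_info[retailer_info.product_name.str.find(x).ge(0)]['url']
returns a Series. So does the expression cpu_info.Model.apply(lambda x: retailer_info[retailer_info.product_name.str.find(x).ge(0)]['url'])
return a Series for each value in cpu_info.Model? I do get what you're doing with .bfill(axis=1).iloc[:,0]
: you move all the links from the different columns into the first column and return that Series. What I don't understand is why the links end up spread across different columns in the first place.
That's because on every call the lambda returns a whole Series of the n matching rows (shape (n,), indexed by their retailer_info row labels) rather than just the required URL, so apply() lays the results out as a DataFrame with one column per matched row label. I've since updated the code and simplified it. Sorry for the wrong code earlier.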
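The mechanism discussed in these comments can be reproduced on a small scale. The pandas documentation for Series.apply states that a func returning a Series produces a DataFrame; the sketch below builds that same DataFrame explicitly from a list of Series so the column layout is easy to inspect. The three-model subset and the placeholder URLs are hypothetical.

```python
import pandas as pd

# Product names from the question; the URLs are hypothetical placeholders
retailer_info = pd.DataFrame({
    "product_name": [
        "Intel Pentium Gold G5400 3.70 GHz Processor",
        "Intel Core i3-9100F 3.60 GHz Processor",
        "AMD Ryzen 3 2200G with Radeon Vega 8 Graphics",
        "AMD Ryzen 3 3200G with Radeon Vega 8 Graphics",
        "AMD Ryzen 5 1600 3.2 GHz Processor",
    ],
    "url": ["url-G5400", "url-9100F", "url-2200G", "url-3200G", "url-1600"],
})
models = pd.Series(["Ryzen 5 1600", "Ryzen 5 2500X", "Ryzen 3 3200G"],
                   index=[92, 96, 108])

def lookup(x):
    # Same filter as in the comment: rows whose product_name contains x
    return retailer_info[retailer_info.product_name.str.find(x).ge(0)]["url"]

# One Series per model, each indexed by the retailer_info row labels it matched;
# stacking them gives one column per label that matched any model (here 3 and 4)
spread = pd.DataFrame([lookup(x) for x in models], index=models.index)
print(spread)

# bfill(axis=1) pulls each row's first non-NaN value leftwards into column 0
collapsed = spread.bfill(axis=1).iloc[:, 0]
print(collapsed)
```

Since each model here matches at most one product, every row of spread has at most one non-NaN cell, and collapsing to the first column recovers exactly that URL (or NaN).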