Python: merging two DataFrames where a value in the first DataFrame is a substring of a value in the other


Suppose I have a DataFrame retailer_info that looks like this:

   price  product_name                                   url
0   5005  Intel Pentium Gold G5400 3.70 GHz Processor    https://www.theitdepot.com/details-Intel+Penti...
1   7150  Intel Core i3-9100F 3.60 GHz Processor         https://www.theitdepot.com/details-Intel+Core+...
2   8210  AMD Ryzen 3 2200G with Radeon Vega 8 Graphics  https://www.theitdepot.com/details-AMD+Ryzen+3...
3   8415  AMD Ryzen 3 3200G with Radeon Vega 8 Graphics  https://www.theitdepot.com/details-AMD+Ryzen+3...
4  10330  AMD Ryzen 5 1600 3.2 GHz Processor             https://www.theitdepot.com/details-AMD+Ryzen+5...

I have another DataFrame, cpu_info, as follows:

     Type  Part Number    Brand  Model          Rank
92   CPU   YD1600BBAEBOX  AMD    Ryzen 5 1600     93
96   CPU   YD250XBBM4KAF  AMD    Ryzen 5 2500X    97
108  CPU   YD3200C5FHBOX  AMD    Ryzen 3 3200G   109
129  CPU   YD150XBBAEBOX  AMD    Ryzen 5 1500X   130
138  CPU   YD2400C5FBBOX  AMD    Ryzen 5 2400G   139
139  CPU   YD2200C5FBBOX  AMD    Ryzen 3 2200G   140
153  CPU   YD130XBBAEBOX  AMD    Ryzen 3 1300X   154

Now, for each value in the Series cpu_info['Model'], I need to check whether it is a substring of any value in the Series retailer_info['product_name']. If it is, I want to merge the url column from retailer_info into cpu_info.

Expected result:

     Type  Part Number    Brand  Model          Rank  url
92   CPU   YD1600BBAEBOX  AMD    Ryzen 5 1600     93  https://www.theitdepot.com/details-AMD+Ryzen+5...
96   CPU   YD250XBBM4KAF  AMD    Ryzen 5 2500X    97  NaN
108  CPU   YD3200C5FHBOX  AMD    Ryzen 3 3200G   109  https://www.theitdepot.com/details-AMD+Ryzen+3...
129  CPU   YD150XBBAEBOX  AMD    Ryzen 5 1500X   130  NaN
138  CPU   YD2400C5FBBOX  AMD    Ryzen 5 2400G   139  NaN
139  CPU   YD2200C5FBBOX  AMD    Ryzen 3 2200G   140  https://www.theitdepot.com/details-AMD+Ryzen+3...
153  CPU   YD130XBBAEBOX  AMD    Ryzen 3 1300X   154  NaN

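I realize that

new_df = pd.merge(cpu_info, retailer_info[['product_name', 'url']], on='', how='left')

only works when you want to merge purely on equal column values. I'm not sure how to get the result I want. I'd really appreciate any help. Thanks.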
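
To make the limitation concrete, here is a minimal sketch with one row from each table. The example.com URL is a placeholder, since the real URLs are truncated in the post. pd.merge compares keys for equality, so a model name that is only part of a product name finds no partner:

```python
import pandas as pd

# One retailer row and one CPU row modelled on the question (placeholder URL).
retailer_info = pd.DataFrame({
    "product_name": ["AMD Ryzen 5 1600 3.2 GHz Processor"],
    "url": ["https://example.com/1600"],
})
cpu_info = pd.DataFrame({"Model": ["Ryzen 5 1600"]}, index=[92])

# Equality merge: "Ryzen 5 1600" is not equal to the full product name,
# so the left join keeps the CPU row but fills url with NaN.
merged = cpu_info.merge(
    retailer_info, left_on="Model", right_on="product_name", how="left"
)
print(merged)
```

A substring condition therefore needs something other than a plain key-based merge.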

Try this. It should work:

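def find_url(model_name):
    # Rows whose product_name contains model_name as a literal substring
    matches = retailer_info[retailer_info['product_name'].str.contains(model_name, regex=False)]
    try:
        return matches['url'].values[0]  # URL of the first matching product
    except IndexError:
        return None  # no product name contains this model

cpu_info['url'] = cpu_info['Model'].apply(find_url)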
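
For anyone who wants to run this end to end, here is a self-contained sketch of the same idea on a few rows modelled on the question. The example.com URLs are placeholders for the truncated ones, and regex=False is an assumption on my part so that model names are matched as literal text rather than as regular expressions:

```python
import pandas as pd

# Sample rows modelled on the question; URLs are placeholders.
retailer_info = pd.DataFrame({
    "price": [5005, 8210, 8415, 10330],
    "product_name": [
        "Intel Pentium Gold G5400 3.70 GHz Processor",
        "AMD Ryzen 3 2200G with Radeon Vega 8 Graphics",
        "AMD Ryzen 3 3200G with Radeon Vega 8 Graphics",
        "AMD Ryzen 5 1600 3.2 GHz Processor",
    ],
    "url": [
        "https://example.com/g5400",
        "https://example.com/2200g",
        "https://example.com/3200g",
        "https://example.com/1600",
    ],
})
cpu_info = pd.DataFrame(
    {
        "Brand": ["AMD", "AMD", "AMD", "AMD"],
        "Model": ["Ryzen 5 1600", "Ryzen 5 2500X", "Ryzen 3 3200G", "Ryzen 3 2200G"],
    },
    index=[92, 96, 108, 139],
)


def find_url(model_name):
    """Return the first retailer URL whose product name contains model_name."""
    mask = retailer_info["product_name"].str.contains(model_name, regex=False)
    matches = retailer_info.loc[mask, "url"]
    return matches.iloc[0] if not matches.empty else None


cpu_info["url"] = cpu_info["Model"].apply(find_url)
print(cpu_info)
```

Models with no matching product (here Ryzen 5 2500X) end up with a missing value in url, which lines up with the NaN rows in the expected result.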

You can also add multiple conditions:

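# Map each retailer product name to its URL
dicc = pd.Series(retailer_info["url"].values, index=retailer_info["product_name"]).to_dict()

cpu_info["url"] = ""  # rows without a match keep an empty string
for index, row in cpu_info.iterrows():
    for key in dicc:
        # Require both the brand and the model to appear in the product name
        if row["Brand"] in key and row["Model"] in key:
            cpu_info.at[index, "url"] = dicc[key]
            break  # stop at the first matching product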
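
If the tables are small, a loop-free variant is possible as well. This is a different technique from the loop above and does not come from the original answers: it cross-joins the two frames (how="cross", which I believe needs pandas 1.2 or newer), keeps the pairs that satisfy both conditions, and aligns the first hit per CPU back onto cpu_info. The sample data and the example.com URLs are placeholders:

```python
import pandas as pd

retailer_info = pd.DataFrame({
    "product_name": [
        "Intel Pentium Gold G5400 3.70 GHz Processor",
        "AMD Ryzen 3 3200G with Radeon Vega 8 Graphics",
        "AMD Ryzen 5 1600 3.2 GHz Processor",
    ],
    "url": [
        "https://example.com/g5400",
        "https://example.com/3200g",
        "https://example.com/1600",
    ],
})
cpu_info = pd.DataFrame(
    {
        "Brand": ["AMD", "AMD", "AMD"],
        "Model": ["Ryzen 5 1600", "Ryzen 5 2500X", "Ryzen 3 3200G"],
    },
    index=[92, 96, 108],
)

# Pair every CPU row with every retailer row; reset_index keeps the original
# CPU labels in a column named "index".
pairs = cpu_info.reset_index().merge(retailer_info, how="cross")

# Keep only the pairs where both Brand and Model occur in the product name.
keep = [
    brand in name and model in name
    for brand, model, name in zip(pairs["Brand"], pairs["Model"], pairs["product_name"])
]
matched = pairs.loc[keep].drop_duplicates("index").set_index("index")["url"]

# Assigning a Series aligns on the index, so unmatched CPUs get NaN.
result = cpu_info.assign(url=matched)
print(result)
```

The cross join builds len(cpu_info) * len(retailer_info) rows, so this trades memory for brevity and is only reasonable when both tables are modest in size.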

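Thanks, it works, but I don't fully understand the expression. I get that retailer_info[retailer_info.product_name.str.find(x).ge(0)]['url'] returns a Series. So the expression cpu_info.Model.apply(lambda x: retailer_info[retailer_info.product_name.str.find(x).ge(0)]['url']) returns a Series for each value in cpu_info.Model? I also get what you did with .bfill(axis=1).iloc[:,0]: you pull all the links from the different columns into the first column and return that Series. What I don't understand is why the links are spread across different columns.

That is because each time the lambda function returns a Series of shape (n,1), not just the desired URL. I have updated the code and simplified it. Sorry about the incorrect code earlier.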
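
To see where the separate columns come from, here is a sketch that builds the intermediate table explicitly instead of going through Series.apply (my understanding is that apply assembles per-element Series results in a comparable way, but treat that as an assumption). Each lookup returns a Series indexed by the retailer row labels that matched, so lining the results up produces one column per retailer row. The helper name matching_urls and the example.com URLs are made up for illustration:

```python
import pandas as pd

retailer_info = pd.DataFrame({
    "product_name": [
        "Intel Pentium Gold G5400 3.70 GHz Processor",
        "AMD Ryzen 3 2200G with Radeon Vega 8 Graphics",
        "AMD Ryzen 3 3200G with Radeon Vega 8 Graphics",
        "AMD Ryzen 5 1600 3.2 GHz Processor",
    ],
    "url": [
        "https://example.com/g5400",
        "https://example.com/2200g",
        "https://example.com/3200g",
        "https://example.com/1600",
    ],
})
cpu_info = pd.DataFrame(
    {"Model": ["Ryzen 5 1600", "Ryzen 5 2500X", "Ryzen 3 3200G", "Ryzen 3 2200G"]},
    index=[92, 96, 108, 139],
)


def matching_urls(model):
    """URLs of all retailer rows whose product name contains model (a Series)."""
    hits = retailer_info["product_name"].str.find(model).ge(0)
    return retailer_info.loc[hits, "url"]


# One Series per CPU, each indexed by the retailer row labels that matched.
per_model = {label: matching_urls(m) for label, m in cpu_info["Model"].items()}

# Side by side and transposed: one row per CPU, one column per retailer row
# label, which is why the links land in different columns.
wide = pd.DataFrame(per_model).T

# Back-fill along each row so the first column holds the first non-missing link.
first_url = wide.bfill(axis=1).iloc[:, 0]
print(wide)
print(first_url)
```

Returning a single value from the function, rather than a Series, avoids the wide intermediate table altogether.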