Python: merge two dataframes on the condition that a value in the first dataframe is a substring of a value in the other dataframe

python pandas

Suppose I have a dataframe, retailer_info, as follows:

    price   product_name    url
0   5005    Intel Pentium Gold G5400 3.70 GHz Processor https://www.theitdepot.com/details-Intel+Penti...
1   7150    Intel Core i3-9100F 3.60 GHz Processor  https://www.theitdepot.com/details-Intel+Core+...
2   8210    AMD Ryzen 3 2200G with Radeon Vega 8 Graphics   https://www.theitdepot.com/details-AMD+Ryzen+3...
3   8415    AMD Ryzen 3 3200G with Radeon Vega 8 Graphics   https://www.theitdepot.com/details-AMD+Ryzen+3...
4   10330   AMD Ryzen 5 1600 3.2 GHz Processor  https://www.theitdepot.com/details-AMD+Ryzen+5...

I have another dataframe, cpu_info, as follows:

    Type    Part Number Brand   Model   Rank
92  CPU YD1600BBAEBOX   AMD Ryzen 5 1600    93
96  CPU YD250XBBM4KAF   AMD Ryzen 5 2500X   97
108 CPU YD3200C5FHBOX   AMD Ryzen 3 3200G   109
129 CPU YD150XBBAEBOX   AMD Ryzen 5 1500X   130
138 CPU YD2400C5FBBOX   AMD Ryzen 5 2400G   139
139 CPU YD2200C5FBBOX   AMD Ryzen 3 2200G   140
153 CPU YD130XBBAEBOX   AMD Ryzen 3 1300X   154

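Now, for each value in the series cpu_info['Model'], I need to check whether it is a substring of any value in the series retailer_info['product_name']. If it is, I want to merge the url column from retailer_info into the dataframe cpu_info.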

Expected result:

    Type    Part Number Brand   Model   Rank    url
92  CPU YD1600BBAEBOX   AMD Ryzen 5 1600    93  https://www.theitdepot.com/details-AMD+Ryzen+5...
96  CPU YD250XBBM4KAF   AMD Ryzen 5 2500X   97  NaN
108 CPU YD3200C5FHBOX   AMD Ryzen 3 3200G   109 https://www.theitdepot.com/details-AMD+Ryzen+3...
129 CPU YD150XBBAEBOX   AMD Ryzen 5 1500X   130 NaN
138 CPU YD2400C5FBBOX   AMD Ryzen 5 2400G   139 NaN
139 CPU YD2200C5FBBOX   AMD Ryzen 3 2200G   140 https://www.theitdepot.com/details-AMD+Ryzen+3...
153 CPU YD130XBBAEBOX   AMD Ryzen 3 1300X   154 NaN

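I realize that

new_df = pd.merge(cpu, it[['product_name', 'url']], on='', how='left')

only works when you want to merge on exact column values. I'm not sure how to get the result I want. I'd really appreciate any help. Thanks.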
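One way to keep using pd.merge, as the question hoped, is to first derive an exact join key from product_name and then do an ordinary left merge. The sketch below swaps in a technique not used by the answers: it builds one regex alternation of all model names with re.escape and pulls the matched model out with str.extract. The sample data is a hypothetical subset of the question's tables, the example.com URLs are placeholders (the real ones are truncated in the question), and names such as pattern and keyed are illustrative only.

```python
import re

import pandas as pd

# Hypothetical subset of the question's data; the URLs are placeholders.
retailer_info = pd.DataFrame({
    "product_name": [
        "Intel Pentium Gold G5400 3.70 GHz Processor",
        "AMD Ryzen 3 2200G with Radeon Vega 8 Graphics",
        "AMD Ryzen 3 3200G with Radeon Vega 8 Graphics",
        "AMD Ryzen 5 1600 3.2 GHz Processor",
    ],
    "url": ["https://example.com/g5400", "https://example.com/2200g",
            "https://example.com/3200g", "https://example.com/1600"],
})
cpu_info = pd.DataFrame(
    {"Model": ["Ryzen 5 1600", "Ryzen 5 2500X", "Ryzen 3 3200G", "Ryzen 3 2200G"]},
    index=[92, 96, 108, 139],
)

# One alternation of all model names, escaped so they match literally.
# Longer names go first, so a longer model wins when it starts at the same
# position as a shorter one.
models = sorted(cpu_info["Model"].unique(), key=len, reverse=True)
pattern = "(" + "|".join(re.escape(m) for m in models) + ")"

# Extract the model name found in each product_name (NaN when none is found).
keyed = retailer_info.assign(
    Model=retailer_info["product_name"].str.extract(pattern, expand=False)
)

# With an exact key in place, an ordinary left merge works.
lookup = keyed.dropna(subset=["Model"]).drop_duplicates("Model")[["Model", "url"]]
result = cpu_info.merge(lookup, on="Model", how="left")
result.index = cpu_info.index  # merge resets the index; restore the original labels
print(result)
```

Because lookup has at most one row per model, the left merge keeps exactly one output row per cpu_info row in the original order, which is why the index can be copied back directly.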

Try this. It should work:

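def find_url(model_name):
    # Rows whose product_name contains model_name as a literal substring (no regex)
    mask = retailer_info['product_name'].str.contains(model_name, regex=False)
    matches = retailer_info.loc[mask, 'url']
    # First matching URL, or None when no product matches
    return matches.iloc[0] if not matches.empty else None

cpu_info['url'] = cpu_info['Model'].apply(find_url)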
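The same "first product whose name contains the model" rule can also be written as a plain list comprehension with next(), which avoids building a new boolean Series for every model. This is a sketch on a hypothetical subset of the question's data with placeholder example.com URLs; the name pairs is illustrative.

```python
import pandas as pd

# Hypothetical subset of the question's data; the URLs are placeholders.
retailer_info = pd.DataFrame({
    "product_name": [
        "AMD Ryzen 3 2200G with Radeon Vega 8 Graphics",
        "AMD Ryzen 3 3200G with Radeon Vega 8 Graphics",
        "AMD Ryzen 5 1600 3.2 GHz Processor",
    ],
    "url": ["https://example.com/2200g", "https://example.com/3200g",
            "https://example.com/1600"],
})
cpu_info = pd.DataFrame(
    {"Model": ["Ryzen 5 1600", "Ryzen 5 2500X", "Ryzen 3 3200G"]},
    index=[92, 96, 108],
)

# (product_name, url) pairs, built once instead of once per model.
pairs = list(zip(retailer_info["product_name"], retailer_info["url"]))

# First product whose name contains the model, else None.
cpu_info["url"] = [
    next((url for name, url in pairs if model in name), None)
    for model in cpu_info["Model"]
]
print(cpu_info)
```

Both versions still perform len(cpu_info) × len(retailer_info) substring checks in the worst case; this one just skips the per-model pandas overhead.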

You can also match on multiple conditions:

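import pandas as pd

# Map each retailer product name to its URL
dicc = pd.Series(retailer_info["url"].values, index=retailer_info["product_name"]).to_dict()

cpu_info["url"] = None  # rows with no match stay empty
for index, row in cpu_info.iterrows():
    for key in dicc:
        # Require both the brand and the model to appear in the product name
        if row["Brand"] in key and row["Model"] in key:
            cpu_info.at[index, "url"] = dicc[key]
            break  # keep the first matching product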
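The iterrows loop above can also be expressed without explicit loops over rows by cross-joining the two frames and filtering the pairs on both conditions. This is a sketch, not the answer's method: how="cross" needs pandas 1.2 or later, the sample data is a hypothetical subset with placeholder example.com URLs, and the column name cpu_row is illustrative.

```python
import pandas as pd

# Hypothetical subset of the question's data; the URLs are placeholders.
retailer_info = pd.DataFrame({
    "product_name": [
        "Intel Pentium Gold G5400 3.70 GHz Processor",
        "AMD Ryzen 3 3200G with Radeon Vega 8 Graphics",
        "AMD Ryzen 5 1600 3.2 GHz Processor",
    ],
    "url": ["https://example.com/g5400", "https://example.com/3200g",
            "https://example.com/1600"],
})
cpu_info = pd.DataFrame(
    {"Brand": ["AMD", "AMD", "AMD"],
     "Model": ["Ryzen 5 1600", "Ryzen 5 2500X", "Ryzen 3 3200G"]},
    index=[92, 96, 108],
)

# Pair every CPU row with every retailer row, remembering the CPU row label.
pairs = cpu_info.rename_axis("cpu_row").reset_index().merge(retailer_info, how="cross")

# Same two conditions as the loop: brand AND model must appear in the name.
ok = [
    brand in name and model in name
    for brand, model, name in zip(pairs["Brand"], pairs["Model"], pairs["product_name"])
]

# First match per CPU row, indexed by the original cpu_info labels.
first_url = pairs.loc[ok].drop_duplicates(subset="cpu_row").set_index("cpu_row")["url"]
cpu_info["url"] = first_url  # index alignment leaves NaN where nothing matched
print(cpu_info)
```

The cross join materializes len(cpu_info) × len(retailer_info) rows, which is fine for catalogs of a few thousand rows but can use a lot of memory for larger ones.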

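Thanks, it works, but I don't fully understand the expression. I get that

retailer_info[retailer_info.product_name.str.find(x).ge(0)]['url']

returns a Series. So the expression

cpu_info.Model.apply(lambda x: retailer_info[retailer_info.product_name.str.find(x).ge(0)]['url'])

returns a Series for each value in cpu_info.Model? I do understand what you are doing with

.bfill(axis=1).iloc[:, 0]

: you collect the links from the different columns into the first column and return that Series. What I don't understand is why the links are spread across different columns in the first place.

That's because each call to the lambda returns a Series of the matching rows, indexed by their row labels in retailer_info, rather than just the required URL. When the applied function returns a Series, Series.apply combines the results into a DataFrame with one column per row label, so each URL ends up in the column of the retailer_info row it came from. I've updated the code and simplified it. Sorry about the wrong code earlier.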
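To see where the extra columns come from, here is the earlier expression from the comment run on a small hypothetical sample (placeholder example.com URLs). It shows that str.find(x).ge(0) acts as a "contains" mask, that Series.apply expands the per-model Series into a wide DataFrame, and how bfill(axis=1).iloc[:, 0] collapses it back into one column.

```python
import pandas as pd

# Hypothetical subset of the question's data; the URLs are placeholders.
retailer_info = pd.DataFrame({
    "product_name": [
        "AMD Ryzen 3 2200G with Radeon Vega 8 Graphics",
        "AMD Ryzen 3 3200G with Radeon Vega 8 Graphics",
        "AMD Ryzen 5 1600 3.2 GHz Processor",
    ],
    "url": ["https://example.com/2200g", "https://example.com/3200g",
            "https://example.com/1600"],
})
cpu_info = pd.DataFrame(
    {"Model": ["Ryzen 5 1600", "Ryzen 5 2500X", "Ryzen 3 3200G"]},
    index=[92, 96, 108],
)

# str.find returns the substring position or -1, so .ge(0) is a "contains" mask.
mask = retailer_info.product_name.str.find("Ryzen 5 1600").ge(0)
print(mask.tolist())  # [False, False, True]

# Each lambda call returns a Series indexed by the matching retailer_info rows,
# so Series.apply stacks the results into a DataFrame instead of a Series.
wide = cpu_info.Model.apply(
    lambda x: retailer_info[retailer_info.product_name.str.find(x).ge(0)]["url"]
)
print(wide)  # one column per matched retailer_info row label, NaN elsewhere

# bfill(axis=1) pulls the first non-NaN value in each row into the first column.
cpu_info["url"] = wide.bfill(axis=1).iloc[:, 0]
print(cpu_info)
```

Returning a single value from the function (for example the first match, or None) avoids the wide intermediate frame entirely.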