Python: How do I search a string for a keyword, extract it, and put it in a new column?

python pandas

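I'm using pandas. This is my df:

df = {'Product Name': ['Nike Zoom Pegasus', 'All New Nike Zoom Pegasus 4', 'Metcon 3', 'Nike Metcon 5']}

I want to search each string value, extract only the product category, and put the extracted string in another column ("Category"). As you may notice, the product names follow no formal naming convention, so using .split() is not ideal.

The final result should look like this:

df = {'Product Name': ['Nike Zoom Pegasus', 'All New Nike Zoom Pegasus 4', 'Metcon 3', 'Nike Metcon 5'], 'Category': ['Pegasus', 'Pegasus', 'Metcon', 'Metcon']}

This is my current code, but I get an error: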

def get_category(product):
    if df['Product Name'].str.contains('Pegasus') or df['Product Name'].str.contains('Metcon'):
        return product

df['Category'] = df['Product Name'].apply(lambda x: get_category(x))

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I hope you can help. Thanks.

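Your code has the following problems:

You pass in product, but the check uses
```
df["Product Name"]
```
which returns the entire Series, so the condition has no single truth value. Also, the function returns product, whereas the expected answer is either
```
Pegasus
```
or
```
Metcon
```

I think you want something like this: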

def get_category(product):
    if "Pegasus" in product:
        return "Pegasus" 
    elif "Metcon" in product:
        return "Metcon"
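
To tie the error and the fix together, here is a minimal sketch. It assumes the data is wrapped in a pd.DataFrame (the question shows a plain dict), and the names mask and error_message are only illustrative. It reproduces the ambiguous truth value error on a whole Series and then applies the per-string function row by row.

```python
import pandas as pd

# Assumption: the question's dict is wrapped in a DataFrame
df = pd.DataFrame({'Product Name': ['Nike Zoom Pegasus', 'All New Nike Zoom Pegasus 4',
                                    'Metcon 3', 'Nike Metcon 5']})

# str.contains returns a boolean Series with one value per row
mask = df['Product Name'].str.contains('Pegasus')

try:
    if mask:  # a multi-row Series cannot be reduced to a single True/False
        pass
except ValueError as exc:
    error_message = str(exc)

def get_category(product):
    # product is a single string here, so "in" yields a plain bool
    if "Pegasus" in product:
        return "Pegasus"
    elif "Metcon" in product:
        return "Metcon"
    return None  # no known category found

df['Category'] = df['Product Name'].apply(get_category)
print(df)
```

Passing get_category directly to apply is equivalent to wrapping it in a lambda.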

How about this solution? When you have a new category, all you have to do is add it to the cats list.

import pandas as pd
import numpy as np

df = pd.DataFrame({'Product Name': ['Nike Zoom Pegasus', 'All New Nike Zoom Pegasus 4', 'Metcon 3', 'Nike Metcon 5']})
cats = ["Pegasus","Metcon"]
df["Category"] = df["Product Name"].apply(lambda x: np.intersect1d(x.split(" "),cats)[0])


output
                  Product Name Category
0            Nike Zoom Pegasus  Pegasus
1  All New Nike Zoom Pegasus 4  Pegasus
2                     Metcon 3   Metcon
3                Nike Metcon 5   Metcon
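
One caveat with indexing the intersection using [0]: a product name that contains none of the categories gives an empty array, and the lookup raises IndexError. The sketch below is one possible guard, not part of the original answer. The helper first_category and the product name 'Air Max 90' are hypothetical and added only for illustration.

```python
import numpy as np
import pandas as pd

cats = ["Pegasus", "Metcon"]

def first_category(name, categories=cats):
    # Intersect the words of the product name with the known categories
    matches = np.intersect1d(name.split(" "), categories)
    # Return None instead of failing when nothing matches
    return matches[0] if matches.size else None

# 'Air Max 90' is a made-up name that matches no category
df = pd.DataFrame({'Product Name': ['Nike Zoom Pegasus', 'Metcon 3', 'Air Max 90']})
df["Category"] = df["Product Name"].apply(first_category)
print(df)
```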

How about:

import pandas as pd
df = {'Product Name': ['Nike Zoom Pegasus', 'All New Nike Zoom Pegasus 4', 'Metcon 3', 'Nike Metcon 5']}
c = set(['Metcon', 'Pegasus'])
categories = [c.intersection(pn.split(' ')) for pn in df['Product Name']]
df['Categories'] = categories
print(df)

Output:

>> {'Product Name': ['Nike Zoom Pegasus', 'All New Nike Zoom Pegasus 4', 'Metcon 3', 'Nike Metcon 5'], 'Categories': [{'Pegasus'}, {'Pegasus'}, {'Metcon'}, {'Metcon'}]}

You need to provide the logic/rules for identifying the category. What error do you get? What is sku?

@pecey: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Use str.extract:

>>> df = pd.DataFrame({'Product Name': ['Nike Zoom Pegasus', 'All New Nike Zoom Pegasus 4', 'Metcon 3', 'Nike Metcon 5']})
>>> cats = ["Pegasus","Metcon"]

>>> df['Category'] = df["Product Name"].str.extract("(%s)" % "|".join(cats))

                  Product Name Category
0            Nike Zoom Pegasus  Pegasus
1  All New Nike Zoom Pegasus 4  Pegasus
2                     Metcon 3   Metcon
3                Nike Metcon 5   Metcon
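
A slightly more defensive variant of the same idea is sketched below, under a few assumptions. re.escape protects against category names that contain regex metacharacters, expand=False makes str.extract return a Series rather than a one-column DataFrame, and rows with no match come back as NaN. The product name 'Air Max 90' is hypothetical and added only to show the no-match case.

```python
import re
import pandas as pd

cats = ["Pegasus", "Metcon"]

# Build one capture group of alternatives, escaping each category literally
pattern = "(" + "|".join(re.escape(c) for c in cats) + ")"

# 'Air Max 90' is a made-up name that matches no category
df = pd.DataFrame({'Product Name': ['Nike Zoom Pegasus', 'Nike Metcon 5', 'Air Max 90']})

# expand=False returns a Series; unmatched rows are NaN
df['Category'] = df['Product Name'].str.extract(pattern, expand=False)
print(df)
```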