Python pandas: select rows that contain any substring from a list

I want to select the rows in which a column contains any of the substrings in a list. This is what I have so far:

product = ['LID', 'TABLEWARE', 'CUP', 'COVER', 'CONTAINER', 'PACKAGING']

df_plastic_prod = df_plastic[df_plastic['Goods Shipped'].str.contains(product)]

df_plastic_prod.info()
Sample of df_plastic:

Name          Product
David        PLASTIC BOTTLE
Meghan       PLASTIC COVER
Melanie      PLASTIC CUP 
Aaron        PLASTIC BOWL
Venus        PLASTIC KNIFE
Abigail      PLASTIC CONTAINER
Sophia       PLASTIC LID
Desired df_plastic_prod:

Name          Product
Meghan       PLASTIC COVER
Melanie      PLASTIC CUP 
Abigail      PLASTIC CONTAINER
Sophia       PLASTIC LID

Thanks in advance! I appreciate your help with this.

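One solution is to use a regex to parse the 'Product' column, test whether the extracted value is in the product list, and then filter the original DataFrame by the result.

This example uses a very simple regex pattern, (\w+)$, which matches a single word at the end of the line.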

Example code:

df.loc[df['Product'].str.extract(r'(\w+)$', expand=False).isin(product)]

Output:

      Name            Product
1   Meghan      PLASTIC COVER
2  Melanie        PLASTIC CUP
5  Abigail  PLASTIC CONTAINER
6   Sophia        PLASTIC LID
Setup:

import pandas as pd

product = ['LID', 'TABLEWARE', 'CUP',
           'COVER', 'CONTAINER', 'PACKAGING']

data = {'Name': ['David', 'Meghan', 'Melanie', 
                 'Aaron', 'Venus', 'Abigail', 'Sophia'],
        'Product': ['PLASTIC BOTTLE', 'PLASTIC COVER', 'PLASTIC CUP', 
                    'PLASTIC BOWL', 'PLASTIC KNIFE', 'PLASTIC CONTAINER',
                    'PLASTIC LID']}
    
df = pd.DataFrame(data)
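
The last-word approach described in this answer can be sketched end to end as follows. This is only a minimal sketch using the sample data from this thread. It assumes `expand=False` so that `str.extract` returns a Series, and it strips whitespace first because the question's sample contains a value with a trailing space; the names `last_word` and `result` are illustrative, not from the original answer.

```python
import pandas as pd

product = ['LID', 'TABLEWARE', 'CUP', 'COVER', 'CONTAINER', 'PACKAGING']

df = pd.DataFrame({
    'Name': ['David', 'Meghan', 'Melanie', 'Aaron',
             'Venus', 'Abigail', 'Sophia'],
    # 'PLASTIC CUP ' keeps the trailing space seen in the question's sample.
    'Product': ['PLASTIC BOTTLE', 'PLASTIC COVER', 'PLASTIC CUP ',
                'PLASTIC BOWL', 'PLASTIC KNIFE', 'PLASTIC CONTAINER',
                'PLASTIC LID'],
})

# Remove surrounding whitespace, then capture the last word of each value.
# expand=False makes str.extract return a Series instead of a DataFrame.
last_word = df['Product'].str.strip().str.extract(r'(\w+)$', expand=False)

# Keep only the rows whose last word appears in the product list.
result = df[last_word.isin(product)]
print(result)
```

Because only the final word is tested, a value whose keyword is not last (for example a hypothetical 'CUP HOLDER PLASTIC') would not be selected by this approach.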

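To match values by substring, join all the values in the list with | (regex OR), so the pattern matches LID or TABLEWARE and so on.

The solution also works if the list contains entries of 2 or more words.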

pat = '|'.join(r"\b{}\b".format(x) for x in product)
df_plastic_prod = df_plastic[df_plastic['Product'].str.contains(pat)]
print(df_plastic_prod)

Output:

      Name            Product
1   Meghan      PLASTIC COVER
2  Melanie        PLASTIC CUP
5  Abigail  PLASTIC CONTAINER
6   Sophia        PLASTIC LID
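
A self-contained sketch of the joined-pattern approach, reusing the sample data from the question. Two details here are assumptions rather than part of the answer above: each entry is wrapped in `re.escape` as a precaution in case a list entry contains regex metacharacters, and `na=False` is passed so that missing values count as non-matches.

```python
import re

import pandas as pd

product = ['LID', 'TABLEWARE', 'CUP', 'COVER', 'CONTAINER', 'PACKAGING']

df_plastic = pd.DataFrame({
    'Name': ['David', 'Meghan', 'Melanie', 'Aaron',
             'Venus', 'Abigail', 'Sophia'],
    'Product': ['PLASTIC BOTTLE', 'PLASTIC COVER', 'PLASTIC CUP',
                'PLASTIC BOWL', 'PLASTIC KNIFE', 'PLASTIC CONTAINER',
                'PLASTIC LID'],
})

# Build one alternation pattern: \bLID\b|\bTABLEWARE\b|...
# re.escape (an added precaution) neutralises any regex metacharacters.
pat = '|'.join(r'\b{}\b'.format(re.escape(x)) for x in product)

# str.contains treats pat as a regex by default; na=False keeps the mask
# strictly boolean even if the column holds missing values.
mask = df_plastic['Product'].str.contains(pat, na=False)
df_plastic_prod = df_plastic[mask]
print(df_plastic_prod)
```

Unlike the last-word approach, this matches a listed word anywhere in the value, as long as it appears as a whole word.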

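Question: why do we need \b in pat?

@balderman - It is called a word boundary, and it makes each entry match only as a whole word. For example, it avoids matching cat inside the words "bobcat is nice" while still matching "cat is nice".

I see. So pat is actually a regexp, and the term "word boundary" comes from the regexp domain. Thanks.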
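
The effect of the word boundary discussed in these comments can be checked with a small sketch. The two sample strings follow the cat / bobcat example from the comment; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series(['bobcat is nice', 'cat is nice'])

# Without word boundaries, 'cat' is also found inside 'bobcat'.
plain = s.str.contains('cat').tolist()

# With \b on both sides, 'cat' must stand alone as a whole word.
bounded = s.str.contains(r'\bcat\b').tolist()

print(plain)    # [True, True]
print(bounded)  # [False, True]
```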