Python: using keywords from one DataFrame to detect whether any of them occur in another DataFrame or a string


python pandas dataframe


I have two questions. First:

I have a DataFrame that contains the following categories and keywords:

  Category                   Keywords
0    Fruit            ['apple', 'pear', 'plum', 'grape']
1    Color            ['red', 'purple', 'green']


The other DataFrame looks like this:

              Summary
0        This is a basket of red apples. They are sour.
1        We found a bushel of fruit. They are red.
2        There is a peck of pears that taste sweet.
3        We have a box of plums.

I would like the final result to look like this:

      Category                                            Summary
0    Fruit, Color     This is a basket of red apples. They are sour.
1           Color     We found a bushel of fruit. They are red.
2    Fruit, Color     There is a peck of green pears that taste sweet.
3           Fruit     We have a box of plums.

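Second:

I should be able to check whether a string contains any of the keywords and, if it does, output a list of the corresponding categories.

Example:

sample_sentence = "This line contains a red plum?"

Output: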

result_list = ['color','Fruit']

Edit: similar, but not the same. Please refer to:

Edit 2:

I also have another version of the first DataFrame, which looks like this:

  Category                   Filters
0    Fruit  apple, pear, plum, grape
1    Color        red, purple, green

You can do this with a list comprehension:

DataFrame setup:

import pandas as pd

df1 = pd.DataFrame({'Category': {0: 'Fruit', 1: 'Color'},
 'Keywords': {0: 'apple,pear,plum,grape', 1: 'red,purple,green'}})
df2 = pd.DataFrame({'Summary': {0: 'This is a basket of red apples. They are sour.',
  1: 'We found a bushel of fruit. They are red.',
  2: 'There is a peck of pears that taste sweet.',
  3: 'We have a box of plums.'}})
# Turn each comma-separated keyword string into a list of keywords.
df1['Keywords'] = df1['Keywords'].str.split(',')
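One caveat with this setup: splitting on a bare ',' keeps any whitespace that follows the comma, so a cell written as 'apple, pear, plum, grape' (the Filters layout from Edit 2) produces ' pear' with a leading space, and that never matches a word. A minimal sketch of a whitespace-tolerant split follows; the helper name `split_keywords` is hypothetical and not part of the original answer.

```python
import re


def split_keywords(cell):
    """Split a comma-separated keyword string, dropping spaces and empty items."""
    parts = re.split(r'\s*,\s*', str(cell).strip())
    return [p for p in parts if p]


# Whitespace around the commas is discarded.
print(split_keywords('apple, pear, plum, grape'))  # ['apple', 'pear', 'plum', 'grape']

# A plain split keeps the leading spaces.
print('red, purple, green'.split(','))  # ['red', ' purple', ' green']
```

On a DataFrame the same idea could be applied with `df1['Filters'].apply(split_keywords)`, assuming the column holds strings like the ones shown in Edit 2.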

Code:

# Substring match: a keyword counts if it occurs inside a word ('apple' in 'apples.').
# For an exact match use "if str(c) == str(y)", or
# "if str(c).lower() == str(y).lower()" to ignore case.
# dict.fromkeys removes duplicates while keeping the categories in df1 order
# (a plain set does not guarantee any order).
df2['Category'] = (df2['Summary'].str.split(' ').apply(
    lambda x: list(dict.fromkeys(
        str(a)
        for a, b in zip(df1['Category'], df1['Keywords'])  # rows of df1
        for y in x                                          # words of one summary
        for c in b                                          # keywords of one category
        if str(c) in str(y)))).str.join(', '))
df2

Output:

Out[1]: 
                                          Summary      Category
0  This is a basket of red apples. They are sour.  Fruit, Color
1       We found a bushel of fruit. They are red.         Color
2      There is a peck of pears that taste sweet.         Fruit
3                         We have a box of plums.         Fruit

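x, a and b iterate over the rows (vertically), while c and y iterate over the lists inside a row (horizontally). To start walking through a list horizontally, you first have to walk through the rows vertically. That is why we have all these variables (see figure). You can use zip to iterate over two or more columns of the first DataFrame at the same time.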
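The nested iteration described above can be unrolled into explicit loops, which may be easier to adapt. This is a minimal sketch rather than the answerer's code: plain lists stand in for the two columns of df1, the function name `categories_for` is hypothetical, and categories are reported in df1 order.

```python
def categories_for(summary, categories, keywords):
    """Return the matching categories for one summary, joined with ', '."""
    x = str(summary).split(' ')              # words of one Summary row
    found = []
    for a, b in zip(categories, keywords):   # vertical: one row of df1 at a time
        for y in x:                          # horizontal: words of the summary
            for c in b:                      # horizontal: keywords of the category
                # Substring test, as in the comprehension ('apple' in 'apples.').
                if str(c) in str(y) and a not in found:
                    found.append(a)
    return ', '.join(found)


# Stand-ins for df1['Category'] and df1['Keywords'].
categories = ['Fruit', 'Color']
keywords = [['apple', 'pear', 'plum', 'grape'], ['red', 'purple', 'green']]

print(categories_for('This is a basket of red apples. They are sour.',
                     categories, keywords))  # Fruit, Color
print(categories_for('We found a bushel of fruit. They are red.',
                     categories, keywords))  # Color
```

With the DataFrames from the setup, this could be applied as `df2['Summary'].apply(lambda s: categories_for(s, df1['Category'], df1['Keywords']))`.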
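Thanks for the quick reply, David Erickson. I got the following error: "'float' object is not iterable". I am fairly sure there are no float values in my data. I have updated my question with another version of the first DataFrame to make things simpler.

@Brussel I have wrapped all of the data in str. My guess is that there is an integer in your sentence column.

The error was on my side. I fixed the float problem. Thank you very much for your solution, David. You made my day.

It would be very helpful if you could add more explanation to your solution, so that anyone with a similar problem can adapt it to their own case. I also see that your solution matches substrings; a whole-word match would be great.

Thank you very much for your effort, @David. You explained everything clearly with the picture.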
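The comments ask for whole-word matching, and the second part of the question asks for a list of categories for a single string. One possible sketch covers both with a regular expression. It is not the answerer's code: the names `find_categories` and `keyword_map` are hypothetical, matching is case-insensitive, and an optional trailing "s" is allowed as a crude plural rule, which is an assumption you may want to drop.

```python
import re


def find_categories(text, keyword_map):
    """Return the categories whose keywords occur as whole words in text."""
    result = []
    for category, kws in keyword_map.items():
        if not kws:
            continue  # an empty alternation would match everywhere
        # \b pins the keyword to word boundaries; s? tolerates a simple plural.
        pattern = r'\b(?:' + '|'.join(re.escape(k) for k in kws) + r')s?\b'
        # str(text) lets non-string cells (numbers, NaN) be searched as text.
        if re.search(pattern, str(text), flags=re.IGNORECASE):
            result.append(category)
    return result


keyword_map = {
    'Fruit': ['apple', 'pear', 'plum', 'grape'],
    'Color': ['red', 'purple', 'green'],
}

print(find_categories('This line contains a red plum?', keyword_map))  # ['Fruit', 'Color']
print(find_categories('A hundred items', keyword_map))                 # []
```

For the DataFrames in the answer, `keyword_map` could be built with `dict(zip(df1['Category'], df1['Keywords']))` and the function applied with `df2['Summary'].apply(lambda s: ', '.join(find_categories(s, keyword_map)))`. Unlike the substring test, 'red' no longer matches inside 'hundred'.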