Python: use the keywords in one dataframe to detect whether any of them appear in another dataframe or in a string
I have two questions. First: I have a dataframe that contains the following categories and keywords:
```
  Category                            Keywords
0    Fruit  ['apple', 'pear', 'plum', 'grape']
1    Color          ['red', 'purple', 'green']
```
The other dataframe looks like this:
```
                                          Summary
0  This is a basket of red apples. They are sour.
1       We found a bushel of fruit. They are red.
2      There is a peck of pears that taste sweet.
3                         We have a box of plums.
```
I would like the final result to look like this:
```
       Category                                           Summary
0  Fruit, Color    This is a basket of red apples. They are sour.
1         Color         We found a bushel of fruit. They are red.
2  Fruit, Color  There is a peck of green pears that taste sweet.
3         Fruit                           We have a box of plums.
```
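Second:
I should be able to check whether a string contains any of the keywords and, if it does, output a list of the corresponding categories.
Example: `sample_sentence = "This line contains a red plum?"`
Output:
```
result_list = ['Color', 'Fruit']
```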
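The second question (a single string rather than a dataframe column) reduces to one loop over the keyword table. A minimal sketch, assuming the keyword table already holds lists as in the first dataframe above and that case-insensitive substring matching is acceptable; `categories_in` is a hypothetical helper name, not something from the question or answer:

```python
import pandas as pd

# Keyword table in the shape shown in the question (one list of keywords per category).
df1 = pd.DataFrame({'Category': ['Fruit', 'Color'],
                    'Keywords': [['apple', 'pear', 'plum', 'grape'],
                                 ['red', 'purple', 'green']]})


def categories_in(sentence, keywords_df):
    """Hypothetical helper: return each category with at least one keyword in `sentence`.

    Matching is case-insensitive and substring-based, so 'plum' also matches 'plums'.
    """
    text = sentence.lower()
    return [category
            for category, keywords in zip(keywords_df['Category'], keywords_df['Keywords'])
            if any(keyword.lower() in text for keyword in keywords)]


sample_sentence = "This line contains a red plum?"
result_list = categories_in(sample_sentence, df1)
print(result_list)  # ['Fruit', 'Color']
```

Categories come back in the row order of `df1`, so `Fruit` is listed before `Color` here. Because this is substring matching, a keyword such as `red` would also match a word like "bored"; a whole-word regex avoids that.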
Edit: This is similar, but not the same. Please refer to:
Edit 2:
I also have another version of the first dataframe, which looks like this:
```
  Category                   Filters
0    Fruit  apple, pear, plum, grape
1    Color        red, purple, green
```
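In this version the keywords are one comma-and-space separated string per row. Splitting on `','` alone would leave a leading space on every keyword after the first (`' pear'`), and since the summaries are split on spaces, `' pear'` can never be a substring of a single word. A minimal sketch that strips each piece; storing the result in a column named `Keywords` is an assumption made so it lines up with the first version:

```python
import pandas as pd

# Second version of the keyword table: keywords stored as one comma-separated string.
df1 = pd.DataFrame({'Category': ['Fruit', 'Color'],
                    'Filters': ['apple, pear, plum, grape', 'red, purple, green']})

# Split on commas, then strip surrounding whitespace from every keyword.
df1['Keywords'] = df1['Filters'].apply(lambda s: [k.strip() for k in s.split(',')])
print(df1['Keywords'].tolist())
# [['apple', 'pear', 'plum', 'grape'], ['red', 'purple', 'green']]
```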
You can do this with a list comprehension. Dataframe setup:
```python
import pandas as pd

df1 = pd.DataFrame({'Category': {0: 'Fruit', 1: 'Color'},
                    'Keywords': {0: 'apple,pear,plum,grape', 1: 'red,purple,green'}})
df2 = pd.DataFrame({'Summary': {0: 'This is a basket of red apples. They are sour.',
                                1: 'We found a bushel of fruit. They are red.',
                                2: 'There is a peck of pears that taste sweet.',
                                3: 'We have a box of plums.'}})
```
Code:
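```python
# Turn each comma-separated keyword string into a list of keywords.
df1['Keywords'] = df1['Keywords'].str.split(',')

# For every summary: split it into words (x); for each word (y) and each
# (Category, Keywords) row of df1 (a, b), keep the category when a keyword (c)
# is a substring of the word. set() drops duplicate categories, but its order
# is not guaranteed, so 'Fruit, Color' may also appear as 'Color, Fruit'.
df2['Category'] = (df2['Summary'].str.split(' ').apply(
    lambda x: list(set([str(a) for y in x
                        for a, b in zip(df1['Category'], df1['Keywords'])
                        for c in b
                        if str(c) in str(y)  # or: str(c) == str(y), or str(c).lower() == str(y).lower()
                        ]))).str.join(', '))
```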
Output:
```
df2
Out[1]:
                                          Summary      Category
0  This is a basket of red apples. They are sour.  Fruit, Color
1       We found a bushel of fruit. They are red.         Color
2      There is a peck of pears that taste sweet.         Fruit
3                         We have a box of plums.         Fruit
```
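Because `set` has no defined order and string hashing is randomized per Python process, the joined categories for row 0 can come out as `Color, Fruit` on a different run. A minimal sketch that keeps the same substring test but checks categories in `df1`'s row order, so each category appears once and the order is stable (this restructuring is a suggestion, not part of the original answer):

```python
import pandas as pd

df1 = pd.DataFrame({'Category': ['Fruit', 'Color'],
                    'Keywords': [['apple', 'pear', 'plum', 'grape'],
                                 ['red', 'purple', 'green']]})
df2 = pd.DataFrame({'Summary': ['This is a basket of red apples. They are sour.',
                                'We found a bushel of fruit. They are red.',
                                'There is a peck of pears that taste sweet.',
                                'We have a box of plums.']})

# Outer loop over df1's rows (a, b); a category is kept if any keyword (c)
# is a substring of any word (y) of the summary (x).
df2['Category'] = df2['Summary'].str.split(' ').apply(
    lambda x: ', '.join(a for a, b in zip(df1['Category'], df1['Keywords'])
                        if any(str(c) in str(y) for c in b for y in x)))
print(df2['Category'].tolist())  # ['Fruit, Color', 'Color', 'Fruit', 'Fruit']
```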
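`x` (the word list of one summary, i.e. one row of df2) and `a`, `b` (one row of df1) iterate over rows (vertically); `y` (each word in `x`) and `c` (each keyword in `b`) iterate over the lists inside a row (horizontally). To walk horizontally through a list, you first have to walk vertically through the rows, which is why there are so many variables (see image). `zip` lets you iterate over two or more columns of the first dataframe at the same time.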
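The same comprehension unrolled into plain nested loops may make the roles of `x`, `y`, `a`, `b` and `c` easier to see. A minimal sketch for a single summary, assuming `df1['Keywords']` already holds lists; `categories_for_words` is a hypothetical name:

```python
import pandas as pd

df1 = pd.DataFrame({'Category': ['Fruit', 'Color'],
                    'Keywords': [['apple', 'pear', 'plum', 'grape'],
                                 ['red', 'purple', 'green']]})


def categories_for_words(x):
    """Loop-by-loop equivalent of the answer's comprehension for one summary's word list."""
    found = set()
    for y in x:                                             # each word of the summary
        for a, b in zip(df1['Category'], df1['Keywords']):  # each row of df1 (vertical)
            for c in b:                                     # each keyword in that row (horizontal)
                if str(c) in str(y):                        # substring test, as in the answer
                    found.add(str(a))
    return list(found)


words = 'This is a basket of red apples. They are sour.'.split(' ')
print(sorted(categories_for_words(words)))  # ['Color', 'Fruit']
```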
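Thanks for the quick reply, David Erickson. I got the following error: "'float' object is not iterable". I'm pretty sure there are no float values in my data. I've updated my question with another version of the first dataframe to make things simpler.

@Brussel I put `str` in front of all the data. I guess there is an integer in your sentence column.

The error was on my side, and I've fixed the float problem. Thank you so much for your solution, David, you made my day. It would be very helpful if you could explain your solution in more detail, so that anyone with a similar problem can adapt it to their own case. I also see that your solution matches substrings; whole-word matching would be great.

Thank you so much for the effort, @David. You explained everything clearly with the picture.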
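Two points from the comments can be handled together. A missing value in a pandas text column is `NaN`, which is a float; `.str.split` passes it through unchanged, and iterating over it inside the lambda raises "'float' object is not iterable". For whole-word matching, a regular expression with `\b` word boundaries works. A minimal sketch, assuming one compiled pattern per category and that simple plurals ("apples", "plums") should still count, which is why each keyword may be followed by an optional `s`; drop the `s?` for strict whole-word matching. `whole_word_categories` is a hypothetical name:

```python
import re
import pandas as pd

df1 = pd.DataFrame({'Category': ['Fruit', 'Color'],
                    'Keywords': [['apple', 'pear', 'plum', 'grape'],
                                 ['red', 'purple', 'green']]})
df2 = pd.DataFrame({'Summary': ['This is a basket of red apples. They are sour.',
                                'We found a bushel of fruit. They are red.',
                                'There is a peck of pears that taste sweet.',
                                'We have a box of plums.',
                                float('nan')]})  # a missing summary (NaN is a float)

# One case-insensitive pattern per category; \b...\b only matches whole words,
# and the optional 's' (an assumption) also accepts simple plurals.
patterns = {category: re.compile(r'\b(?:' + '|'.join(map(re.escape, keywords)) + r')s?\b',
                                 re.IGNORECASE)
            for category, keywords in zip(df1['Category'], df1['Keywords'])}


def whole_word_categories(summary):
    # Treat NaN (or any non-string) as "no categories" instead of iterating over it.
    if not isinstance(summary, str):
        return ''
    return ', '.join(cat for cat, pat in patterns.items() if pat.search(summary))


df2['Category'] = df2['Summary'].apply(whole_word_categories)
print(df2)
```

With these patterns, `red` no longer matches inside a word like "bored", and a keyword such as `apple` would not match inside "pineapple".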