Check whether a DataFrame column contains any word from a list, and add a count (Python)

This is the input DataFrame:

import pandas as pd
df_data = pd.DataFrame({'A':[2,1,3], 'content': ['the dog is sleeping', 'my name is Dude', 'i am who i am']})

and this is the word list:

words_list= ['dog', 'Dude','sleeping', 'i']

I already know how to create a new column that flags whether a row contains any of the words I am looking for, for example:

df_data['new'] = df_data.apply(lambda row: True if any([item in row['content'] for item in words_list]) else False, axis = 1)

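The thing is, I also want to count those words. For example, rows 2 and 3 contain two words from my list, so I would like a new column holding the value 2 there, and so on.

Thanks, everyone!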

Try this to extract the matches and count them:

import pandas as pd
import re

df_data = pd.DataFrame({'A':[2,1,3], 'content': ['the dog is sleeping', 'my name is Dude', 'i am who i am']})
words_list= ['dog', 'Dude','sleeping', 'i']

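# Escape each word so regex metacharacters are matched literally,
# and wrap the whole alternation in one pair of word boundaries
search_ = re.compile(r"\b(?:%s)\b" % "|".join(map(re.escape, words_list)))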

df_data['matches'] = df_data.content.str.findall(search_)
df_data['count'] = df_data['matches'].apply(len)
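
For reference, here is a minimal sketch of what the regex approach yields on the sample data. Note that `findall` returns every occurrence, so a repeated word such as `i` in the third row is counted twice. The `distinct` column and the `pattern` variable are not part of the original answer; they are added here only to illustrate counting each listed word at most once.

```python
import re
import pandas as pd

df_data = pd.DataFrame({'A': [2, 1, 3],
                        'content': ['the dog is sleeping', 'my name is Dude', 'i am who i am']})
words_list = ['dog', 'Dude', 'sleeping', 'i']

# Plain string pattern: escaped words joined by "|" inside one pair of word boundaries
pattern = r"\b(?:%s)\b" % "|".join(re.escape(w) for w in words_list)

df_data['matches'] = df_data['content'].str.findall(pattern)
# Number of occurrences, repeats included
df_data['count'] = df_data['matches'].apply(len)
# Number of different listed words found (illustrative extra column)
df_data['distinct'] = df_data['matches'].apply(lambda m: len(set(m)))

print(df_data[['content', 'count', 'distinct']])
```

Which of the two columns is wanted depends on whether a word that appears twice in a row should count once or twice.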

First, I think you need to revise your original function, because it can produce incorrect output.

For example:

words_list= ['do']
df_data['new'] = df_data.apply(lambda row: True if any([item in row['content'] for item in words_list]) else False, axis = 1)

results in

   A              content    new
0  2  the dog is sleeping   True
1  1      my name is Dude  False
2  3        i am who i am  False

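The first row does not actually contain the word "do"; it is flagged only because "do" is a substring of "dog". This can be fixed by splitting the row content into a list of words: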

row['content'].split()
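
A quick sketch of why splitting helps: on a string, the `in` operator tests for a substring, while on a list it tests for an exact element. The sentence is the first row of the sample data; the variable names are only illustrative.

```python
sentence = 'the dog is sleeping'

# Substring test: True, because "do" occurs inside "dog"
substring_hit = 'do' in sentence

# Element test against ['the', 'dog', 'is', 'sleeping']: False, no token equals "do"
token_hit = 'do' in sentence.split()

print(substring_hit, token_hit)
```

Keep in mind that `split()` separates on whitespace only, so a token with attached punctuation such as `dog,` would not equal `dog`; a regex with `\b` word boundaries handles that case.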

With the original words_list restored, the count can then be obtained simply by applying sum to the list of booleans:

df_data['new'] = df_data.apply(lambda row: sum([item in row['content'].split() for item in words_list]), axis = 1)

Output:

   A              content  new
0  2  the dog is sleeping    2
1  1      my name is Dude    1
2  3        i am who i am    1
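
As a possible variation (a sketch, not part of the original answer), the same per-row count can be computed with a set intersection applied to the `content` column alone, which avoids the row-wise `apply` over the whole frame. It gives the same numbers as the `sum` version as long as `words_list` has no duplicate entries; `words_set` is a name introduced here for illustration.

```python
import pandas as pd

df_data = pd.DataFrame({'A': [2, 1, 3],
                        'content': ['the dog is sleeping', 'my name is Dude', 'i am who i am']})
words_list = ['dog', 'Dude', 'sleeping', 'i']

# A set gives fast membership tests and ignores duplicate entries in the list
words_set = set(words_list)

# For each text, count how many listed words appear among its whitespace-separated tokens
df_data['new'] = df_data['content'].apply(lambda text: len(words_set & set(text.split())))

print(df_data)
```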
