Python: if a DataFrame row matches any of several substrings, also put the specific matched substring in a new column

Tags: python, pandas, substring, match, string-matching

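I'm trying to extract some records from a set of survey responses. Each extracted record must contain at least one of several keywords. For example, I have a DataFrame df: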

svy_rspns_txt
I like it
I hate it
It's a scam
It's shaddy
Scam!
Good service
Very disappointed
Now if I run

# Python 2 code: unicode() is a Python 2 built-in and does not exist in Python 3
kw = "hate,scam,shaddy,disappoint"
sensitive_words = [unicode(x, 'unicode-escape') for x in kw.lower().split(",")]
df = df[df["svy_rspns_txt"].astype('unicode').str.contains('|'.join(sensitive_words), case=False, na=False)]
I get a result like this:

svy_rspns_txt
I hate it
It's a scam
It's shaddy
Scam!
Very disappointed
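
The filter above relies on unicode(), which exists only in Python 2. A minimal sketch of the same filter for Python 3 follows; it assumes the keywords are plain strings, and the re.escape call is an addition (not in the original) that guards against keywords containing regex metacharacters.

```python
import re

import pandas as pd

df = pd.DataFrame({"svy_rspns_txt": ["I like it", "I hate it", "It's a scam",
                                     "It's shaddy", "Scam!", "Good service",
                                     "Very disappointed"]})

kw = "hate,scam,shaddy,disappoint"
sensitive_words = kw.lower().split(",")

# Build one alternation pattern; escape each keyword so it is matched literally.
pattern = "|".join(re.escape(w) for w in sensitive_words)

# case=False makes the match case-insensitive; na=False treats missing text as "no match".
mask = df["svy_rspns_txt"].astype(str).str.contains(pattern, case=False, na=False)
filtered = df[mask]

print(filtered)
```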
Now, how can I add a column "matched_word" that shows the exact string that matched, so that I get a result like this:

svy_rspns_txt            matched_word
I hate it                hate
It's a scam              scam
It's shaddy              shaddy
Scam!                    scam
Very disappointed        disappoint

Use a generator expression with next:

import numpy as np
import pandas as pd

df = pd.DataFrame({'text': ["I like it", "I hate it", "It's a scam", "It's shaddy",
                            "Scam!", "Good service", "Very disappointed"]})

kw = "hate,scam,shaddy,disappoint"

words = set(kw.split(','))

df['match'] = df['text'].apply(lambda x: next((i for i in words if i in x.lower()), np.nan))

print(df)

                text       match
0          I like it         NaN
1          I hate it        hate
2        It's a scam        scam
3        It's shaddy      shaddy
4              Scam!        scam
5       Good service         NaN
6  Very disappointed  disappoint
You can then keep only the rows that matched, either via pd.Series.notnull or by noting that NaN != NaN:

res = df[df['match'].notnull()]
# or, res = df[df['match'].notna()]
# or, res = df[df['match'] == df['match']]

print(res)

                text       match
1          I hate it        hate
2        It's a scam        scam
3        It's shaddy      shaddy
4              Scam!        scam
6  Very disappointed  disappoint
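
As an alternative to the apply-based approach, the matched keyword can also be pulled out with a regular expression. The sketch below is a swapped-in technique, not the answer's method: it uses Series.str.extract with a single capture group and assumes the keywords should be matched literally (hence re.escape). One behavioural difference: the regex returns the leftmost keyword occurring in the text, whereas the next-based version returns whichever keyword comes first when iterating over words.

```python
import re

import pandas as pd

df = pd.DataFrame({'text': ["I like it", "I hate it", "It's a scam", "It's shaddy",
                            "Scam!", "Good service", "Very disappointed"]})

kw = "hate,scam,shaddy,disappoint"

# One capture group containing all keywords as literal alternatives.
pattern = "(" + "|".join(re.escape(w) for w in kw.split(",")) + ")"

# expand=False returns a Series for a single group; rows without a match get NaN.
# Lower-casing normalises hits such as "Scam" to the keyword "scam".
df['match'] = (df['text']
               .str.extract(pattern, flags=re.IGNORECASE, expand=False)
               .str.lower())

# Drop the rows where no keyword was found.
res = df.dropna(subset=['match'])

print(res)
```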

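What if multiple words match? Should the matched_word column show all of the words, or only the first one that matches?

@TimJohns For now I only need to show the first word. But it would be great if you could give me some suggestions for showing all matched words. Thanks a lot.

Could you just do res = df[df['match'].notna()] to make it clearer?

@TimJohns, yes, that works too and is clearer, thanks. I usually use pd.Series.notnull.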
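
For the follow-up about showing every matched word rather than only the first, one possible sketch is to collect all keywords found in each row with a list comprehension. The sample rows and the helper name all_matches are hypothetical, chosen only for illustration. A list is used instead of a set so that the order of the reported keywords is predictable between runs.

```python
import pandas as pd

# Hypothetical sample data; the first row contains two keywords.
df = pd.DataFrame({'text': ["I hate this scam", "Scam!", "Good service"]})

kw = "hate,scam,shaddy,disappoint"
words = kw.split(",")  # list, not set, so keyword order is deterministic


def all_matches(text):
    """Return every keyword that occurs in text, ignoring case."""
    lowered = text.lower()
    return [w for w in words if w in lowered]


# One list of matched keywords per row (empty list when nothing matches).
df['matches'] = df['text'].apply(all_matches)

# Optional: a comma-separated string version of the same information.
df['matched_word'] = df['matches'].apply(",".join)

print(df)
```

Rows with no match end up with an empty list and an empty string here; they can be filtered out with df[df['matches'].str.len() > 0] if only matching rows are wanted.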