Python: Remove all substrings not contained in a list

python string pandas

I want to remove every substring in a df column that does not appear in a predefined list. For example:

mylist = {'good', 'like', 'bad', 'hated', 'terrible', 'liked'}

Current:                                         Desired:
index      content                               index        content                                          
0          a very good idea, I like it           0            good like
1          was the bad thing to do               1            bad
2          I hated it, it was terrible           2            hated terrible
...                                              ...
k          Why do you think she liked it         k            liked

I have managed to write code that keeps all the words that are not in the list, but I don't know how to invert it to get what I want:

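# Build a word-boundary pattern that matches any word in mylist
pat = r'\b(?:{})\b'.format('|'.join(mylist))
# Replace every listed word with an empty string, leaving only the unlisted words
df['column1'] = df['column1'].str.replace(pat, '', regex=True)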

Any help would be greatly appreciated.

Use together with:
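One regex-based way to do this, reusing the `pat` pattern from the question, is sketched below. It assumes that `Series.str.findall` and `Series.str.join` are the intended pair, which the text above does not confirm. The sample rows are copied from the question, and `re.escape` is an added precaution that the question's pattern does not include.

```python
import re

import pandas as pd

# Sample rows copied from the question; 'content' is the source column.
df = pd.DataFrame({
    'content': [
        'a very good idea, I like it',
        'was the bad thing to do',
        'I hated it, it was terrible',
        'Why do you think she liked it',
    ]
})

mylist = ['good', 'like', 'bad', 'hated', 'terrible', 'liked']

# Word-boundary pattern; re.escape (an addition here) guards against
# regex metacharacters if the list ever contains any.
pat = r'\b(?:{})\b'.format('|'.join(re.escape(w) for w in mylist))

# findall collects the matching words per row; join glues them with spaces.
df['column1'] = df['content'].str.findall(pat).str.join(' ')
print(df)
```

Because the pattern matches on word boundaries rather than on whitespace, a listed word followed by a comma or period is still found.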

Or use a list comprehension with split, filter, and join:

# Keep only the whitespace-separated tokens that appear in mylist
df['column1'] = df['content'].apply(lambda x: ' '.join([y for y in x.split() if y in mylist]))
print(df)
                         content         column1
0    a very good idea, I like it       good like
1        was the bad thing to do             bad
2    I hated it, it was terrible  hated terrible
3  Why do you think she liked it           liked
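Note that `x.split()` splits on whitespace only, so a listed word that touches punctuation (for example `good,`) would not match an entry in `mylist`. A minimal sketch of one way around this, assuming it is acceptable to tokenize with `re.findall(r'\w+', ...)`; the helper name `keep_listed_words` and the two sample rows are hypothetical and not from the original post:

```python
import re

import pandas as pd

mylist = {'good', 'like', 'bad', 'hated', 'terrible', 'liked'}

# Hypothetical rows in which listed words are directly followed by punctuation.
df = pd.DataFrame({
    'content': [
        'good, but I hated it.',
        'It was terrible! She liked it',
    ]
})


def keep_listed_words(text, allowed=mylist):
    """Return the words of `text` that are in `allowed`, joined by spaces."""
    # \w+ extracts runs of word characters, so commas and periods are dropped.
    tokens = re.findall(r'\w+', text)
    return ' '.join(t for t in tokens if t in allowed)


df['column1'] = df['content'].apply(keep_listed_words)
print(df)
```

The comparison stays case-sensitive, as in the list comprehension above; lowercasing each token before the membership test would be a further change in behavior.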
