Python 3.x: Removing non-matching words in Python


python-3.x pandas


I have a text string and want to keep only specific words:

sample = "This is a test text. Test text should pass the test"
approved_list = ["test", "text"]

Expected output:

"test text Test text test"

I have read many regex-based answers, but unfortunately none of them solve this specific problem.

Can the solution also be extended to a pandas Series?

You don't need pandas for this. Use the regular expression module re:

import re

re.findall('|'.join(approved_list), sample, re.IGNORECASE)

['test', 'text', 'Test', 'text', 'test']
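The call above returns a list, while the question asks for a single string. Joining the matches with spaces closes that gap. The following is a minimal sketch, not part of the original answer: it assumes whole-word matching is wanted, so it adds `\b` word boundaries and `re.escape` (without them the pattern would also match "test" inside a word such as "contest").

```python
import re

sample = "This is a test text. Test text should pass the test"
approved_list = ["test", "text"]

# Assumption: only whole words should match, so escape each word
# and wrap the alternation in \b word boundaries.
pattern = r"\b(?:" + "|".join(map(re.escape, approved_list)) + r")\b"

# Collect the matches in order, then join them into one string.
matches = re.findall(pattern, sample, flags=re.IGNORECASE)
result = " ".join(matches)
print(result)  # test text Test text test
```

If substring matches are acceptable, the simpler '|'.join(approved_list) pattern from the answer can be used in the same way.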

If you have a pd.Series:

import pandas as pd

sample = pd.Series(["This is a test text. Test text should pass the test"] * 5)
approved_list = ["test", "text"]

use the str string accessor:

sample.str.findall('|'.join(approved_list), re.IGNORECASE)

0    [test, text, Test, text, test]
1    [test, text, Test, text, test]
2    [test, text, Test, text, test]
3    [test, text, Test, text, test]
4    [test, text, Test, text, test]
dtype: object
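Each row here is still a list. To get one string per row, as in the expected output of the question, the lists can be joined through the str accessor once more. This is a sketch under the assumption that a space-separated string per row is the desired result; the variable names `pattern` and `joined` are illustrative.

```python
import re

import pandas as pd

sample = pd.Series(["This is a test text. Test text should pass the test"] * 5)
approved_list = ["test", "text"]

# Same alternation pattern as in the answer above.
pattern = "|".join(approved_list)

# findall yields a list per row; str.join collapses each list into one string.
joined = sample.str.findall(pattern, flags=re.IGNORECASE).str.join(" ")
print(joined.iloc[0])  # test text Test text test
```

Because both steps go through the str accessor, approved_list is applied to every value of the Series without an explicit loop.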

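Thanks, this is very helpful. I mentioned pandas because approved_list needs to be applied to every value of a pd.Series. Do you have any suggestions?

@Drj I've updated my post.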