Python: filtering a Series for a specific word (with variants)
I have a large DataFrame in which one column contains several variants of the same word. I want to filter rows based on one specific word. Below is a sample DataFrame. Here I want to keep the rows whose "Resolution" column contains the word "create", but not the rows where it only appears as a substring of another word, such as "re-create" or "recreate".
Note: I only want to do this with str.contains.
In [3]: import pandas as pd
In [4]: df = pd.DataFrame({"Resolution": ["create profile", "recreate profile", "re-create profile", "created profile",
   ...:     "re-created profile", "closed outlook and recreated profile", "purged outlook processes and created new profile"],
   ...:     "Product": ["Outlook", "Outlook", "Outlook", "Outlook", "Outlook", "Outlook", "Outlook"]})
In [5]: df
Out[5]:
                                         Resolution  Product
0                                    create profile  Outlook
1                                  recreate profile  Outlook
2                                 re-create profile  Outlook
3                                   created profile  Outlook
4                                re-created profile  Outlook
5              closed outlook and recreated profile  Outlook
6  purged outlook processes and created new profile  Outlook
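My attempt:
I have been able to filter for the rows containing "recreate" and "re-create" (the past tense does not matter).
Question: how do I modify the regex so that it only picks up the rows with "create" as a word of its own, and not the rows where it is a substring? Something like this: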
                                         Resolution  Product
0                                    create profile  Outlook
3                                   created profile  Outlook
6  purged outlook processes and created new profile  Outlook
Add ~ to invert the condition:
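# ~ inverts the mask: drop rows that mention recreate/re-create together with profile.
# The (?:...) group keeps the alternation inside the first lookahead.
df = df[~df.Resolution.str.contains(r"(?=.*(?:recreate|re-create))(?=.*profile)")]
print(df)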
                                         Resolution  Product
0                                    create profile  Outlook
3                                   created profile  Outlook
6  purged outlook processes and created new profile  Outlook
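As an alternative to inverting the mask, the wanted rows can also be matched directly. The following is a minimal sketch, not part of the original answer: it assumes that "create as its own word" means "create" not immediately preceded by a word character or a hyphen. The names pattern and standalone_create are illustrative only.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "Resolution": [
        "create profile",
        "recreate profile",
        "re-create profile",
        "created profile",
        "re-created profile",
        "closed outlook and recreated profile",
        "purged outlook processes and created new profile",
    ],
    "Product": ["Outlook"] * 7,
})

# Negative lookbehind: "create" must not be preceded by a word character
# or a hyphen. This rejects "recreate" and "re-create" but still accepts
# "create" and "created".
pattern = r"(?<![\w-])create"

standalone_create = df[df["Resolution"].str.contains(pattern, regex=True)]
print(standalone_create)
```

The two approaches differ on mixed rows: a row that contains both "recreate" and a separate "create" is kept by this pattern, whereas the inverted mask would drop it.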
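What do you mean by "keep profile"? The regex in the question only removes rows that contain both "recreate"/"re-create" and "profile". If you mean that you want to remove rows with recreate/re-create, but only when they do not contain "profile", then you need to change the regex.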
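One way to express the second reading from this comment is to combine two boolean masks instead of packing everything into one regex. This is only a sketch under that interpretation. The sample rows are hypothetical (every row in the question's data contains "profile"), and the names has_recreate, has_profile and kept are illustrative.

```python
import pandas as pd

# Hypothetical rows: the last two do not mention "profile".
s = pd.Series([
    "create profile",
    "recreate profile",
    "re-create mailbox",
    "recreated account",
])

# "re", an optional hyphen, then "create" covers recreate and re-create.
has_recreate = s.str.contains(r"re-?create")
has_profile = s.str.contains("profile")

# Drop only the rows that mention recreate/re-create without "profile".
kept = s[~(has_recreate & ~has_profile)]
print(kept)
```

Separate masks make each condition easy to read and to test on its own, at the cost of scanning the column twice.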