Python: filtering for a specific word in a Series (with variants)

Tags: python, python-3.x, pandas

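I have a large DataFrame in which one column contains several variants of the same word. I want to filter rows based on the specific word I am looking for. Below is a sample DataFrame. Here I want to keep the rows whose "Resolution" column contains the word "create" itself, not longer words that merely contain it as a substring, such as "re-create" or "recreate".

Note: I only want to do this with str.contains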

In [3]: import pandas as pd

In [4]: df = pd.DataFrame({"Resolution": ["create profile", "recreate profile", "re-create profile", "created profile",
   ...:     "re-created profile", "closed outlook and recreated profile", "purged outlook processes and created new profile"],
   ...:     "Product": ["Outlook", "Outlook", "Outlook", "Outlook", "Outlook", "Outlook", "Outlook"]})

In [5]: df
Out[5]:
                                         Resolution  Product
0                                    create profile  Outlook
1                                  recreate profile  Outlook
2                                 re-create profile  Outlook
3                                   created profile  Outlook
4                                re-created profile  Outlook
5              closed outlook and recreated profile  Outlook
6  purged outlook processes and created new profile  Outlook
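My attempt:

I have been able to filter for "recreate" and "re-create" (the past tense doesn't matter):

Question: how can I modify the regex so that it only picks up rows with "create" as a word of its own, and not rows where it is a substring? Something like this:

                                         Resolution  Product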
0                                    create profile  Outlook
3                                   created profile  Outlook
6  purged outlook processes and created new profile  Outlook

Add ~ to invert the condition:

df = df[~df.Resolution.str.contains("(?=.*recreate|re-create)(?=.*profile)")]
print (df)
                                         Resolution  Product
0                                    create profile  Outlook
3                                   created profile  Outlook
6  purged outlook processes and created new profile  Outlook
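
The approach above excludes the unwanted variants. As an alternative sketch, the wanted rows can be selected directly by rejecting any "create" that is preceded by a word character or a hyphen. The pattern `(?<![\w-])create` is an assumption of this sketch, not part of the original answer, and it only holds if every variant attaches its prefix directly or with a hyphen.

```python
import pandas as pd

df = pd.DataFrame({
    "Resolution": [
        "create profile",
        "recreate profile",
        "re-create profile",
        "created profile",
        "re-created profile",
        "closed outlook and recreated profile",
        "purged outlook processes and created new profile",
    ],
    "Product": ["Outlook"] * 7,
})

# Negative lookbehind: "create" must not be preceded by a word character
# or a hyphen, so "recreate" and "re-create" are rejected while "create"
# and "created" still match.
pattern = r"(?<![\w-])create"

mask = df["Resolution"].str.contains(pattern)
result = df[mask]
print(result)
```

This keeps the rows with index 0, 3 and 6. If capitalisation varies in the real data, `str.contains` also accepts `case=False`.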
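What do you mean by "keep profile"? The regex in the question only removes rows that contain both "recreate"/"re-create" and "profile". If you mean that you want to remove rows containing "recreate"/"re-create", but only when they do not contain "profile", then you need to change the regex.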
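
To illustrate the point about both conditions being required, here is a small sketch that evaluates the same pattern on a few strings. The "recreate mailbox" row is hypothetical and does not come from the question's data; it is there only to show a variant without "profile".

```python
import pandas as pd

s = pd.Series([
    "recreate profile",   # variant and "profile"
    "re-create profile",  # variant and "profile"
    "recreate mailbox",   # hypothetical row: variant without "profile"
    "create profile",     # "profile" without a variant
])

# Two lookaheads: a row matches only if both succeed.
pattern = "(?=.*recreate|re-create)(?=.*profile)"
both = s.str.contains(pattern)

# Matching the variant alone, regardless of "profile".
any_variant = s.str.contains(r"re-?create")

print(both.tolist())
print(any_variant.tolist())
```

With `~both` the hypothetical third row would be kept, whereas `~any_variant` would drop it.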