Inverse of string.contains in Python, pandas

python string python-2.7 csv pandas


I have something like this in my code:

df2 = df[df['A'].str.contains("Hello|World")]

However, I want all the rows that do not contain either Hello or World. How can I most efficiently reverse this?

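The .contains() method uses regular expressions, so you can use a negative lookahead test to determine that a word is not contained:

df['A'].str.contains(r'^(?:(?!Hello|World).)*$')

This expression matches any string in which the words Hello and World do not appear anywhere.

Demo: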
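A minimal sketch of the lookahead pattern applied to a small frame. The sample data is borrowed from the other answer in this thread and is only illustrative; the variable names pattern, mask and result are assumptions, not part of the original answer.

```python
import pandas as pd

# Illustrative sample data (same values as the other answer's example).
df = pd.DataFrame({"A": ["Hello", "this", "World", "apple"]})

# At every position, the negative lookahead (?!Hello|World) asserts that
# neither word starts there; (?:...) keeps the group non-capturing.
pattern = r"^(?:(?!Hello|World).)*$"
mask = df["A"].str.contains(pattern)

# Rows whose string contains neither word.
result = df[mask]
print(result)
```

Because the pattern is anchored with ^ and $, a string only matches when the lookahead succeeds at every character, which is why it selects the rows that contain neither word.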

You can use the tilde, ~, to flip the boolean values:

>>> import pandas as pd
>>> df = pd.DataFrame({"A": ["Hello", "this", "World", "apple"]})
>>> df.A.str.contains("Hello|World")
0     True
1    False
2     True
3    False
Name: A, dtype: bool
>>> ~df.A.str.contains("Hello|World")
0    False
1     True
2    False
3     True
Name: A, dtype: bool
>>> df[~df.A.str.contains("Hello|World")]
       A
1   this
3  apple

[2 rows x 1 columns]

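Whether this is the most efficient way, I don't know; you would have to time it against your other options. Sometimes using a regular expression is slower than something like

df[~(df.A.str.contains("Hello") | (df.A.str.contains("World")))]

but I'm not good at guessing where the crossover points are.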
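One way to do that timing is sketched below. The data is made up, the function names are hypothetical, and the regex=False variant is an extra option not mentioned in the answer; actual numbers will depend on the pandas version and on the size and content of the column.

```python
import timeit

import pandas as pd

# Made-up sample data; real timings depend heavily on the actual column.
df = pd.DataFrame({"A": ["Hello", "this", "World", "apple"] * 1000})


def with_alternation(frame):
    # One regex with alternation, inverted with ~.
    return frame[~frame["A"].str.contains("Hello|World")]


def with_two_calls(frame):
    # Two separate contains() calls combined with |, then inverted.
    return frame[~(frame["A"].str.contains("Hello") | frame["A"].str.contains("World"))]


def with_literal_calls(frame):
    # Same as above, but regex=False treats each pattern as a plain substring.
    hello = frame["A"].str.contains("Hello", regex=False)
    world = frame["A"].str.contains("World", regex=False)
    return frame[~(hello | world)]


# All variants should select the same rows before their speed is compared.
same_rows = (
    with_alternation(df).equals(with_two_calls(df))
    and with_alternation(df).equals(with_literal_calls(df))
)
print("same rows:", same_rows)

for func in (with_alternation, with_two_calls, with_literal_calls):
    seconds = timeit.timeit(lambda: func(df), number=20)
    print(func.__name__, round(seconds, 4))
```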

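Much better than the complex negative lookaround test. However, I have no experience with pandas myself, so I don't know which approach is faster.

The regex lookaround test took noticeably longer (about 30 seconds vs 20 seconds), and the two methods apparently give slightly different results (3663K results vs 3504K, from ~3G originally; I have not looked into the specifics yet).

@DSM I have seen this ~ symbol many times, especially in JavaScript. I have not seen it in Python. What exactly does it mean?

I got C:\Python27\lib\site-packages\pandas\core\strings.py:176: UserWarning: This pattern has match groups. To actually get the groups, use str.extract.

Make the group non-capturing.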
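On the question about ~: in Python it is the bitwise-invert operator, and on a boolean Series pandas applies it element-wise, which amounts to a logical NOT. A small sketch with made-up values (the variable names are illustrative only):

```python
import pandas as pd

mask = pd.Series([True, False, True, False])

# On a boolean Series, ~ flips each element.
inverted = ~mask
print(inverted.tolist())

# On a plain Python int, ~ is bitwise inversion: ~x == -x - 1.
print(~5)

# The keyword `not` does not work element-wise; pandas raises ValueError
# because the truth value of a whole Series is ambiguous.
try:
    negated = not mask
except ValueError:
    negated = None
```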
