Logical operators for boolean indexing in pandas

I am working with boolean indexing in pandas. The question is why the statement

a[(a['some_column']==some_number) & (a['some_other_column']==some_other_number)]

works fine, whereas

a[(a['some_column']==some_number) and (a['some_other_column']==some_other_number)]

exits with an error.

Example:

import pandas as pd

a = pd.DataFrame({'x': [1, 1], 'y': [10, 20]})

In: a[(a['x']==1) & (a['y']==10)]
Out:    x   y
     0  1  10

In: a[(a['x']==1) and (a['y']==10)]
Out: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

When you say

(a['x']==1) and (a['y']==10)

you are implicitly asking Python to convert (a['x']==1) and (a['y']==10) to boolean values.

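NumPy arrays (of length greater than 1) and pandas objects such as Series do not have a boolean value. In other words, they raise

ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().

when used as a boolean value. That is because it is unclear when such an object should count as True or False. Some users might assume it is True if it has non-zero length, like a Python list. Others might want it to be True only if all of its elements are True. Still others might want it to be True if any of its elements is True.

Because there are so many conflicting expectations, the designers of NumPy and pandas refuse to guess and raise a ValueError instead.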

Instead, you must be explicit: use the empty attribute, or call the all() or any() method, to indicate which behavior you want.
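As a minimal sketch of what being explicit looks like, the snippet below reduces a boolean Series to a single value in three different ways. The Series named mask is made up for illustration; note that empty is an attribute, while any() and all() are methods.

```python
import pandas as pd

# Hypothetical mask, used only for illustration.
mask = pd.Series([True, False, True])

# bool(mask) is ambiguous, so pandas raises instead of guessing.
try:
    bool(mask)
    raised = False
except ValueError:
    raised = True

is_empty = mask.empty        # True only if the Series has no elements
any_true = bool(mask.any())  # True if at least one element is True
all_true = bool(mask.all())  # True only if every element is True

print(raised, is_empty, any_true, all_true)
```

Which of the three is appropriate depends on the question being asked of the data; none of them is a substitute for an element-wise operation.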

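In this case, however, it looks like you do not want boolean evaluation; you want element-wise logical AND. That is what the & binary operator performs:

(a['x']==1) & (a['y']==10)

returns a boolean Series.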
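To make the element-wise behaviour concrete, here is a small sketch that reuses the DataFrame from the question. The variable names mask and filtered are illustrative choices, not part of the original post.

```python
import pandas as pd

a = pd.DataFrame({'x': [1, 1], 'y': [10, 20]})

# Each comparison yields a boolean Series; & combines them row by row.
mask = (a['x'] == 1) & (a['y'] == 10)

# Indexing with the mask keeps only the rows where it is True.
filtered = a[mask]

print(mask.tolist())   # [True, False]
print(filtered)
```

Keeping the mask in its own variable is optional, but it makes the intermediate result easy to inspect.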

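By the way, the parentheses are mandatory because & has a higher operator precedence than ==. Without the parentheses,

a['x']==1 & a['y']==10

would be evaluated as

a['x'] == (1 & a['y']) == 10

which in turn is equivalent to the chained comparison

(a['x'] == (1 & a['y'])) and ((1 & a['y']) == 10)

That is an expression of the form Series and Series. Using the and keyword with two Series triggers the same ValueError as above. That is why the parentheses are mandatory.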
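The precedence issue can be checked directly. The sketch below, again using the DataFrame from the question, evaluates the sub-expression that Python binds first, then compares the unparenthesized and parenthesized forms. The names inner, error and ok are illustrative.

```python
import pandas as pd

a = pd.DataFrame({'x': [1, 1], 'y': [10, 20]})

# & binds tighter than ==, so this sub-expression is evaluated first.
inner = 1 & a['y']   # bitwise AND of 1 with each value of y

# Without parentheses, the chained comparison needs bool(Series) and fails.
try:
    a['x'] == 1 & a['y'] == 10
    error = None
except ValueError as exc:
    error = exc

# With parentheses, each comparison is evaluated before &.
ok = (a['x'] == 1) & (a['y'] == 10)

print(inner.tolist())        # [0, 0]
print(type(error).__name__)  # ValueError
print(ok.tolist())           # [True, False]
```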

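TL;DR: the logical operators in pandas are &, | and ~, and the parentheses (...) are important!

Python's and, or and not logical operators are designed to work with scalars. So pandas had to go one better and override the bitwise operators to provide a vectorized (element-wise) version of this functionality.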

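So the following in Python (exp1 and exp2 are expressions that evaluate to a boolean result)

exp1 and exp2              # Logical AND
exp1 or exp2               # Logical OR
not exp1                   # Logical NOT

translates to

exp1 & exp2                # Element-wise logical AND
exp1 | exp2                # Element-wise logical OR
~exp1                      # Element-wise logical NOT

for pandas.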

If you get a ValueError while performing a logical operation, you need to use parentheses for grouping:

(exp1) op (exp2)

For example,

(df['col1'] == x) & (df['col2'] == y)

and so on.
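The three operators can be tried side by side. The following is a sketch on made-up data: the column names col1 and col2 follow the example above, but the values and the variable names both, either and negated are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [10, 20, 10, 20]})

both = (df['col1'] > 1) & (df['col2'] == 20)    # element-wise AND
either = (df['col1'] > 3) | (df['col2'] == 10)  # element-wise OR
negated = ~(df['col2'] == 10)                   # element-wise NOT

print(both.tolist())     # [False, True, False, True]
print(either.tolist())   # [True, False, True, True]
print(negated.tolist())  # [False, True, False, True]
```

Each result is a boolean Series aligned with df, so it can be passed straight to df[...] to select rows.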

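Boolean indexing: a common operation is to compute boolean masks through logical conditions to filter the data. pandas provides three operators: & for logical AND, | for logical OR, and ~ for logical NOT.

Consider the following setup:

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (5, 3)), columns=list('ABC'))
df

   A  B  C
0  5  0  3
1  3  7  9
2  3  5  2
3  4  7  6
4  8  8  1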
Logical AND

For df above, say you would like to return all rows where A < 5 and B > 5. This is done by computing a mask for each condition separately and ANDing them.

Overloaded bitwise & operator

Before continuing, please take note of this particular excerpt from the pandas documentation, which states:

    Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not. These must be grouped by using parentheses, since by default Python will evaluate an expression such as df.A > 2 & df.B < 3 as df.A > (2 & df.B) < 3, while the desired evaluation order is (df.A > 2) & (df.B < 3).

With this in mind, element-wise logical AND can be implemented with the bitwise operator &:

df['A'] < 5

0    False
1     True
2     True
3     True
4    False
Name: A, dtype: bool

df['B'] > 5

0    False
1     True
2    False
3     True
4     True
Name: B, dtype: bool

(df['A'] < 5) & (df['B'] > 5)

0    False
1     True
2    False
3     True
4    False
dtype: bool

df[(df['A'] < 5) & (df['B'] > 5)]

   A  B  C
1  3  7  9
3  4  7  6

The parentheses are needed because the bitwise operators have higher precedence than the comparison operators < and >. Without them, an expression such as df['A'] < 5 & df['B'] > 5 is parsed as

df['A'] < (5 & df['B']) > 5

which is a chained comparison between Series and raises the "truth value of a Series is ambiguous" ValueError. So, don't make that mistake! [1]

Avoiding parentheses grouping

The fix is actually quite simple. Most operators have a corresponding bound method for DataFrames. If the individual masks are built up using functions instead of conditional operators, you no longer need to group by parentheses to specify evaluation order, so df['A'].lt(5) & df['B'].gt(5) works as written:

df['A'].lt(5)

0    False
1     True
2     True
3     True
4    False
Name: A, dtype: bool

df['B'].gt(5)

0    False
1     True
2    False
3     True
4     True
Name: B, dtype: bool

See the pandas documentation on these comparison methods. To summarise, we have

╒════╤════════════╤════════════╕
│    │ Operator   │ Function   │
╞════╪════════════╪════════════╡
│  0 │ >          │ gt         │
├────┼────────────┼────────────┤
│  1 │ >=         │ ge         │
├────┼────────────┼────────────┤
│  2 │ <          │ lt         │
├────┼────────────┼────────────┤
│  3 │ <=         │ le         │
├────┼────────────┼────────────┤
│  4 │ ==         │ eq         │
├────┼────────────┼────────────┤
│  5 │ !=         │ ne         │
╘════╧════════════╧════════════╛

query and eval, which I have documented extensively elsewhere, are another option.

operator.and_

Allows you to perform this operation in a functional manner. Internally it calls Series.__and__, which corresponds to the bitwise operator.

import operator

operator.and_(df['A'] < 5, df['B'] > 5)
# Same as,
# (df['A'] < 5).__and__(df['B'] > 5)

0    False
1     True
2    False
3     True
4    False
dtype: bool

df[operator.and_(df['A'] < 5, df['B'] > 5)]

   A  B  C
1  3  7  9
3  4  7  6

Generalizing: np.logical_and (and logical_and.reduce)

np.logical_and takes the two masks as arguments, so no parentheses grouping is needed:

np.logical_and(df['A'] < 5, df['B'] > 5)

0    False
1     True
2    False
3     True
4    False
Name: A, dtype: bool

df[np.logical_and(df['A'] < 5, df['B'] > 5)]

   A  B  C
1  3  7  9
3  4  7  6

np.logical_and is a ufunc (universal function), and most ufuncs have a reduce method. This means it is easier to generalise with logical_and if you have multiple masks to AND. For example, to AND the masks m1, m2 and m3 with &, you would have to write

m1 & m2 & m3

However, an easier option is

np.logical_and.reduce([m1, m2, m3])

This is powerful, because it lets you build more complex logic on top of it (for example, dynamically generating masks in a list comprehension and combining all of them):

cols = ['A', 'B']
ops = [np.less, np.greater]
values = [5, 5]

m = np.logical_and.reduce([op(df[c], v) for op, c, v in zip(ops, cols, values)])
m
# array([False,  True, False,  True, False])

df[m]

   A  B  C
1  3  7  9
3  4  7  6

[1] I know I am harping on this point, but please bear with me. This is a very, very common beginner's mistake, and it must be explained thoroughly.

Logical OR

For the df above, say you would like to return all rows where A == 3 or B == 7.

Overloaded bitwise |

df['A'] == 3

0    False
1     True
2     True
3    False
4    False
Name: A, dtype: bool

df['B'] == 7

0    False
1     True
2    False
3     True
4    False
Name: B, dtype: bool

(df['A'] == 3) | (df['B'] == 7)

0    False
1     True
2     True
3     True
4    False
dtype: bool

df[(df['A'] == 3) | (df['B'] == 7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

If you have not yet, please also read the section on Logical AND above; all of its caveats apply here.

Alternatively, this operation can be specified with

df[df['A'].eq(3) | df['B'].eq(7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

operator.or_

Calls Series.__or__ under the hood.

operator.or_(df['A'] == 3, df['B'] == 7)
# Same as,
# (df['A'] == 3).__or__(df['B'] == 7)

0    False
1     True
2     True
3     True
4    False
dtype: bool

df[operator.or_(df['A'] == 3, df['B'] == 7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

np.logical_or

For two conditions, use logical_or:

np.logical_or(df['A'] == 3, df['B'] == 7)

0    False
1     True
2     True
3     True
4    False
Name: A, dtype: bool

df[np.logical_or(df['A'] == 3, df['B'] == 7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

For multiple masks, use logical_or.reduce:

np.logical_or.reduce([df['A'] == 3, df['B'] == 7])
# array([False,  True,  True,  True, False])

df[np.logical_or.reduce([df['A'] == 3, df['B'] == 7])]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

Logical NOT

Given a mask, such as

mask = pd.Series([True, True, False])

if you need to invert every boolean value in it, any of the following will do.

Bitwise ~

~mask

0    False
1    False
2     True
dtype: bool

Again, expressions need to be parenthesised:

~(df['A'] == 3)

0     True
1    False
2    False
3     True
4     True
Name: A, dtype: bool

Internally this calls

mask.__invert__()

0    False
1    False
2     True
dtype: bool

operator.inv

operator.inv(mask)

0    False
1    False
2     True
dtype: bool

np.logical_not

np.logical_not(mask)

0    False
1    False
2     True
dtype: bool

Python's and, or and not implicitly call bool on their operands, which raises for NumPy arrays, Series and DataFrames alike:

>>> import numpy as np
>>> import pandas as pd
>>> arr = np.array([1,2,3])
>>> s = pd.Series([1,2,3])
>>> df = pd.DataFrame([1,2,3])
>>> bool(arr)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
>>> bool(s)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> bool(df)
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

NumPy provides element-wise logical functions that work on these objects:

np.logical_and(df1, df2)
np.logical_or(df1, df2)
np.logical_not(df1)
np.logical_xor(df1, df2)

When combining comparisons with the bitwise operators, parenthesise each comparison:

(df1 < 10) | (df2 > 10)  # instead of the wrong df1 < 10 | df2 > 10

The logical and bitwise functions are only interchangeable for boolean inputs. With integer arrays the results differ, which matters when the result is used for indexing:

>>> import numpy as np
>>> a1 = np.array([0, 0, 1, 1])
>>> a2 = np.array([0, 1, 0, 1])
>>> np.logical_and(a1, a2)
array([False, False, False,  True])
>>> np.bitwise_and(a1, a2)
array([0, 0, 0, 1], dtype=int32)
>>> a3 = np.array([1, 2, 3, 4])
>>> a3[np.logical_and(a1, a2)]
array([4])
>>> a3[np.bitwise_and(a1, a2)]
array([1, 1, 1, 2])
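The difference in the last example comes from how NumPy interprets the index: a boolean array acts as a mask, while an integer array is read as a list of positions. The sketch below repeats that comparison and adds one possible remedy, converting the integer inputs to bool before applying &. The astype(bool) step and the variable names are suggestions for illustration, not something the original answer prescribes.

```python
import numpy as np

a1 = np.array([0, 0, 1, 1])
a2 = np.array([0, 1, 0, 1])
a3 = np.array([1, 2, 3, 4])

logical = np.logical_and(a1, a2)   # boolean result: usable as a mask
bitwise = np.bitwise_and(a1, a2)   # integer result: treated as positions

by_mask = a3[logical]              # keeps elements where the mask is True
by_position = a3[bitwise]          # picks a3[0], a3[0], a3[0], a3[1]

# Converting the integer inputs to bool first makes & behave like a mask.
as_mask = a1.astype(bool) & a2.astype(bool)

print(by_mask.tolist())            # [4]
print(by_position.tolist())        # [1, 1, 1, 2]
print(a3[as_mask].tolist())        # [4]
```

Masks produced by comparisons such as df['A'] < 5 are already boolean, so this pitfall mainly affects 0/1 integer columns or arrays used as flags.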