What are the rules in Python pandas for element-wise binary boolean operations on operands of the same length but with different indexes?

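I applied some binary boolean operators in my codebase and ran into a bug that really surprised me. I rebuilt a minimal working example that demonstrates the behavior: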

import pandas
s = pandas.Series( [True]*4 )
d = pandas.DataFrame( { 'a':[True, False, True, False] , 'b':[True]*4 } )

print(d)
       a     b
0   True  True
1  False  True
2   True  True
3  False  True

print( s[0:2] )
0    True
1    True
dtype: bool

print( d.loc[ d['a'] , 'b' ] )
0    True
2    True
dtype: bool

print( s[0:2] & d.loc[ d['a'] , 'b' ] )
0     True
1    False
2    False

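The value of the last statement completely surprised me: it has 3 elements. Realizing that the indexes play a role here, I manually reset them to get the result I expected: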

s[0:2].reset_index(drop=True) & d.loc[ d['a'] , 'b' ].reset_index( drop=True )
0    True
1    True
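A self-contained version of the workaround above, as a minimal sketch. Besides `reset_index(drop=True)`, it shows a second positional route: turning one operand into a plain NumPy array with `to_numpy()` (available since pandas 0.24; older code uses `.values`), so pandas has no index to align against. Both routes assume the two operands already have the same length and are in the intended order.

```python
import pandas as pd

s = pd.Series([True] * 4)
d = pd.DataFrame({'a': [True, False, True, False], 'b': [True] * 4})

left = s.iloc[0:2]          # same rows as s[0:2] in the question; labels [0, 1]
right = d.loc[d['a'], 'b']  # labels [0, 2]

# Route 1: throw both indexes away so each side is relabelled 0..n-1,
# which makes label alignment equivalent to positional pairing.
by_reset = left.reset_index(drop=True) & right.reset_index(drop=True)

# Route 2: a NumPy array carries no index, so pandas pairs elements by
# position; the result keeps the index of the Series on the left.
by_position = left & right.to_numpy()

print(by_reset.tolist())           # [True, True]
print(by_position.tolist())        # [True, True]
print(by_position.index.tolist())  # [0, 1]
```

Route 2 keeps the left operand's labels, which is convenient when the result is later combined with other objects derived from `s`.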

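Needless to say, I need to go back to the documentation and get a grip on how the indexing rules apply here. Can someone explain, step by step, how this operator behaves with mixed indexes?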

=============================================

To add a comparison for those coming from a similar R background, the equivalent operation on an R data.frame produces the result I expected:

> a = c(TRUE,FALSE,TRUE,FALSE)
> b = c(TRUE,TRUE,TRUE,TRUE)
> 
> d = data.frame( a, b )
> d
      a    b
1  TRUE TRUE
2 FALSE TRUE
3  TRUE TRUE
4 FALSE TRUE
> s = c( TRUE,TRUE,TRUE,TRUE)
> s
[1] TRUE TRUE TRUE TRUE
>
> d[ d$a , 'b']
[1] TRUE TRUE
>
> s[0:2]
[1] TRUE TRUE
> s[0:2] & d[ d$a , 'b']
[1] TRUE TRUE

You are comparing two Series with different indexes:

s[0:2]

0    True
1    True
dtype: bool

and d.loc[ d['a'] , 'b' ] (index 0 and 2). pandas has to align the indexes first, and only then compare:

s[0:2] & d.loc[ d['a'] , 'b']

0     True  # True from both indices therefore True
1    False  # Only True from s[0:2] and missing from other therefore False
2    False  # Only True from d and missing from other therefore False
dtype: bool
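The three steps described in the comments above (take the union of the indexes, treat a label missing on one side as False, then AND label by label) can be replayed by hand. This is a minimal sketch that reproduces the observed result for `&` only; it is not pandas' internal code path, and it should not be assumed to carry over unchanged to other operators such as `|`.

```python
import pandas as pd

s = pd.Series([True] * 4)
d = pd.DataFrame({'a': [True, False, True, False], 'b': [True] * 4})

left = s.iloc[0:2]          # labels [0, 1]
right = d.loc[d['a'], 'b']  # labels [0, 2]

# Step 1: the result is indexed by the union of both indexes.
labels = left.index.union(right.index)
print(labels.tolist())  # [0, 1, 2]

# Step 2: reindex both sides onto the union; a label one side lacks
# is filled with False (fill_value=False keeps the dtype bool).
left_aligned = left.reindex(labels, fill_value=False)    # [True, True, False]
right_aligned = right.reindex(labels, fill_value=False)  # [True, False, True]

# Step 3: AND the now label-matched values element by element.
manual = left_aligned & right_aligned

print(manual.tolist())          # [True, False, False]
print((left & right).tolist())  # [True, False, False], as in the question
```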

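Wow, this indexing scheme is a whole new game coming from R's strictly element-wise operations. I'll add some R examples to my question to provide a comparison. Thanks for the basic tip.

You can still do element-wise operations on the underlying numpy arrays (access the .values attribute of one of the Series/DataFrames).

Wow, index alignment is a powerful feature... I found a good demonstration and explanation here. If I'm not mistaken there is nothing similar in R; I suppose the closest R equivalent of the example fella gave would be a subset operation applied to a larger set, in which the indexes are aligned first and NaNs are filled in everywhere else.

@ayhan thanks for the good strategy. I won't use it in my case, because instead I'm going to revisit all the standalone Series and DataFrame constructors that have the same length as the other Series and DataFrames they are derived from. This time I'll build them to share the same index as their parent pandas object. These binary operations show up in many places, and I need to align all these helper pandas objects from the start. This discovery should fix several mysterious bugs I couldn't figure out.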
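A minimal sketch of the strategy in the last comment: build helper objects with the parent's index so later binary operations line up label for label. The non-default index `[10, 20, 30, 40]` and the names `unaligned` and `helper` are hypothetical, chosen only to make a mismatch visible.

```python
import pandas as pd

# Parent frame with a non-default index (hypothetical labels for illustration).
d = pd.DataFrame({'a': [True, False, True, False], 'b': [True] * 4},
                 index=[10, 20, 30, 40])

# A helper built without an index gets the default labels 0..3, which share
# no labels with d, so every label of the union is missing on one side.
unaligned = pd.Series([True] * len(d))
print((unaligned & d['b']).tolist())
# [False, False, False, False, False, False, False, False]

# A helper built with the parent's index: alignment becomes a no-op.
helper = pd.Series([True] * len(d), index=d.index)
mask = d['a']
result = helper[mask] & d.loc[mask, 'b']
print(result.index.tolist())  # [10, 30]
print(result.tolist())        # [True, True]
```

Passing `index=parent.index` at construction time is the key design choice: equal length alone is not enough, because pandas pairs values by label, not by position.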
