Python: Why doesn't "is not None" work with dataframe.loc, while "!= None" works fine?




I'm currently playing around with a Pandas DataFrame, and I want to select all rows whose entities attribute is not None.

df_ = df.loc[df['entities'] != None]

seems to work fine, but

df_ = df.loc[df['entities'] is not None]

raises a KeyError:

File "pandas\_libs\index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 128, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index_class_helper.pxi", line 91, in pandas._libs.index.Int64Engine._check_type
KeyError: True

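I already know the solution to my original problem; I'm just curious why this happens.

Instead of

df_ = df.loc[df['entities'] is not None]

or

df_ = df.loc[df['entities'] != None]

you should use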

df_ = df.loc[df['entities'].notna()]

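because pandas represents missing values differently from plain Python, which usually uses None for them (use .isna() instead if you want the rows where the value is missing). In particular, you get the KeyError because df['entities'] is not None checks whether the column Series itself is identical to None. That is always True, since a Series object is never None. .loc then looks up the label True in the row index, which does not exist in your case, so it raises an exception. != does not raise this exception because pandas.Series overloads the comparison operators; otherwise you could not build an indexer by comparing a column with a fixed value, as in df['name'] == 'Miller'. The overloaded method performs an element-wise comparison and returns a boolean indexer that works well with .loc. The result just may not be what you want.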
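A minimal reproduction of the two behaviours, assuming a small hypothetical DataFrame whose entities values are made up for illustration: the identity test yields a single Python bool, the overloaded != yields a Series, and handing the bool to .loc raises KeyError. The exact KeyError message varies between pandas versions, so it is printed rather than quoted here.

```python
import pandas as pd

# Hypothetical data standing in for the question's DataFrame
df = pd.DataFrame({"entities": ["a", None, "c"]})

identity = df["entities"] is not None  # plain Python identity check
print(type(identity), identity)        # <class 'bool'> True

mask = df["entities"] != None  # noqa: E711  overloaded, element-wise
print(type(mask).__name__)     # Series

try:
    df.loc[identity]           # effectively df.loc[True]
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)    # message text differs across pandas versions
```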

For example, if you run

import pandas as pd
import numpy as np
df = pd.DataFrame(dict(x=[1, 2, 3], y=list('abc'), nulls=[None, np.nan, np.float32('inf')]))

df['nulls'].isna()

it returns:

Out[18]: 
0     True
1     True
2    False
Name: nulls, dtype: bool

But the code

df['nulls'] == None

Out[20]: 
0    False
1    False
2    False
Name: nulls, dtype: bool

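If you look at the types of the objects stored in the column, you'll find that they are all floats, because pandas converted the None to NaN: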

df['nulls'].map(type)
Out[19]: 
0    <class 'float'>
1    <class 'float'>
2    <class 'float'>
Name: nulls, dtype: object

So using isna instead of != None keeps your code clean and independent of how pandas represents missing data internally.
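A sketch of the filtering the question is after, assuming a hypothetical object-dtype entities column that mixes real values, None and NaN. In recent pandas versions, comparing a Series against None is special-cased so that != None is True everywhere, which means it filters nothing; notna() treats both None and NaN as missing and drops them.

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing real values, None and NaN (object dtype)
df = pd.DataFrame({"entities": ["apple", None, np.nan, "pear"]})

# Comparison against None: pandas returns True for every row with !=
ne_mask = df["entities"] != None  # noqa: E711
print(ne_mask.tolist())           # [True, True, True, True]

# notna() recognises both None and NaN as missing
keep = df["entities"].notna()
print(keep.tolist())              # [True, False, False, True]

df_ = df.loc[keep]
print(df_["entities"].tolist())   # ['apple', 'pear']
```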

I'm going out on a limb here since I have no experience with pandas, but from the Python side:

Pandas' magic filtering via [] relies heavily on operator overloading. In this expression:

df.loc[df['entities'] != None]

df['entities'] is an object that implements the __ne__ method. That means you are basically doing:

df.loc[df['entities'].__ne__(None)]

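df['entities'].__ne__(None) produces some new magic condition object (a boolean Series). The loc object implements __getitem__, which overloads the [] subscript syntax, so the whole thing is essentially: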

df.loc.__getitem__(df['entities'].__ne__(None))

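is, on the other hand, cannot be overloaded. There is no __is__ method an object could implement, so df['entities'] is not None is evaluated according to Python's core rules, and since df['entities'] really is not None, the expression evaluates to True. So you end up with just: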

df.loc.__getitem__(True)

That's why the error message complains with KeyError: True.
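The desugaring described above can be mimicked in plain Python without pandas. Column and Loc below are hypothetical toy classes, not pandas APIs: Column overloads __ne__ to return an element-wise list (using plain Python !=, not pandas' own rules), and Loc overloads __getitem__ so you can see which key the [] syntax actually receives. Because is not has no hook, the second lookup receives a bare True.

```python
class Column:
    """Hypothetical toy stand-in for a Series; not part of pandas."""

    def __init__(self, values):
        self.values = values

    def __ne__(self, other):
        # Invoked for `col != other`; builds an element-wise mask
        return [v != other for v in self.values]


class Loc:
    """Hypothetical toy stand-in for df.loc; echoes the key it receives."""

    def __getitem__(self, key):
        print("__getitem__ received:", key)
        return key


col = Column(["a", None, "c"])
loc = Loc()

loc[col != None]  # noqa: E711  prints: __getitem__ received: [True, False, True]
loc[col is not None]             # prints: __getitem__ received: True

# The same thing with the dunder calls written out explicitly
loc.__getitem__(col.__ne__(None))  # prints: __getitem__ received: [True, False, True]
```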

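df['entities'] is not None compares the whole Series object with None by identity and yields True, whereas df['entities'] != None is an element-wise comparison that yields a boolean mask over the Series. The is not None version therefore boils down to df.loc[True], which is bound to raise a KeyError unless your row index actually contains the label True (as a bool, not the string 'True').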
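To see that the identity test never looks at the data, consider a hypothetical Series in which every entry is None: is not None is still True, whereas the element-wise notna() mask is all False.

```python
import pandas as pd

# Hypothetical column in which every entry is None
s = pd.Series([None, None, None])

# Identity: only asks whether the Series object itself is None
print(s is not None)       # True

# Element-wise: inspects each entry
print(s.notna().tolist())  # [False, False, False]
```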
