Python: Checking for NaN across multiple columns in Pandas

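I want to add a binary column to a DataFrame that indicates whether the given columns contain NaN.

I tried to do it with the following code: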

import pandas as pd

dat = pd.DataFrame({'A': [12,34,56,78, 23,None, None], 'B': [90,80,70,23,None, 78, None], 'C': [90,80,70,23,None, 78, None], 'D': [12,34,56,78, 23,None, None]})
dat['A1'] = dat['A'].isnull()
dat['B1'] = dat['B'].isnull()
dat['C1'] = dat['C'].isnull()
dat['ismissing'] = 1 if dat['A1'] == True and dat['B1'] == True and dat['C1'] == True else 0
dat
But the line that assigns dat['ismissing'] raises this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Sample input:

A     B     C     D
10   NaN    40    NaN
NaN  NaN    80    90
20    45    NaN   89
NaN  NaN    NaN   46
Expected output:

A     B     C     D     E
10   NaN    40    NaN   0
NaN  NaN    80    90    0
20    45    NaN   89    0
NaN  NaN    NaN   46    1

I only want to check columns A, B, and C for NaN.

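Note that a conditional expression (1 if ... else 0) and the and operator each need a single boolean value, and a pd.Series is not one. That is why Python complains that it does not know how to convert a pd.Series to a boolean.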
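
To make the failure concrete, here is a minimal sketch. The two-element Series `s` is made up for illustration. It shows that a Series with more than one element has no single truth value until it is reduced with any() or all():

```python
import pandas as pd

# Hypothetical boolean Series, used only for illustration
s = pd.Series([True, False])

try:
    # `if`, `and`, and `1 if ... else 0` all ask for bool() of their operand
    bool(s)
    raised = False
except ValueError:
    # "The truth value of a Series is ambiguous..."
    raised = True

# Reducing the Series first yields one plain boolean
any_true = bool(s.any())  # is at least one element True?
all_true = bool(s.all())  # is every element True?
```

Note that any() and all() collapse the whole column into one value, so they answer a different question from the per-row flag the asker wants.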

Instead, you can (and should) combine the boolean columns element-wise rather than with a Python conditional.
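
One way to write this is sketched below. It is an assumption about what an element-wise version could look like, not necessarily the answerer's exact snippet. It reuses the asker's `dat` frame; the intermediate name `all_missing` is introduced here for readability:

```python
import pandas as pd

dat = pd.DataFrame({'A': [12, 34, 56, 78, 23, None, None],
                    'B': [90, 80, 70, 23, None, 78, None],
                    'C': [90, 80, 70, 23, None, 78, None],
                    'D': [12, 34, 56, 78, 23, None, None]})

# `&` combines boolean Series element by element, unlike Python's `and`
all_missing = dat['A'].isnull() & dat['B'].isnull() & dat['C'].isnull()

# Cast True/False to 1/0 to get the binary column
dat['ismissing'] = all_missing.astype(int)
```

With this data only the last row is NaN in A, B, and C at once, so only that row is flagged.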


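You want to check whether a row has NaN in all of the columns (A, B, C).

You can do this by selecting those columns, calling isna(), and reducing each row with all(axis=1).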
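
As a sketch of that approach, the following rebuilds the sample input from the question and adds the column E. The list name `cols` and the final cast to int are choices made here so the result matches the 0/1 values in the expected output:

```python
import numpy as np
import pandas as pd

# Sample input from the question
df = pd.DataFrame({'A': [10, np.nan, 20, np.nan],
                   'B': [np.nan, np.nan, 45, np.nan],
                   'C': [40, 80, np.nan, np.nan],
                   'D': [np.nan, 90, 89, 46]})

# Only these columns take part in the check; D is ignored
cols = ['A', 'B', 'C']

# isna() gives a boolean frame; all(axis=1) is True where every column in the row is NaN
df['E'] = df[cols].isna().all(axis=1).astype(int)
```

Leaving off `.astype(int)` keeps the column as True/False, which is often enough for filtering.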

Performance comparison:

Quang Hoang's answer:

In [1720]: %timeit df['ismissing'] = df[['A','B','C']].isna().all(axis=1)
989 µs ± 70 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
YOBEN_S's answer:

In [1719]: %timeit df['New']=~df.index.isin(df.drop('D',1).dropna(thresh=1).index)
2.05 ms ± 113 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
anky's answer:

In [1724]: %timeit df['all_nan'] = df[['A','B','C']].count(axis=1).eq(0).view('i1')
1.48 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
My answer:

In [1723]: %timeit dat['E'] = np.where(dat[['A','B','C']].isnull().all(1), 1, 0)
914 µs ± 18.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

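As you can see, my answer using np.where is the fastest in this run, although its margin over the isna().all(axis=1) version (914 µs vs. 989 µs ± 70 µs) is small.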
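
For readers without IPython's %timeit, a rough way to repeat such a comparison is sketched below with the standard timeit module. The function names `with_all` and `with_np_where` and the repetition count are arbitrary choices made here, and absolute timings will differ by machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

# Sample input from the question
df = pd.DataFrame({'A': [10, np.nan, 20, np.nan],
                   'B': [np.nan, np.nan, 45, np.nan],
                   'C': [40, 80, np.nan, np.nan],
                   'D': [np.nan, 90, 89, 46]})

def with_all():
    # Boolean result: True where A, B and C are all NaN
    return df[['A', 'B', 'C']].isna().all(axis=1)

def with_np_where():
    # Integer result: 1 where A, B and C are all NaN, else 0
    return np.where(df[['A', 'B', 'C']].isnull().all(axis=1), 1, 0)

# Confirm both variants flag the same rows before comparing speed
same = bool((with_all().astype(int).to_numpy() == with_np_where()).all())

# Total seconds for 200 calls of each variant
t_all = timeit.timeit(with_all, number=200)
t_where = timeit.timeit(with_np_where, number=200)
print(f"all(axis=1): {t_all:.4f}s  np.where: {t_where:.4f}s")
```

On a four-row frame the measurement is dominated by fixed overhead, so a larger frame gives a more meaningful comparison.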

Let's try dropna:

df['New']=~df.index.isin(df.drop('D',axis=1).dropna(thresh=1).index)
df
      A     B     C     D    New
0  10.0   NaN  40.0   NaN  False
1   NaN   NaN  80.0  90.0  False
2  20.0  45.0   NaN  89.0  False
3   NaN   NaN   NaN  46.0   True
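
The one-liner above can be unpacked into steps, as in the following sketch. The intermediate names `abc` and `kept` are introduced here for illustration:

```python
import numpy as np
import pandas as pd

# Sample input from the question
df = pd.DataFrame({'A': [10, np.nan, 20, np.nan],
                   'B': [np.nan, np.nan, 45, np.nan],
                   'C': [40, 80, np.nan, np.nan],
                   'D': [np.nan, 90, 89, 46]})

# Step 1: look only at A, B and C
abc = df.drop('D', axis=1)

# Step 2: thresh=1 keeps rows that have at least one non-NaN value
kept = abc.dropna(thresh=1)

# Step 3: rows whose index label did not survive are NaN in every checked column
df['New'] = ~df.index.isin(kept.index)
```

This relies on the index labels being unique; with duplicate labels, a kept row and a dropped row sharing a label could not be told apart.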

I created a column containing True and False, then mapped True to 1 and False to 0.

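# Check only A, B and C; including D would miss rows like the sample's last one
dat['ismissing'] = dat[['A', 'B', 'C']].isnull().all(axis=1)
# Convert True/False to 1/0
dat['ismissing'] = dat['ismissing'].astype(int)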

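Thanks @anky. I was trying to find an optimized solution.

@anky I checked further, and isna() and isnull() actually do not differ much in performance. The main difference comes from np.where.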
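
The similar timings are expected, since pandas documents isnull() as an alias of isna(). A quick check on a made-up two-row frame (an illustration, not data from the question) shows that both return the same mask:

```python
import numpy as np
import pandas as pd

# Hypothetical frame, used only to compare the two methods
df = pd.DataFrame({'A': [10, np.nan], 'B': [np.nan, np.nan]})

# Both calls produce identical boolean frames
same_mask = df.isna().equals(df.isnull())
```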