Python pandas: counting the different NaN combinations of two columns

python pandas dataframe

I have a DataFrame like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 2, 3, np.nan, 4], 'B': [np.nan, 1, np.nan, 2, 3, np.nan]})
df
     A    B
0  1.0  NaN
1  NaN  1.0
2  2.0  NaN
3  3.0  2.0
4  NaN  3.0
5  4.0  NaN

How can I count the rows where A is np.nan but B is not np.nan, where A is not np.nan but B is np.nan, and where neither A nor B is np.nan?

I tried df.groupby(['A', 'B']).count(), but it skips the rows that contain np.nan.

I think you need:

df = pd.DataFrame({'A': [1, np.nan, 2, 3, np.nan, 4], 'B': [np.nan, 1, np.nan, 2, 3, np.nan]})

# A is present, B is NaN
count1 = len(df[(~df['A'].isnull()) & (df['B'].isnull())])
# A and B are both present
count2 = len(df[(~df['A'].isnull()) & (~df['B'].isnull())])
# A is NaN, B is present
count3 = len(df[(df['A'].isnull()) & (~df['B'].isnull())])

print(count1, count2, count3)

Output:

3 1 2
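
The groupby attempt from the question can also be made to work. As a minimal sketch, assuming the goal is only to count the combinations, group the boolean mask from isna() rather than the raw values, so that no group key is ever NaN. The names mask, counts and result are illustrative and not part of the original answers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 2, 3, np.nan, 4],
                   'B': [np.nan, 1, np.nan, 2, 3, np.nan]})

# True where the value is missing; the mask itself never contains NaN.
mask = df.isna()

# Group on the two boolean columns and count the rows in each group.
counts = mask.groupby(['A', 'B']).size()

# Plain dict keyed by (A is NaN, B is NaN) for easy lookup.
result = {(bool(a), bool(b)): int(n) for (a, b), n in counts.items()}
print(result)
```

Only combinations that actually occur appear in the result, so a lookup for (True, True) would need a default such as result.get((True, True), 0).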

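If we are only dealing with two columns, there is a very simple solution: give the NaN mask of column A a weight of 1 and the mask of column B a weight of 2, then sum across each row. Each sum then identifies one combination. In the labels below, "only B" means that only B is NaN.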

# 0 = neither is NaN, 1 = only A is NaN, 2 = only B is NaN
v = df.isna().mul([1, 2]).sum(axis=1).value_counts()
v.index = v.index.map({2: 'only B', 1: 'only A', 0: 'neither'})
v

only B     3
only A     2
neither    1
dtype: int64
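
The mapping above has no entry for a row sum of 3, which would occur if both columns were NaN in the same row. A hedged sketch of the same weighting idea with that case included follows; the extra seventh row and the label 'both' are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

# Same data as the question plus one row where both values are missing.
df = pd.DataFrame({'A': [1, np.nan, 2, 3, np.nan, 4, np.nan],
                   'B': [np.nan, 1, np.nan, 2, 3, np.nan, np.nan]})

# Weight the NaN mask: A counts 1, B counts 2, so each row sum is 0..3.
codes = df.isna().astype(int).mul([1, 2]).sum(axis=1)

# Translate each code into a readable label, then count the labels.
labels = {0: 'neither', 1: 'only A', 2: 'only B', 3: 'both'}
counts = codes.map(labels).value_counts()
print(counts)
```

Because the weights are powers of two, the same scheme extends to more columns (1, 2, 4, ...), although the label dictionary grows quickly.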

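Another alternative uses pivot_table and stack. The counts come back as floats because the combination that never occurs is NaN in the pivoted table before stack drops it: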

df.isna().pivot_table(index='A', columns='B', aggfunc='size').stack()

A      B    
False  False    1.0
       True     3.0
True   False    2.0
dtype: float64
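
If integer counts are preferred, one option is to count the distinct rows of the boolean mask directly. This is a sketch that assumes pandas 1.1 or later, where DataFrame.value_counts is available; it is a different technique from the pivot_table answer, not a rewrite of it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 2, 3, np.nan, 4],
                   'B': [np.nan, 1, np.nan, 2, 3, np.nan]})

# Count each distinct (A is NaN, B is NaN) row of the mask.
counts = df.isna().value_counts()

# Convert to a plain dict keyed by (A is NaN, B is NaN).
result = {(bool(a), bool(b)): int(n) for (a, b), n in counts.items()}
print(result)
```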

You can use isna together with crosstab to count the True values:

df1 = df.isna()
df2 = pd.crosstab(df1.A, df1.B)
print (df2)
B      False  True 
A                  
False      1      3
True       2      0

For a scalar:

print (df2.loc[False, False])
1

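If the index and columns of the table are first relabelled with string prefixes such as 'A_False' and 'B_False' (the relabelling step is not shown here), select the scalar with those labels:

print (df2.loc['A_False', 'B_False'])
1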
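
The 'A_False' and 'B_False' labels are not produced by crosstab itself. One way to obtain them, offered as an assumption about what the answer intended rather than its actual code, is to rename the index and columns after building the table. The name table is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 2, 3, np.nan, 4],
                   'B': [np.nan, 1, np.nan, 2, 3, np.nan]})

mask = df.isna()
table = pd.crosstab(mask['A'], mask['B'])

# Replace the boolean labels with prefixed strings (hypothetical naming).
table.index = [f'A_{bool(v)}' for v in table.index]
table.columns = [f'B_{bool(v)}' for v in table.columns]

print(table)
print(table.loc['A_False', 'B_False'])
```

String labels avoid any ambiguity between boolean labels and boolean masks when indexing with loc.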

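Another solution multiplies the boolean mask by the column names with dot and then applies replace. It is shown further down, on a frame with one extra row where both A and B are NaN.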

To count the rows where exactly one of A or B is null (note that ^ is exclusive or), you can do:

bool_df = df.isnull()
df[bool_df['A'] ^ bool_df['B']].shape[0]

To count the rows where both values are null:

df[bool_df['A'] & bool_df['B']].shape[0]
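
Since only the counts are needed, the filtering step can be skipped by summing the boolean Series, where each True counts as 1. A small sketch on a frame that, as an assumption for illustration, includes one row with both values missing:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 2, 3, np.nan, 4, np.nan],
                   'B': [np.nan, 1, np.nan, 2, 3, np.nan, np.nan]})

mask = df.isna()

# XOR: exactly one of the two columns is missing.
exactly_one = int((mask['A'] ^ mask['B']).sum())
# AND: both columns are missing.
both = int((mask['A'] & mask['B']).sum())
# Neither column is missing.
neither = int((~mask['A'] & ~mask['B']).sum())

print(exactly_one, both, neither)
```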

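Comment: 1) What should the output be for this data? 2) Note: do you only want to exclude the rows where both A and B are NaN?
Reply: Yes, my DataFrame has no rows of that kind.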

The dot and replace solution, on a frame with an extra row in which both A and B are NaN. Each label lists the columns that are NaN in that row:

df = pd.DataFrame({'A': [1, np.nan,2,3, np.nan,4, np.nan], 
                   'B': [np.nan, 1,np.nan,2, 3, np.nan, np.nan]})

s = df.isna().dot(df.columns).replace({'':'no match'}).value_counts()
print (s)

B           3
A           2
no match    1
AB          1
dtype: int64
