Python: show the missing values of one column based on another column
Here is my problem. Suppose a dataframe has two columns, as follows:
Type | Killed
_______ |________
Dog 1
Dog nan
Dog nan
Cat 4
Cat nan
Cow 1
Cow nan
I want to show all the missing values in Killed for each Type and count them.
My desired result looks like this:
Type | Sum(isnull)
Dog 2
Cat 1
Cow 1
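Is there any way to display this information?

You can use boolean indexing with value_counts, which is the first expression in the timings below.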
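A minimal sketch of the boolean indexing plus value_counts approach, assuming the sample data from the question. The timed expression further down uses `.ix`, which was removed in pandas 1.0, so this sketch substitutes `.loc`. The variable names `counts` and `result` are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    'Type': ['Dog', 'Dog', 'Dog', 'Cat', 'Cat', 'Cow', 'Cow'],
    'Killed': [1, np.nan, np.nan, 4, np.nan, 1, np.nan],
})

# Keep the Type of every row whose Killed is missing, then count each Type.
counts = df.loc[df['Killed'].isnull(), 'Type'].value_counts()

# Name the index explicitly so the column labels do not depend on the pandas version.
result = counts.rename_axis('Type').reset_index(name='Sum(isnull)')
print(result)
```

The order of tied counts (Cat and Cow here) is not something to rely on, so sort explicitly if the row order matters.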
Or aggregate with size, which seems to be faster:
print(df[df.Killed.isnull()]
      .groupby('Type')['Killed']
      .size()
      .reset_index(name='Sum(isnull)'))
  Type  Sum(isnull)
0  Cat            1
1  Cow            1
2  Dog            2
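As a possible variation that is not part of the answer above, the boolean mask itself can be grouped and summed, which matches the `Sum(isnull)` name more literally. This is a sketch on the question's sample data, and the name `missing` is illustrative.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    'Type': ['Dog', 'Dog', 'Dog', 'Cat', 'Cat', 'Cow', 'Cow'],
    'Killed': [1, np.nan, np.nan, 4, np.nan, 1, np.nan],
})

# True counts as 1 and False as 0, so summing the mask per Type counts the NaNs.
missing = (df['Killed'].isnull()
           .groupby(df['Type'])
           .sum()
           .astype(int)
           .reset_index(name='Sum(isnull)'))
print(missing)
```

Unlike filtering first, this keeps a Type that has no missing values and reports it with a count of 0.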
Timings:
df = pd.concat([df]*1000).reset_index(drop=True)
In [30]: %timeit (df.ix[df.Killed.isnull(), 'Type'].value_counts().reset_index(name='Sum(isnull)'))
100 loops, best of 3: 5.36 ms per loop
In [31]: %timeit (df[df.Killed.isnull()].groupby('Type')['Killed'].size().reset_index(name='Sum(isnull)'))
100 loops, best of 3: 2.02 ms per loop
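Outside IPython, a comparison along the same lines could be sketched with the standard `timeit` module. The helper names `with_value_counts` and `with_groupby_size` are hypothetical, `.loc` stands in for the removed `.ix`, and the numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Sample data from the question, repeated 1000 times as in the timings above.
df = pd.DataFrame({
    'Type': ['Dog', 'Dog', 'Dog', 'Cat', 'Cat', 'Cow', 'Cow'],
    'Killed': [1, np.nan, np.nan, 4, np.nan, 1, np.nan],
})
big = pd.concat([df] * 1000).reset_index(drop=True)


def with_value_counts():
    # Boolean indexing, then count each Type.
    return big.loc[big['Killed'].isnull(), 'Type'].value_counts()


def with_groupby_size():
    # Filter the rows, then take the size of each Type group.
    return big[big['Killed'].isnull()].groupby('Type')['Killed'].size()


runs = 50
for fn in (with_value_counts, with_groupby_size):
    seconds = timeit.timeit(fn, number=runs)
    print(f'{fn.__name__}: {seconds / runs * 1000:.2f} ms per call')
```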
I can get you both the isnull and the notnull counts:
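import numpy as np
# Label each row by whether Killed is missing, then count rows per (Type, label).
isnull = np.where(df.Killed.isnull(), 'isnull', 'notnull')
df.groupby([df.Type, isnull]).size().unstack()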
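For reference, a self-contained sketch of the same idea on the question's sample data. The `fill_value=0` argument is an addition here, so that a Type lacking one of the two labels would show 0 instead of NaN; the names `status` and `table` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    'Type': ['Dog', 'Dog', 'Dog', 'Cat', 'Cat', 'Cow', 'Cow'],
    'Killed': [1, np.nan, np.nan, 4, np.nan, 1, np.nan],
})

# One label per row: 'isnull' where Killed is NaN, 'notnull' otherwise.
status = np.where(df['Killed'].isnull(), 'isnull', 'notnull')

# Count rows per (Type, label) pair and pivot the labels into columns.
table = df.groupby([df['Type'], status]).size().unstack(fill_value=0)
print(table)
```

On this data, each Type has exactly one non-missing value, and Dog has two missing values while Cat and Cow have one each.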