Python pandas: if a field's value is None/null/NaN, append the field name to a new field
I'm stuck on how to approach a particular problem. Using pandas, I want to go through the rows and, wherever the value in a field is None/NaN, append that field's name to a new field, as shown below:
+----+--------+----------+--------+--------+--------+---------------------------------+
| ID | Animal | Building | Letter | Fruit | Number | NullFields |
+----+--------+----------+--------+--------+--------+---------------------------------+
| 1 | Dog | House | C | null | 4 | Fruit |
| 2 | null | House | null | Apple | null | Animal, Letter, Number |
| 3 | Cat | null | B | Orange | null | Building, Number |
| 4 | null | null | null | null | 6 | Animal, Building, Letter, Fruit |
| 5 | Snake | null | A | null | 7 | Building, Fruit |
+----+--------+----------+--------+--------+--------+---------------------------------+
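For readability I typed "null" above. I know None and NaN are not the same thing, but the data I'm working with seems to contain both. If I have to run fillna first, that's fine.
I don't think np.where works here, unless I'm missing something. I don't know whether I need to use iterrows or something else.
Any tips or guidance would be greatly appreciated.
This works: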
# if ID is the index, use `df` instead of `df.iloc[...]`
s = df.iloc[:,1:].isna()
df['NullFields'] = (s @ (s.columns + (', '))).str.strip(', ')
Output:
ID Animal Building Letter Fruit Number NullFields
0 1 Dog House C NaN 4.0 Fruit
1 2 NaN House NaN Apple NaN Animal, Letter, Number
2 3 Cat NaN B Orange NaN Building, Number
3 4 NaN NaN NaN NaN 6.0 Animal, Building, Letter, Fruit
4 5 Snake NaN A NaN 7.0 Building, Fruit
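For anyone who wants to reproduce the result end to end, here is a minimal, self-contained sketch. The sample frame is rebuilt by hand from the table in the question. Instead of the matrix-product trick it uses a plain row-wise join, which is an assumption on my part about what reads most clearly, not the answerer's exact method.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; None and NaN are mixed on purpose.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Animal": ["Dog", None, "Cat", None, "Snake"],
    "Building": ["House", "House", None, None, None],
    "Letter": ["C", None, "B", None, "A"],
    "Fruit": [None, "Apple", "Orange", None, None],
    "Number": [4, np.nan, np.nan, 6, 7],
})

# Boolean mask of missing values for every column except ID.
mask = df[df.columns.drop("ID")].isna()

# For each row, join the names of the columns flagged as missing.
df["NullFields"] = mask.apply(
    lambda row: ", ".join(col for col, is_null in row.items() if is_null),
    axis=1,
)

print(df)
```

The row-wise apply is slower than the matrix product on very large frames, but it does not depend on how pandas multiplies booleans with strings.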
First, you need to convert your "null" fields to true NaN values so they can be tested for nullness; then we can use isnull, followed by .dot:
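import numpy as np
# append a space to each column name so the joined names stay separated, as in the output below
df['NullableFields'] = df.replace("null", np.nan).isnull().dot(df.columns + ' ').str.strip()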
print(df)
ID Animal Building Letter Fruit Number \
0 1 Dog House C null 4
1 2 null House null Apple null
2 3 Cat null B Orange null
3 4 null null null null 6
4 5 Snake null A null 7
NullFields NullableFields
0 Fruit Fruit
1 Animal, Letter, Number Animal Letter Number
2 Building, Number Building Number
3 Animal, Building, Letter, Fruit Animal Building Letter Fruit
4 Building, Fruit Building Fruit
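Since the data may mix real None/NaN values with literal "null" strings, it can help to normalise first and then build the mask. The sketch below uses a small made-up frame (`raw` and its contents are hypothetical, not the asker's data) to show that `isna()` treats None and NaN alike once the placeholder strings are replaced.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: a literal "null" string, a real None, and a real NaN.
raw = pd.DataFrame({
    "Animal": ["Dog", "null", None],
    "Number": [4.0, np.nan, 6.0],
})

# Turn the placeholder string into a real missing value.
cleaned = raw.replace("null", np.nan)

# isna() is True for both None and NaN.
mask = cleaned.isna()

# Pick the column names flagged in each row of the mask.
null_fields = [", ".join(mask.columns[row]) for row in mask.to_numpy()]
print(null_fields)
```

If the data never contains the string "null", the `replace` step can simply be dropped.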
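Are these actual None/null/NaN values, or the corresponding string values?
They are None, so basically whenever the value is empty. Where you see "null" above, it means there is nothing there, not the string "null".
One thing I thought of (though it would be messy) was to do something like this:
df['Test1'] = np.where(df['Animal'].isnull(), 'Name of Field', None)
df['Test2'] = np.where(df['Building'].isnull(), 'Name of Field', None)
and then combine the results for each column. That doesn't seem like a very efficient approach, and I actually have 20 fields to look at.
Thanks! It's always good to know there's more than one way to solve a problem!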
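The per-column np.where idea from the comment does not have to be written out once per field. As a rough sketch, assuming a hypothetical list `fields` of the columns to check and a small made-up frame, a dictionary comprehension can generate one flag column per field, and the non-empty flags are then joined per row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; in practice `fields` would list all 20 columns.
df = pd.DataFrame({
    "Animal": ["Dog", None],
    "Building": [None, None],
    "Fruit": ["Apple", np.nan],
})
fields = ["Animal", "Building", "Fruit"]

# One np.where per field: the field name where the value is missing, else "".
flags = pd.DataFrame(
    {col: np.where(df[col].isna(), col, "") for col in fields},
    index=df.index,
)

# Join the non-empty flags in each row.
df["NullFields"] = flags.apply(
    lambda row: ", ".join(name for name in row if name),
    axis=1,
)

print(df)
```

This keeps the np.where approach but still does more work than the mask-based answers above, so it is mainly useful if np.where is already part of the pipeline.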