Python pandas: if a field's value is None/null/NaN, append the field name to a new field

python pandas

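I'm stuck on how to solve a particular problem. Basically, this is what I want to do:

Using pandas, I want to go through the rows and, wherever a field's value is None/NaN, append that field's name to a new field, like this: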

+----+--------+----------+--------+--------+--------+---------------------------------+
| ID | Animal | Building | Letter | Fruit  | Number |           NullFields            |
+----+--------+----------+--------+--------+--------+---------------------------------+
|  1 | Dog    | House    | C      | null   | 4      | Fruit                           |
|  2 | null   | House    | null   | Apple  | null   | Animal, Letter, Number          |
|  3 | Cat    | null     | B      | Orange | null   | Building, Number                |
|  4 | null   | null     | null   | null   | 6      | Animal, Building, Letter, Fruit |
|  5 | Snake  | null     | A      | null   | 7      | Building, Fruit                 |
+----+--------+----------+--------+--------+--------+---------------------------------+

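I typed "null" above for readability. I know None and NaN are not the same thing, but the data I'm working with seems to contain both. If I have to run fillna first, that's fine.

I don't think np.where works here, unless I'm missing something. I don't know whether I need to use iterrows, or something else.

Any tips or guidance would be much appreciated.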
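On the None versus NaN concern: as far as I know, isna() flags both as missing, so a fillna pass should not be needed before testing for nulls. A minimal sketch, assuming a small made-up frame (the `sample` and `mask` names are only for illustration and are not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical sample that mixes None and NaN
sample = pd.DataFrame({
    "Animal": ["Dog", None, "Cat"],
    "Number": [4.0, np.nan, None],
})

# isna() marks both None and NaN as True
mask = sample.isna()
print(mask)
```

Literal "null" strings are a different matter: those are ordinary text and would have to be replaced with a real missing value before isna() sees them.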

This works:

# if ID is the index, use `df` directly instead of `df.iloc[...]`
s = df.iloc[:,1:].isna()
df['NullFields'] = (s @ (s.columns + (', '))).str.strip(', ')

Output:

   ID Animal Building Letter   Fruit  Number                       NullFields
0   1    Dog    House      C     NaN     4.0                            Fruit
1   2    NaN    House    NaN   Apple     NaN           Animal, Letter, Number
2   3    Cat      NaN      B  Orange     NaN                 Building, Number
3   4    NaN      NaN    NaN     NaN     6.0  Animal, Building, Letter, Fruit
4   5  Snake      NaN      A     NaN     7.0                  Building, Fruit
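If the matrix-multiplication trick above feels opaque, the same result can be spelled out row by row. This is only a sketch of an alternative, not the answerer's method: it rebuilds the question's table by hand (assuming ID is a regular column) and joins the names of the columns whose mask entry is True.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Animal": ["Dog", None, "Cat", None, "Snake"],
    "Building": ["House", "House", None, None, None],
    "Letter": ["C", None, "B", None, "A"],
    "Fruit": [None, "Apple", "Orange", None, None],
    "Number": [4, np.nan, np.nan, 6, 7],
})

# Boolean mask of missing cells, leaving out the ID column
mask = df.drop(columns="ID").isna()

# For each row of the mask, pick the column names flagged True and join them
df["NullFields"] = [", ".join(mask.columns[row]) for row in mask.to_numpy()]
print(df)
```

This is likely slower than the vectorized version on large frames, but it does not rely on multiplying booleans by strings, which makes it easier to reason about.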

First, the "null" fields need to be converted to true NaN values so that they are detected as null; then we can use isnull, followed by .dot:

df['NullableFields'] = df.replace("null", np.nan).isnull().dot(df.columns)

print(df)

    ID   Animal   Building   Letter   Fruit    Number   \
0     1      Dog      House        C     null        4   
1     2     null      House     null    Apple     null   
2     3      Cat       null        B   Orange     null   
3     4     null       null     null     null        6   
4     5    Snake       null        A     null        7   

             NullFields                                 NullableFields  
0                            Fruit                             Fruit    
1           Animal, Letter, Number             Animal  Letter  Number   
2                 Building, Number                   Building  Number   
3  Animal, Building, Letter, Fruit   Animal  Building  Letter  Fruit    
4                  Building, Fruit                   Building  Fruit

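Are these None/null/NaN values, or the corresponding string values?

They are None. So basically whenever the value is empty. Where you see "null" above, it means there is nothing there, not the string "null". One thing I had thought of (but it would be messy) was doing this:

df['Test1'] = np.where(df['Animal'].isnull(), 'Name of Field', None)

df['Test2'] = np.where(df['Building'].isnull(), 'Name of Field', None)

and then adding up the results for every column... That doesn't seem like a very efficient approach, and I actually have 20 fields to look at.

Thanks! It's always good to know there's more than one way to solve a problem!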
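For what it's worth, the per-column np.where idea from the comment does not have to be written out twenty times; it can be looped over the columns. A rough sketch under assumptions: the frame below is a cut-down, made-up version of the question's data, and `fields` and `parts` are illustrative names only.

```python
import numpy as np
import pandas as pd

# Hypothetical cut-down version of the question's data
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Animal": ["Dog", None, "Cat"],
    "Building": ["House", "House", None],
    "Number": [4, np.nan, np.nan],
})

# Every column except ID is checked for missing values
fields = [c for c in df.columns if c != "ID"]

# One np.where per column: the column name where the cell is missing, else ""
parts = [np.where(df[c].isna(), c, "") for c in fields]

# Combine per row, skipping the empty placeholders
df["NullFields"] = [", ".join(name for name in row if name) for row in zip(*parts)]
print(df)
```

The dot-based answers above do the same job in one line, so this is mainly useful for seeing what they compute.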