Python: create a new column from the non-null value in each row

python pandas

I have a DataFrame with 4 columns:

     col1    col2   col3    col4
0  orange     NaN    NaN     NaN
1     NaN  tomato    NaN     NaN
2     NaN     NaN  apple     NaN
3     NaN     NaN    NaN  carrot
4     NaN  potato    NaN     NaN

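Each row contains exactly one string value, which may appear in any column. The other columns in that row are NaN. I want to create a column that holds the string value: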

      col5 
0   orange
1   tomato
2    apple
3   carrot
4   potato

The most obvious approach looks like this:

data['col5'] = data.col1.astype(str) + data.col2.astype(str)...

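and then strip the "nan" text out of the resulting strings, but that is messy and bound to cause errors.

Does pandas offer a simple way to do this?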

Here is one way, using apply and first_valid_index:
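The code for this answer is not present in the text, so what follows is only a minimal sketch of how apply and first_valid_index might be combined. The variable name df and the frame construction are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (the name df is an assumption).
df = pd.DataFrame({
    "col1": ["orange", np.nan, np.nan, np.nan, np.nan],
    "col2": [np.nan, "tomato", np.nan, np.nan, "potato"],
    "col3": [np.nan, np.nan, "apple", np.nan, np.nan],
    "col4": [np.nan, np.nan, np.nan, "carrot", np.nan],
})

# For each row, first_valid_index() returns the label of the first non-null
# entry; indexing the row with that label yields the value itself.
df["col5"] = df.apply(lambda row: row[row.first_valid_index()], axis=1)
print(df["col5"].tolist())
```

Because apply with axis=1 calls a Python function once per row, this is readable but not fast on large frames.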

To get the same result more efficiently, you can drop down to numpy:

In [21]: df.values.ravel()[np.arange(0, len(df.index) * len(df.columns), len(df.columns)) + np.argmax(df.notnull().values, axis=1)]
Out[21]: array(['orange', 'tomato', 'apple', 'carrot', 'potato'], dtype=object)
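The flat-index arithmetic in that one-liner can also be written with a pair of index arrays, which may be easier to follow. This is a sketch of the same idea rather than the answerer's code; the names df, values and first_valid_col are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (the name df is an assumption).
df = pd.DataFrame({
    "col1": ["orange", np.nan, np.nan, np.nan, np.nan],
    "col2": [np.nan, "tomato", np.nan, np.nan, "potato"],
    "col3": [np.nan, np.nan, "apple", np.nan, np.nan],
    "col4": [np.nan, np.nan, np.nan, "carrot", np.nan],
})

values = df.to_numpy()

# argmax over a boolean row returns the position of the first True,
# i.e. the column position of the first non-null entry in that row.
first_valid_col = df.notna().to_numpy().argmax(axis=1)

# Pair each row number with its column position to pick one element per row.
result = values[np.arange(len(df)), first_valid_col]
print(result)
```

For a row that is entirely NaN, argmax returns 0, so this version quietly picks the NaN in the first column instead of raising.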

Note: both approaches break if a row is entirely NaN. Filter such rows out first, for example with dropna(how='all').
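A short sketch of that filtering step, using a small hypothetical frame whose middle row is entirely NaN. Note that dropna with its default how='any' would discard every row in data shaped like this, since each row contains at least one NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 has no value in any column.
df = pd.DataFrame({
    "col1": ["orange", np.nan, np.nan],
    "col2": [np.nan, np.nan, "tomato"],
})

# how="all" drops only the rows in which every column is NaN.
cleaned = df.dropna(how="all")

# The row-wise lookup is now safe because every remaining row has a value.
result = cleaned.apply(lambda row: row[row.first_valid_index()], axis=1)
print(result)
```

The result keeps the original index labels (0 and 2 here), so assigning it back to the full frame leaves NaN in the row that was dropped.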

Mapping a filter function over the elements of each row should do the trick:

data['new_col'] = data.apply(lambda row: next(filter(pd.notnull, row)), axis=1)

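Assuming each row contains exactly one string value and the rest are NaN, another approach is to fill the NaN values with an empty string and then take the row-wise max: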

>>> df.fillna('').max(axis=1)
0    orange
1    tomato
2     apple
3    carrot
4    potato
dtype: object
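The max trick works because, once every NaN is replaced by '', each cell is a string, and the empty string compares less than any non-empty string. The sketch below illustrates this; the frame construction and the explicit dtype=object are assumptions made so the example is self-contained.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; dtype=object keeps plain Python strings.
df = pd.DataFrame({
    "col1": ["orange", np.nan, np.nan, np.nan, np.nan],
    "col2": [np.nan, "tomato", np.nan, np.nan, "potato"],
    "col3": [np.nan, np.nan, "apple", np.nan, np.nan],
    "col4": [np.nan, np.nan, np.nan, "carrot", np.nan],
}, dtype=object)

# The empty string sorts before every non-empty string.
assert "" < "apple"

# After filling, each row holds three '' and one real value,
# so the row-wise maximum is that one value.
result = df.fillna("").max(axis=1)
print(result.tolist())
```

If a row ever held two strings, this would silently return the lexicographically larger one, so it is only safe under the one-value-per-row assumption stated above.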