Python: concatenating list columns that contain NaN in a DataFrame
I have a dataframe with two columns whose cells hold either a list or a NaN value. No row has NaN in both columns. I want to create a third column that combines the values of the other two as follows:
if row df.a is NaN -> df.c = df.b
if row df.b is NaN -> df.c = df.a
else df.c = df.a + df.b
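The three rules can be written as a plain function applied to one row. This is a minimal sketch, assuming a missing cell is a float NaN (np.nan is an ordinary Python float) and a present cell is a list; the name combine is hypothetical.

```python
import math


def combine(a, b):
    """Apply the three rules from the question to a single row (hypothetical helper)."""

    def is_nan(v):
        # assumption: a missing cell is a float NaN, e.g. float("nan") or np.nan
        return isinstance(v, float) and math.isnan(v)

    if is_nan(a):
        return b        # rule 1: a is NaN -> c = b
    if is_nan(b):
        return a        # rule 2: b is NaN -> c = a
    return a + b        # rule 3: both are lists -> concatenate them


nan = float("nan")
print(combine([0, 1], nan))     # [0, 1]
print(combine(nan, [5, 6]))     # [5, 6]
print(combine([0, 1], [5, 6]))  # [0, 1, 5, 6]
```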
Input:
df
a b
0 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
1 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
2 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
3 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
4 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
5 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
6 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
7 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
8 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
9 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
10 NaN [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
11 NaN [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
Output:
df.c
0 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
2 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
3 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
4 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
5 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
6 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
7 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
8 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
9 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
10 [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
11 [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
I tried using this nested condition with apply:
df['c'] = df.apply(lambda x: x.a if x.b is float else (x.b if x.a is float else (x['a'] + x['b'])), axis = 1)
But it gives me this error:
TypeError: ('can only concatenate list (not "float") to list', u'occurred at index 0')
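The error comes from the check itself: x.b is float asks whether the cell is the type object float, not whether it holds a float value, so it is False for NaN as well as for lists. Every row therefore reaches x['a'] + x['b'], and row 0, where b is NaN, fails. A minimal sketch of the same nested condition with isinstance instead, assuming (as in the question) that missing cells are float NaN, on a small three-row frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [[0, 1, 2], np.nan, [0, 1, 2]],
    "b": [np.nan, [0, 1, 2], [5, 6, 7, 8, 9]],
})

# isinstance(v, float) is True for a NaN cell and False for a list cell,
# whereas "v is float" compares the cell with the float type object itself
df["c"] = df.apply(
    lambda x: x.a if isinstance(x.b, float)
    else (x.b if isinstance(x.a, float) else x.a + x.b),
    axis=1,
)
print(df["c"].tolist())
```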
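I am using this check (and it does work) because it is the only way I have found to separate lists from NaN values.
You can first replace the NaNs with empty lists: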
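import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [[0, 1, 2], np.nan, [0, 1, 2]],
                   'b': [np.nan, [0, 1, 2], [5, 6, 7, 8, 9]]})
print(df)
# one empty list per row, aligned on df's index, used as the fill value
s = pd.Series([[]] * len(df), index=df.index)
df['c'] = df['a'].fillna(s) + df['b'].fillna(s)
print(df)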
a b c
0 [0, 1, 2] NaN [0, 1, 2]
1 NaN [0, 1, 2] [0, 1, 2]
2 [0, 1, 2] [5, 6, 7, 8, 9] [0, 1, 2, 5, 6, 7, 8, 9]
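Series.fillna rejects a plain list such as [] as its value (it raises a TypeError), which is why the answer above builds a Series of empty lists. The same idea (turn every NaN into an empty list, then concatenate) can also be sketched without fillna by swapping in an element-wise isinstance check. The helper name nan_to_empty is hypothetical, and it assumes every non-list cell is a missing value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [[0, 1, 2], np.nan, [0, 1, 2]],
    "b": [np.nan, [0, 1, 2], [5, 6, 7, 8, 9]],
})


def nan_to_empty(value):
    # hypothetical helper: assumes any non-list cell is a missing value
    return value if isinstance(value, list) else []


a = df["a"].apply(nan_to_empty)
b = df["b"].apply(nan_to_empty)

# concatenate row by row with list + list, keeping df's index
df["c"] = pd.Series([x + y for x, y in zip(a, b)], index=df.index)
print(df["c"].tolist())
```

Because the concatenation is done with an explicit list comprehension, the result does not depend on how pandas applies + to object columns.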
You can convert the NaNs to empty lists and then apply np.sum:
In [718]: df['c'] = df[['a', 'b']].applymap(lambda x: [] if x != x else x).apply(np.sum, axis=1); df['c']
Out[718]:
0 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
2 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
3 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
4 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
5 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
6 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
7 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, ...
8 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, ...
9 [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
10 [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
Name: c, dtype: object
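The lambda in applymap relies on NaN being the only value that is not equal to itself, so x != x is True only for the NaN cells; np.sum then adds up the lists in each row, and adding lists concatenates them. A plain-Python illustration of both pieces, no pandas needed. (On pandas 2.1 and later, DataFrame.applymap is deprecated in favour of DataFrame.map, which takes the same element-wise function.)

```python
nan = float("nan")

# NaN compares unequal to itself, while an ordinary list compares equal to itself
print(nan != nan)          # True
print([0, 1] != [0, 1])    # False

# summing lists from an empty start value concatenates them,
# which is roughly what happens to each row once every NaN has become []
row = [[0, 1, 2], []]
print(sum(row, []))        # [0, 1, 2]
```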
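This works for any number of columns with list/NaN content. When you use pd.DataFrame.stack, null values are dropped by default. We can then group by the first level of the index and concatenate the lists with sum: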
df.stack().groupby(level=0).sum()
0 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
2 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
3 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
4 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
5 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
6 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
7 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
8 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
9 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
10 [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
11 [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
dtype: object
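Newer pandas releases are moving DataFrame.stack to an implementation (the future_stack=True option added in pandas 2.1) that keeps missing values instead of dropping them, so relying on the default may stop working. A sketch that drops the NaN cells explicitly before grouping, on a small three-row frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [[0, 1, 2], np.nan, [0, 1, 2]],
    "b": [np.nan, [0, 1, 2], [5, 6, 7, 8, 9]],
})

# stack() turns the frame into one long Series indexed by (row, column);
# dropna() removes the NaN cells explicitly instead of relying on stack's default
stacked = df.stack().dropna()

# group the remaining cells by original row; summing lists concatenates them
c = stacked.groupby(level=0).sum()
print(c.tolist())
```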
Then we can use assign:
df.assign(c=df.stack().groupby(level=0).sum())
or add it to df as a new column:
df['c'] = df.stack().groupby(level=0).sum()
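The two options differ in whether df itself changes: assign returns a new DataFrame with the extra column and leaves df untouched, while df['c'] = ... adds the column to df in place. A short illustration on a hypothetical two-row frame (again dropping NaN explicitly after stack):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [[0, 1, 2], np.nan],
    "b": [np.nan, [5, 6]],
})
combined = df.stack().dropna().groupby(level=0).sum()

# assign builds and returns a new DataFrame; df is left unchanged
out = df.assign(c=combined)
print("c" in df.columns, "c" in out.columns)  # False True

# item assignment adds the column to df itself
df["c"] = combined
print("c" in df.columns)  # True
```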