Python Pandas - Join with missing columns as NaN (Python, Pandas, Dataframe) - Fatal编程技术网


Imagine two dataframes:

import pandas as pd

X = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=["a", "b"])
Y = pd.DataFrame([10, 20, 30], columns=["a"])

>>> X
   a  b
0  1  2
1  3  4
2  5  6
>>> Y
   a
0  10
1  20
2  30
Ultimately, I want the final output to look like this:

   a_X  b_X  a_Y  b_Y  sum_a  sum_b
0    1    2   10  NaN     11      2
1    3    4   20  NaN     23      4
2    5    6   30  NaN     35      6
I tried to achieve it like this:

merged = X.join(Y, lsuffix="_X", rsuffix="_Y")
merged['sum_a'] = merged['a_X'] + merged['a_Y'] # works
merged['sum_b'] = merged['b_X'] + merged['b_Y'] # doesn't work (KeyError)
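Obviously the sum_b column fails because there is no b column in Y. It may be there, but it does not have to be; my dataset makes no guarantees. It seems the built-in join cannot add a NaN column there for me.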
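The failure is clearer when you inspect the joined columns. `DataFrame.join` applies `lsuffix`/`rsuffix` only to column names present in both frames, so `b` keeps its plain name and no `b_Y` column exists at all (which is also why `merged['b_X']` fails). A minimal check with the same `X` and `Y`:

```python
import pandas as pd

X = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=["a", "b"])
Y = pd.DataFrame([10, 20, 30], columns=["a"])

merged = X.join(Y, lsuffix="_X", rsuffix="_Y")

# Only the overlapping column "a" is suffixed; "b" exists only in X.
print(list(merged.columns))  # ['a_X', 'b', 'a_Y']

# Looking up the expected-but-absent column raises KeyError.
try:
    merged["b_Y"]
except KeyError as exc:
    print("missing column:", exc)
```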

You can do the following:

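import numpy as np
import pandas as pd

# Add the missing column to Y explicitly, filled with NaN
Y['b'] = np.nan
merged = X.join(Y, lsuffix="_X", rsuffix="_Y")
merged['sum_a'] = merged['a_X'] + merged['a_Y']
# Treat the NaNs in b_Y as 0 so the sum falls back to b_X
merged['sum_b'] = merged['b_X'] + merged['b_Y'].fillna(0)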

#>>> merged
#   a_X  b_X  a_Y  b_Y  sum_a  sum_b
#0    1    2   10  NaN     11    2.0
#1    3    4   20  NaN     23    4.0
#2    5    6   30  NaN     35    6.0
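The answer above hard-codes `b`. If `Y` may lack any subset of `X`'s columns, one way to generalize it is to add every missing column as NaN before joining and then build the sums in a loop. This is a sketch assuming a `sum_<col>` column is wanted for every column of `X`; `Y_full` is a hypothetical name for the padded copy:

```python
import numpy as np
import pandas as pd

X = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=["a", "b"])
Y = pd.DataFrame([10, 20, 30], columns=["a"])

# Work on a copy so the caller's Y is left unchanged.
Y_full = Y.copy()
for col in X.columns.difference(Y_full.columns):
    Y_full[col] = np.nan  # column absent from Y becomes all-NaN

# Now every column of X overlaps with Y_full, so all of them get suffixes.
merged = X.join(Y_full, lsuffix="_X", rsuffix="_Y")
for col in X.columns:
    # NaN on the Y side counts as 0, so the sum falls back to X's value.
    merged[f"sum_{col}"] = merged[f"{col}_X"] + merged[f"{col}_Y"].fillna(0)

print(merged)
```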

Concatenate with pd.concat:

k = ['X', 'Y']

df = pd.concat([X, Y], keys=k, axis=1)
df

   X      Y
   a  b   a
0  1  2  10
1  3  4  20
2  5  6  30
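Passing `keys` makes `pd.concat` label each input frame, so the result has two-level columns: the frame key on level 0 and the original column name on level 1. The following steps rely on that structure. A quick inspection:

```python
import pandas as pd

X = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=["a", "b"])
Y = pd.DataFrame([10, 20, 30], columns=["a"])

k = ["X", "Y"]
df = pd.concat([X, Y], keys=k, axis=1)

# Each column label is a (frame key, original column) tuple.
print(df.columns.tolist())  # [('X', 'a'), ('X', 'b'), ('Y', 'a')]
print(df.columns.nlevels)   # 2
```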
Build a MultiIndex of every (frame, column) pair and reindex with it:

idx = pd.MultiIndex.from_product([k, df.columns.levels[1].unique()])
df = df.reindex(columns=idx)
df

   X      Y    
   a  b   a   b
0  1  2  10 NaN
1  3  4  20 NaN
2  5  6  30 NaN
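`reindex` keeps the existing columns and inserts every label of the new index that was not present, filled with NaN. Because level 1 collects column names from every input frame, the same step also covers the opposite case, where `Y` has a column that `X` lacks. A sketch with a hypothetical extra column `c` in `Y`:

```python
import pandas as pd

X = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=["a", "b"])
# Hypothetical variant: Y lacks "b" but has an extra column "c".
Y = pd.DataFrame({"a": [10, 20, 30], "c": [7, 8, 9]})

k = ["X", "Y"]
df = pd.concat([X, Y], keys=k, axis=1)

# Level 1 holds the column names seen in any input frame: a, b, c.
idx = pd.MultiIndex.from_product([k, df.columns.levels[1].unique()])
df = df.reindex(columns=idx)

# Every (frame, column) pair now exists; the missing ones are all-NaN.
print(df)
```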
Flatten the column names:

df.columns = df.columns.map('_'.join)
df

   X_a  X_b  Y_a  Y_b
0    1    2   10  NaN
1    3    4   20  NaN
2    5    6   30  NaN
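The question asked for names like `a_X` (column first, frame suffix last), while `'_'.join` produces `X_a`. Assuming the question's ordering is wanted, each tuple can be joined in reverse instead:

```python
import pandas as pd

# The same (frame, column) MultiIndex that the concat/reindex steps produce.
cols = pd.MultiIndex.from_product([["X", "Y"], ["a", "b"]])

# '_'.join keeps the tuple order: frame first, column second.
flat_default = cols.map("_".join)
# Swapping the tuple elements gives the question's "column_frame" names.
flat_question = cols.map(lambda t: f"{t[1]}_{t[0]}")

print(list(flat_default))   # ['X_a', 'X_b', 'Y_a', 'Y_b']
print(list(flat_question))  # ['a_X', 'b_X', 'a_Y', 'b_Y']
```

With this naming, the grouping key in the next step would be `x.split('_')[0]` instead of `x.split('_')[1]`.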
Now you can group by the part after the underscore and compute the sums:

# groupby(..., axis=1) is deprecated in pandas 2.1+, so group the transposed frame instead
v = df.T.groupby(lambda x: x.split('_')[1]).sum().T.add_prefix('sum_')
v

   sum_a  sum_b
0   11.0    2.0
1   23.0    4.0
2   35.0    6.0
Concatenate this with the original:

pd.concat([df, v], axis=1)

   X_a  X_b  Y_a  Y_b  sum_a  sum_b
0    1    2   10  NaN   11.0    2.0
1    3    4   20  NaN   23.0    4.0
2    5    6   30  NaN   35.0    6.0
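The steps of this answer can be collected into one function that accepts any number of frames. This is a sketch: `concat_with_sums` and the third frame `Z` are hypothetical, the keys are assumed to be strings (for `'_'.join`), and it sums by MultiIndex level before flattening, so it does not depend on splitting names at `_`:

```python
import pandas as pd


def concat_with_sums(frames, keys):
    """Hypothetical helper: align columns across frames (missing -> NaN)
    and append one sum_<col> column per original column name."""
    df = pd.concat(frames, keys=keys, axis=1)
    # Every (key, column) pair, so absent columns appear as all-NaN.
    idx = pd.MultiIndex.from_product([keys, df.columns.levels[1].unique()])
    df = df.reindex(columns=idx)
    # Group the transposed frame by the inner level (original column name),
    # sum with NaN skipped, then transpose back to one row per original row.
    sums = df.T.groupby(level=1).sum().T.add_prefix("sum_")
    # Flatten ("X", "a") -> "X_a" for display.
    df.columns = df.columns.map("_".join)
    return pd.concat([df, sums], axis=1)


X = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=["a", "b"])
Y = pd.DataFrame([10, 20, 30], columns=["a"])
Z = pd.DataFrame({"b": [100, 200, 300]})  # hypothetical third frame

result = concat_with_sums([X, Y, Z], ["X", "Y", "Z"])
print(result)
```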

An alternative closer to what you were doing: since Y does not have to have the same columns as X, you can reindex Y and then perform the additions with the fill_value option:

Y = Y.reindex(columns=X.columns)
>>> Y
#    a    b
#0  10  NaN
#1  20  NaN  
#2  30  NaN

merged = X.join(Y, lsuffix="_X", rsuffix="_Y")
merged['sum_a'] = merged['a_X'].add(merged['a_Y'], fill_value=0)
merged['sum_b'] = merged['b_X'].add(merged['b_Y'], fill_value=0)
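`fill_value` also works on whole frames: `DataFrame.add` aligns both index and columns, and `fill_value=0` stands in for a value missing on only one side, so all `sum_` columns can be computed in one call. A sketch assuming the sums should cover the union of both frames' columns; `Y_aligned` is a hypothetical name, and it is reindexed only so that the join suffixes every column:

```python
import pandas as pd

X = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=["a", "b"])
Y = pd.DataFrame([10, 20, 30], columns=["a"])

# Column-aligned addition: Y's absent "b" counts as 0 where X has a value.
sums = X.add(Y, fill_value=0).add_prefix("sum_")

# Reindex Y only so the join produces a_X, b_X, a_Y, b_Y.
Y_aligned = Y.reindex(columns=X.columns)
merged = X.join(Y_aligned, lsuffix="_X", rsuffix="_Y").join(sums)

print(merged)
```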