Python pandas: merge without duplicating columns
I need to merge two dataframes without creating duplicate columns. The first dataframe (dfa) has missing values. The second dataframe (dfb) has unique values. This is the same as a VLOOKUP in Excel. dfa looks like this:
postcode lat lon ...plus 32 more columns
M20 2.3 0.2
LS1 NaN NaN
LS1 NaN NaN
LS2 NaN NaN
M21 2.4 0.3
dfb contains only unique postcodes, with values for the rows where lat and lon are NaN in dfa. It looks like this:
postcode lat lon
LS1 1.4 0.1
LS2 1.5 0.2
My desired output is:
postcode lat lon ...plus 32 more columns
M20 2.3 0.2
LS1 1.4 0.1
LS1 1.4 0.1
LS2 1.5 0.2
M21 2.4 0.3
I tried using pd.merge as follows:
outputdf = pd.merge(dfa, dfb, on='Postcode', how='left')
This results in duplicate columns being created:
postcode lat_x lon_x lat_y lon_y ...plus 32 more columns
M20 2.3 0.2 NaN NaN
LS1 NaN NaN 1.4 0.1
LS1 NaN NaN 1.4 0.1
LS2 NaN NaN 1.5 0.2
M21 2.4 0.3 NaN NaN
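For reference, the suffixed columns produced by a merge like this can be collapsed back into the original names. The following is only a minimal sketch, assuming the sample data from the question, lowercase column names, and unique postcodes in dfb; the `_lookup` suffix and the `merged` name are illustrative choices, not part of the original post.

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the question (the 32 extra columns are omitted).
dfa = pd.DataFrame({
    "postcode": ["M20", "LS1", "LS1", "LS2", "M21"],
    "lat": [2.3, np.nan, np.nan, np.nan, 2.4],
    "lon": [0.2, np.nan, np.nan, np.nan, 0.3],
})
dfb = pd.DataFrame({
    "postcode": ["LS1", "LS2"],
    "lat": [1.4, 1.5],
    "lon": [0.1, 0.2],
})

# Keep dfa's column names unchanged and tag only dfb's overlapping columns.
merged = dfa.merge(dfb, on="postcode", how="left", suffixes=("", "_lookup"))

# Fill the gaps in each original column from its lookup twin, then drop the twin.
for col in ["lat", "lon"]:
    merged[col] = merged[col].fillna(merged[col + "_lookup"])
merged = merged.drop(columns=["lat_lookup", "lon_lookup"])

print(merged)
```

Because dfb has one row per postcode, the left merge keeps dfa's row count and row order.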
Based on another answer, I tried using:
output = dfa
for df in [dfa, dfb]:
    output.update(df.set_index('Postcode'))
But I received "ValueError: cannot reindex from a duplicate axis"
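The error appears to come from the repeated LS1 postcode: once the postcode becomes the index of dfa, the labels are no longer unique, and update has to align the two frames on those labels. A quick way to check this, sketched here on the sample data from the question (lowercase column names and the variable names are assumptions):

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the question (the 32 extra columns are omitted).
dfa = pd.DataFrame({
    "postcode": ["M20", "LS1", "LS1", "LS2", "M21"],
    "lat": [2.3, np.nan, np.nan, np.nan, 2.4],
    "lon": [0.2, np.nan, np.nan, np.nan, 0.3],
})
dfb = pd.DataFrame({
    "postcode": ["LS1", "LS2"],
    "lat": [1.4, 1.5],
    "lon": [0.1, 0.2],
})

dfa_indexed = dfa.set_index("postcode")
dfb_indexed = dfb.set_index("postcode")

# dfa repeats LS1, so its postcode index is not unique; dfb's index is.
print(dfa_indexed.index.is_unique)
print(dfb_indexed.index.is_unique)

# List the labels that occur more than once in dfa's index.
dup_labels = dfa_indexed.index[dfa_indexed.index.duplicated()].unique().tolist()
print(dup_labels)
```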
Similarly, following the answer above, this did not work:
There were no duplicate columns, but the values in 'Lat' and 'Lon' were still empty.
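Is there a way to merge on 'Postcode' without creating duplicate columns, effectively performing a VLOOKUP with pandas?
Index both DataFrames by postcode and use combine_first; then, if necessary, add reindex to restore the same column order as the original df1: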
print (df1)
postcode lat lon plus 32 more columns
0 M20 2.3 0.2 NaN NaN NaN NaN
1 LS1 NaN NaN NaN NaN NaN NaN
2 LS1 NaN NaN NaN NaN NaN NaN
3 LS2 NaN NaN NaN NaN NaN NaN
4 M21 2.4 0.3 NaN NaN NaN NaN
df1 = df1.set_index('postcode')
df2 = df2.set_index('postcode')
df3 = df1.combine_first(df2).reindex(df1.columns, axis=1)
print (df3)
lat lon plus 32 more columns
postcode
LS1 1.4 0.1 NaN NaN NaN NaN
LS1 1.4 0.1 NaN NaN NaN NaN
LS2 1.5 0.2 NaN NaN NaN NaN
M20 2.3 0.2 NaN NaN NaN NaN
M21 2.4 0.3 NaN NaN NaN NaN
combine_first seems to be the best solution.
If you need a one-liner and don't want to modify the input dataframes:
df1.set_index('postcode').combine_first(df2.set_index('postcode'))
If you need to keep the index from df1:
df1.reset_index().set_index('postcode').combine_first(df2.set_index('postcode')).reset_index().set_index('index').sort_index()
Not elegant, but it works.
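If keeping the original index and row order matters, a per-column lookup with Series.map may be a simpler alternative to the index round trip. This is a minimal sketch rather than the answerer's method; it assumes the sample data from the question, unique postcodes in dfb, and an arbitrary non-default index on dfa to show that it survives. The names `lookup` and `result` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the question; dfa gets a made-up non-default index.
dfa = pd.DataFrame(
    {
        "postcode": ["M20", "LS1", "LS1", "LS2", "M21"],
        "lat": [2.3, np.nan, np.nan, np.nan, 2.4],
        "lon": [0.2, np.nan, np.nan, np.nan, 0.3],
    },
    index=[10, 11, 12, 13, 14],
)
dfb = pd.DataFrame({
    "postcode": ["LS1", "LS2"],
    "lat": [1.4, 1.5],
    "lon": [0.1, 0.2],
})

# One row per postcode, so each column can act as a lookup table.
lookup = dfb.set_index("postcode")

result = dfa.copy()  # leave the input frame untouched
for col in ["lat", "lon"]:
    # Translate each postcode to its looked-up value, then fill only the NaNs.
    result[col] = result[col].fillna(result["postcode"].map(lookup[col]))

print(result)
```

Existing values in dfa are never overwritten, since fillna only touches missing cells.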
dfa.fillna(dfb)?
This works. I thought I had found another solution (see the edited question), but I was wrong. I'll accept it soon if there are no other suggestions.