Python pandas merge without duplicating columns

python pandas dataframe merge

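I need to merge two dataframes without creating duplicate columns. The first dataframe (dfa) has missing lat/lon values. The second dataframe (dfb) has one row per postcode with those values. What I want is the equivalent of an Excel VLOOKUP.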

dfa looks like this:

postcode  lat  lon ...plus 32 more columns
M20       2.3  0.2
LS1       NaN  NaN
LS1       NaN  NaN
LS2       NaN  NaN
M21       2.4  0.3

dfb contains only unique postcodes, together with the lat and lon values that are NaN in dfa. It looks like this:

postcode  lat  lon 
LS1       1.4  0.1
LS2       1.5  0.2

The output I want is:

postcode  lat  lon ...plus 32 more columns
M20       2.3  0.2
LS1       1.4  0.1
LS1       1.4  0.1
LS2       1.5  0.2
M21       2.4  0.3

I tried using pd.merge like this:

outputdf = pd.merge(dfa, dfb, on='postcode', how='left')

This creates duplicate columns:

postcode  lat_x  lon_x  lat_y  lon_y ...plus 32 more columns
M20       2.3    0.2    NaN    NaN
LS1       NaN    NaN    1.4    0.1
LS1       NaN    NaN    1.4    0.1
LS2       NaN    NaN    1.5    0.2
M21       2.4    0.3    NaN    NaN
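A minimal sketch of one way to avoid the _x/_y pairs, assuming the only overlapping columns besides postcode are lat and lon (the other 32 columns are left out of the sample data): give dfb's copies an explicit suffix, fill dfa's gaps from them, then drop them. The suffix name "_dfb" is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (the other 32 columns are omitted).
dfa = pd.DataFrame({
    "postcode": ["M20", "LS1", "LS1", "LS2", "M21"],
    "lat": [2.3, np.nan, np.nan, np.nan, 2.4],
    "lon": [0.2, np.nan, np.nan, np.nan, 0.3],
})
dfb = pd.DataFrame({
    "postcode": ["LS1", "LS2"],
    "lat": [1.4, 1.5],
    "lon": [0.1, 0.2],
})

lookup_cols = ["lat", "lon"]

# Left merge: dfa's columns keep their names (empty suffix),
# dfb's overlapping columns get the arbitrary suffix "_dfb".
merged = dfa.merge(dfb, on="postcode", how="left", suffixes=("", "_dfb"))

# Coalesce: keep dfa's value where present, otherwise take dfb's value.
for col in lookup_cols:
    merged[col] = merged[col].fillna(merged[f"{col}_dfb"])

# Drop the helper columns so only dfa's original columns remain.
result = merged.drop(columns=[f"{col}_dfb" for col in lookup_cols])
print(result)
```

A left merge keeps dfa's row order, and because dfb has exactly one row per postcode the number of rows does not change.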

From there, I tried:

output = dfa
for df in [dfa, dfb]:
    output.update(df.set_index('postcode'))

But I got "ValueError: cannot reindex from a duplicate axis".
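The error is most likely caused by the repeated postcodes: DataFrame.update reindexes the frame it is given to the caller's index, and the first pass of the loop hands it dfa indexed by postcode, where LS1 appears twice. A sketch under that assumption that avoids index alignment on dfa altogether: treat dfb (whose postcodes are unique) as a postcode-to-value lookup and map dfa's postcode column through it, filling only the gaps. The sample data is rebuilt from the question and omits the other 32 columns.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (the other 32 columns are omitted).
dfa = pd.DataFrame({
    "postcode": ["M20", "LS1", "LS1", "LS2", "M21"],
    "lat": [2.3, np.nan, np.nan, np.nan, 2.4],
    "lon": [0.2, np.nan, np.nan, np.nan, 0.3],
})
dfb = pd.DataFrame({
    "postcode": ["LS1", "LS2"],
    "lat": [1.4, 1.5],
    "lon": [0.1, 0.2],
})

# dfa repeats LS1 (the duplicate axis); dfb's keys are unique.
print(dfa["postcode"].is_unique, dfb["postcode"].is_unique)  # False True

# Lookup table: index = postcode, one row per postcode.
lookup = dfb.set_index("postcode")

result = dfa.copy()  # leave dfa itself untouched
for col in ["lat", "lon"]:
    # map() yields NaN for postcodes absent from dfb (e.g. M20),
    # so fillna() keeps dfa's existing values for those rows.
    result[col] = result[col].fillna(result["postcode"].map(lookup[col]))
print(result)
```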

Similarly, the approach from the answer above doesn't work:

There are no duplicate columns, but the values in 'Lat' and 'Lon' are still empty.

Is there a way to merge on 'postcode' without creating duplicate columns, i.e. to effectively do a VLOOKUP with pandas?

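Set postcode as the index in both DataFrames and use combine_first; then, if necessary, reindex the columns into the same order as the original df1 (df1 and df2 here correspond to dfa and dfb in the question):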

print (df1)
  postcode  lat  lon  plus  32  more  columns
0      M20  2.3  0.2   NaN NaN   NaN      NaN
1      LS1  NaN  NaN   NaN NaN   NaN      NaN
2      LS1  NaN  NaN   NaN NaN   NaN      NaN
3      LS2  NaN  NaN   NaN NaN   NaN      NaN
4      M21  2.4  0.3   NaN NaN   NaN      NaN

df1 = df1.set_index('postcode')
df2 = df2.set_index('postcode')

df3 = df1.combine_first(df2).reindex(df1.columns, axis=1)
print (df3)
          lat  lon  plus  32  more  columns
postcode                                   
LS1       1.4  0.1   NaN NaN   NaN      NaN
LS1       1.4  0.1   NaN NaN   NaN      NaN
LS2       1.5  0.2   NaN NaN   NaN      NaN
M20       2.3  0.2   NaN NaN   NaN      NaN
M21       2.4  0.3   NaN NaN   NaN      NaN

This seems to be the best solution.

If you want a one-liner that doesn't modify the input DataFrames:

df1.set_index('postcode').combine_first(df2.set_index('postcode'))

If you need to keep the original index of df1:

df1.reset_index().set_index('postcode').combine_first(df2.set_index('postcode')).reset_index().set_index('index').sort_index()

Not elegant, but it works.

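What about dfa.fillna(dfb)?

This works. I thought I had found another solution (see the edited question), but I was wrong. If nobody offers anything better, I'll accept this soon.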
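Worth knowing: DataFrame.fillna with a DataFrame argument fills by matching index and column labels, not by postcode. A small sketch on sample data rebuilt from the question (an assumption; the real frames have 32 more columns) shows what happens when both frames keep the default RangeIndex, and one hedged way to line the rows up first.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (the other 32 columns are omitted).
dfa = pd.DataFrame({
    "postcode": ["M20", "LS1", "LS1", "LS2", "M21"],
    "lat": [2.3, np.nan, np.nan, np.nan, 2.4],
    "lon": [0.2, np.nan, np.nan, np.nan, 0.3],
})
dfb = pd.DataFrame({
    "postcode": ["LS1", "LS2"],
    "lat": [1.4, 1.5],
    "lon": [0.1, 0.2],
})

# fillna(DataFrame) pairs rows by index label: with a default RangeIndex,
# dfa row 1 (LS1) receives dfb row 1, which holds LS2's coordinates.
naive = dfa.fillna(dfb)
print(naive.loc[1, ["postcode", "lat", "lon"]].tolist())  # ['LS1', 1.5, 0.2]

# Build a lookup frame with exactly one row per dfa row, in dfa's order
# (a left merge preserves the left frame's row order).
aligned = dfa[["postcode"]].merge(dfb, on="postcode", how="left")
aligned.index = dfa.index  # make the row labels match dfa's own index

fixed = dfa.fillna(aligned)
print(fixed)
```

So dfa.fillna(dfb) only gives the right result when the rows of both frames already line up by index label; building the aligned lookup frame first guarantees that even with duplicate postcodes in dfa.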