Confusion about pandas merge

I am trying to merge two DataFrames without using their indexes:

In [127]: df1
Out[127]: 
   value1        date id    value2    group
0 -0.2284  2012-04-01  a -0.067469  group d
1 -0.4875  2012-04-01  b -0.021274  group d
2  0.1139  2012-04-01  c -0.015978  group d
3  0.3191  2012-04-01  d  0.022634  group d
4 -0.0077  2012-04-01  e  0.000000  group d

In [128]: df2
Out[128]: 
             date id      value2    group
23044  2012-04-01  a -0.06701001  group c
23045  2012-04-01  b    -0.02128  group c
23046  2012-04-01  c           0  group c
23047  2012-04-01  d           0  group c
23048  2012-04-01  e           0  group c

In [129]: pd.merge(df1, df2, how = 'outer', on = ['date', 'id', 'value2', 'group'])
Out[129]: 
   value1        date id    value2    group
0 -0.2284  2012-04-01  a -0.067469  group d
1 -0.4875  2012-04-01  b -0.021274  group d
2  0.1139  2012-04-01  c -0.015978  group d
3  0.3191  2012-04-01  d  0.022634  group d
4 -0.0077  2012-04-01  e  0.000000  group d
5     NaN  2012-04-01  a -0.067010  group c
6     NaN  2012-04-01  b -0.021280  group c
7     NaN  2012-04-01  c  0.000000  group c
8     NaN  2012-04-01  d  0.000000  group c
9     NaN  2012-04-01  e  0.000000  group c
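To make the problem reproducible, here is a minimal sketch that rebuilds both frames from the printed output and repeats the outer merge. The values are copied by hand, and the dtypes (dates as plain strings, value2 as float) are assumptions, since the printout does not show them. Recent pandas versions sort the keys of an outer merge, so the row order may differ from the session above, but value1 is still NaN on every group c row.

```python
import pandas as pd

# Frames rebuilt by hand from the printout (dtypes are an assumption).
ids = ["a", "b", "c", "d", "e"]
df1 = pd.DataFrame({
    "value1": [-0.2284, -0.4875, 0.1139, 0.3191, -0.0077],
    "date": ["2012-04-01"] * 5,
    "id": ids,
    "value2": [-0.067469, -0.021274, -0.015978, 0.022634, 0.0],
    "group": ["group d"] * 5,
})
df2 = pd.DataFrame({
    "date": ["2012-04-01"] * 5,
    "id": ids,
    "value2": [-0.06701001, -0.02128, 0.0, 0.0, 0.0],
    "group": ["group c"] * 5,
}, index=range(23044, 23049))

# Outer merge on every shared column: group always differs between the frames,
# so no rows match and the group c rows come back with value1 = NaN.
merged = pd.merge(df1, df2, how="outer", on=["date", "id", "value2", "group"])
print(merged)
```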

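This is almost the desired output, except that I want the NaN values of value1 in group c to be filled with the value1 of the group d row that has the same date and id. What is the right way to do this?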

I think this is unavoidably a two-step process.

To "fill in" value1, you need to associate any and all rows that have the same (date, id), regardless of group or value2:

In [5]: df3 = df2.set_index(['date', 'id']).join(
  ....:     df1.set_index(['date', 'id'])['value1']).reset_index()
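A runnable sketch of this first step, with the frames again rebuilt by hand from the printout (values and dtypes are assumptions). Because DataFrame.join aligns on the index, setting (date, id) as the index on both sides attaches df1's value1 to each df2 row with the same key, whatever its group or value2.

```python
import pandas as pd

# Frames rebuilt by hand from the printout (dtypes are an assumption).
ids = ["a", "b", "c", "d", "e"]
df1 = pd.DataFrame({
    "value1": [-0.2284, -0.4875, 0.1139, 0.3191, -0.0077],
    "date": ["2012-04-01"] * 5,
    "id": ids,
    "value2": [-0.067469, -0.021274, -0.015978, 0.022634, 0.0],
    "group": ["group d"] * 5,
})
df2 = pd.DataFrame({
    "date": ["2012-04-01"] * 5,
    "id": ids,
    "value2": [-0.06701001, -0.02128, 0.0, 0.0, 0.0],
    "group": ["group c"] * 5,
}, index=range(23044, 23049))

# Index both frames by (date, id) and join df1's value1 Series onto df2;
# reset_index turns date and id back into ordinary columns.
df3 = df2.set_index(["date", "id"]).join(
    df1.set_index(["date", "id"])["value1"]).reset_index()
print(df3)
```

df3 now holds the group c rows with their value1 filled in, which is what the second step merges back against df1.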

To get the final result, you then list distinct rows by all attributes, so that group and value2 are no longer lumped together:

In [6]: pd.merge(df1, df3, how = 'outer', 
  ....:     on = ['date', 'id', 'value1', 'value2', 'group'])
Out[6]: 
   value1        date id    value2    group
0 -0.2284  2012-04-01  a -0.067469  group_d
1 -0.4875  2012-04-01  b -0.021274  group_d
2  0.1139  2012-04-01  c -0.015978  group_d
3  0.3191  2012-04-01  d  0.022634  group_d
4 -0.0077  2012-04-01  e  0.000000  group_d
5 -0.2284  2012-04-01  a -0.067010  group_c
6 -0.4875  2012-04-01  b -0.021280  group_c
7  0.1139  2012-04-01  c  0.000000  group_c
8  0.3191  2012-04-01  d  0.000000  group_c
9 -0.0077  2012-04-01  e  0.000000  group_c
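An alternative, not taken from the answer above, is to keep the question's original outer merge and then fill value1 within each (date, id) group. GroupBy.transform("first") takes the first non-null value1 of each group and repeats it on every row of that group. This sketch assumes each (date, id) pair has at most one non-null value1 (the single group d row); the frames are again rebuilt by hand.

```python
import pandas as pd

# Frames rebuilt by hand from the printout (dtypes are an assumption).
ids = ["a", "b", "c", "d", "e"]
df1 = pd.DataFrame({
    "value1": [-0.2284, -0.4875, 0.1139, 0.3191, -0.0077],
    "date": ["2012-04-01"] * 5,
    "id": ids,
    "value2": [-0.067469, -0.021274, -0.015978, 0.022634, 0.0],
    "group": ["group d"] * 5,
})
df2 = pd.DataFrame({
    "date": ["2012-04-01"] * 5,
    "id": ids,
    "value2": [-0.06701001, -0.02128, 0.0, 0.0, 0.0],
    "group": ["group c"] * 5,
}, index=range(23044, 23049))

# The question's outer merge: group c rows have value1 = NaN.
merged = pd.merge(df1, df2, how="outer", on=["date", "id", "value2", "group"])

# "first" skips NaN, so every row in a (date, id) group receives the value1
# from that group's group d row.
merged["value1"] = merged.groupby(["date", "id"])["value1"].transform("first")
print(merged.sort_values(["group", "id"], ascending=[False, True]))
```

This trades the intermediate frame for a group-wise fill; if a (date, id) pair could carry several different non-null value1 values, "first" would silently pick one, so the two-step join is the safer choice in that case.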