Python: merging DataFrames with duplicate keys


python pandas dataframe


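I have a DataFrame df1 of the following form:

df1:

I want to merge it with another DataFrame, df2, which has multiple rows whose column "a" matches the index of df1:

df2:

I tried merging, but I get many copies of the df1 values, like this: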

      a   b   c   d   e
1     x   bb  cc  dd  ee
1     x   bb  cd  de  ef
1     x   bb  dd  ef  ff

2     y   ba  ff  fg  fh
2     y   ba  fg  fh  ff

How can I merge them to get the following?

      a   b   c   d   e
1     x   bb  cc  dd  ee
              cd  de  ef
              dd  ef  ff

2     y   ba  ff  fg  fh
              fg  fh  ff

I've read about stacking, but I really don't want a MultiIndex. Any help would be much appreciated.

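You can use the pd.merge method to merge the two DataFrames. I'm assuming you want to broadcast the values of a and b; otherwise, please provide the fill values for them.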
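A minimal sketch of that broadcasting, using hypothetical df1 and df2 rebuilt from the output tables in the question (the original df1/df2 listings are missing) and assuming, as this answer does, that df2["a"] holds the same x/y keys as df1["a"]. The validate="one_to_many" argument is a standard pd.merge option that raises a MergeError if "a" is not unique in df1:

```python
import pandas as pd

# Hypothetical inputs, reconstructed from the tables in the question
df1 = pd.DataFrame({"a": ["x", "y"], "b": ["bb", "ba"]}, index=[1, 2])
df2 = pd.DataFrame({
    "a": ["x", "x", "x", "y", "y"],
    "c": ["cc", "cd", "dd", "ff", "fg"],
    "d": ["dd", "de", "ef", "fg", "fh"],
    "e": ["ee", "ef", "ff", "fh", "ff"],
})

# Inner merge on the shared key "a". Because "a" is unique in df1 (checked by
# validate), each df1 row is repeated once per matching df2 row.
merged = pd.merge(df1, df2, on="a", validate="one_to_many")
print(merged)
```

Merging on a column discards df1's index (1, 2) and numbers the result 0 to 4, which is why the merges in this answer call df1.reset_index() first.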

If you want columns a and b to be part of a MultiIndex and to keep df1's index numbers, merge the DataFrames as follows:

df_new = pd.merge(df1.reset_index(), df2).set_index(['index', 'a', 'b'])
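
Applied to hypothetical df1/df2 rebuilt from the question's tables (again assuming df2["a"] holds the x/y keys), this produces a three-level index. When printed, pandas shows repeated outer index labels only once because the display.multi_sparse option is on by default, which is how the blank cells in the desired output arise:

```python
import pandas as pd

# Hypothetical inputs, reconstructed from the tables in the question
df1 = pd.DataFrame({"a": ["x", "y"], "b": ["bb", "ba"]}, index=[1, 2])
df2 = pd.DataFrame({
    "a": ["x", "x", "x", "y", "y"],
    "c": ["cc", "cd", "dd", "ff", "fg"],
    "d": ["dd", "de", "ef", "fg", "fh"],
    "e": ["ee", "ef", "ff", "fh", "ff"],
})

# reset_index() turns df1's unnamed index into a column called "index";
# pd.merge then joins on the only shared column, "a".
df_new = pd.merge(df1.reset_index(), df2).set_index(["index", "a", "b"])

# Repeated index labels are printed once (display.multi_sparse)
print(df_new)
```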

If you want to keep a and b as columns, merge the DataFrames as follows:

df_new = pd.merge(df1.reset_index(), df2).set_index('index')
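
With the same hypothetical inputs (rebuilt from the question's tables, df2["a"] assumed to hold x/y keys), the single-level version reproduces the first table in the question: a and b repeat on every row, and the index holds df1's labels 1 and 2. Clearing the index name is optional and only hides the "index" header when printing:

```python
import pandas as pd

# Hypothetical inputs, reconstructed from the tables in the question
df1 = pd.DataFrame({"a": ["x", "y"], "b": ["bb", "ba"]}, index=[1, 2])
df2 = pd.DataFrame({
    "a": ["x", "x", "x", "y", "y"],
    "c": ["cc", "cd", "dd", "ff", "fg"],
    "d": ["dd", "de", "ef", "fg", "fh"],
    "e": ["ee", "ef", "ff", "fh", "ff"],
})

# Same merge as above, but only df1's former index becomes the index
df_new = pd.merge(df1.reset_index(), df2).set_index("index")

# Optional: drop the "index" label printed above the index column
df_new.index.name = None
print(df_new)
```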

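Does the index number matter?

Thanks to your question, I realized that I made a mistake in my description: df2["a"] actually refers to the index of df1, so yes, the index matters. I tried your solutions, but neither worked for me. I actually don't want to propagate the values of b; if possible, I want each value to appear only once. If it helps: in my result, rows 2 and 3 should be part of the row with index number 1.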
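
Following that correction, here is a minimal sketch assuming df2["a"] holds df1's index labels (1, 2) rather than x/y values, with hypothetical data rebuilt from the question's tables. Instead of pd.merge, it looks up df1's rows with .loc, which repeats a row for each occurrence of its label, so df1's index is kept without reset_index; df2["a"] is not copied over because it duplicates the index. It then blanks a and b on all but the first row of each index label, showing them once without a MultiIndex. The blanked copy holds empty strings and is meant for display only; keep result for further computation:

```python
import pandas as pd

# Hypothetical inputs: df2["a"] holds df1's index labels, per the correction
df1 = pd.DataFrame({"a": ["x", "y"], "b": ["bb", "ba"]}, index=[1, 2])
df2 = pd.DataFrame({
    "a": [1, 1, 1, 2, 2],
    "c": ["cc", "cd", "dd", "ff", "fg"],
    "d": ["dd", "de", "ef", "fg", "fh"],
    "e": ["ee", "ef", "ff", "fh", "ff"],
})

# Repeat each df1 row once per occurrence of its label in df2["a"];
# the result's index is df1's labels in df2's row order: 1, 1, 1, 2, 2.
result = df1.loc[df2["a"].to_numpy()].copy()

# Attach df2's remaining columns by position (rows already follow df2's order)
for col in ["c", "d", "e"]:
    result[col] = df2[col].to_numpy()

# Display-only copy: keep a and b only on the first row of each index label
shown = result.copy()
repeated = shown.index.duplicated(keep="first")
shown.loc[repeated, ["a", "b"]] = ""
print(shown)
```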