Python: merge with duplicate indices, more rows than expected (Python, pandas)


I have two DataFrames with duplicate index labels:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randn(5, 3), columns=['A', 'B', 'C'], index=['I1', 'I1', 'I1', 'I2', 'I2'])
df2 = pd.DataFrame(np.random.randn(4, 3), columns=['D', 'E', 'F'], index=['I1', 'I1', 'I1', 'I2'])

pd.merge(df1, df2, how='left', left_index=True, right_index=True)

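pandas does not seem to realize that the two indexes have the same values. I expected the resulting DataFrame to have 5 rows and the columns ABCDEF, with the last row being NaN in DEF. Something like this: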

        A         B         C         D         E         F
I1  0.121993  0.208368 -0.056375  0.492218 -0.915034  1.667015
I1  0.121993  0.208368 -0.056375 -0.055575 -0.207215 -0.351027
I1  0.121993  0.208368 -0.056375  1.128143  1.371022  0.810542
I2 -0.817558  1.599293 -0.342841 -0.831796 -0.118316 -0.138027
I2 -0.817558  1.599293 -0.342841  NaN       NaN       NaN

Instead, what I get is:

          A         B         C         D         E         F
I1  0.121993  0.208368 -0.056375  0.492218 -0.915034  1.667015
I1  0.121993  0.208368 -0.056375 -0.055575 -0.207215 -0.351027
I1  0.121993  0.208368 -0.056375  1.128143  1.371022  0.810542
I1  0.403085  0.532958  0.856544  0.492218 -0.915034  1.667015
I1  0.403085  0.532958  0.856544 -0.055575 -0.207215 -0.351027
I1  0.403085  0.532958  0.856544  1.128143  1.371022  0.810542
I1  0.094214 -0.527932 -1.368606  0.492218 -0.915034  1.667015
I1  0.094214 -0.527932 -1.368606 -0.055575 -0.207215 -0.351027
I1  0.094214 -0.527932 -1.368606  1.128143  1.371022  0.810542
I2  0.378565  0.331995  0.167682 -0.831796 -0.118316 -0.138027
I2  0.378565  0.331995  0.167682 -0.561473 -0.898151 -0.217683
I2 -0.817558  1.599293 -0.342841 -0.831796 -0.118316 -0.138027
I2 -0.817558  1.599293 -0.342841 -0.561473 -0.898151 -0.217683

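Edit: I cannot drop duplicates from the result, because I do not want to lose the duplicate rows that exist in the original DataFrames.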
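The extra rows follow from how a merge treats repeated keys: every df1 row with a given label is paired with every df2 row carrying the same label, so each label contributes (count in df1) × (count in df2) rows. Below is a minimal sketch of that arithmetic using the frames as defined in the question; the names left_counts, right_counts and expected_rows are introduced here for illustration only. With df2 exactly as defined (a single I2 row) this gives 3×3 + 2×1 = 11 rows; the 13-row output shown above appears to come from a df2 with two I2 rows (3×3 + 2×2).

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randn(5, 3), columns=['A', 'B', 'C'],
                   index=['I1', 'I1', 'I1', 'I2', 'I2'])
df2 = pd.DataFrame(np.random.randn(4, 3), columns=['D', 'E', 'F'],
                   index=['I1', 'I1', 'I1', 'I2'])

merged = pd.merge(df1, df2, how='left', left_index=True, right_index=True)

# How often each label occurs on each side.
left_counts = df1.index.value_counts()
right_counts = df2.index.value_counts()

# In a left join, a label missing from df2 still yields one row per df1 row,
# so treat a missing right-hand count as 1.
right_aligned = right_counts.reindex(left_counts.index, fill_value=1)
expected_rows = int((left_counts * right_aligned).sum())

print(expected_rows, len(merged))  # 11 11
```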

To join on the same index:

pd.concat([df2,df1],axis=1)

           A         B         C         D         E         F
I1  0.112906 -1.080809  0.857712 -0.849395  0.015475  0.619177
I1 -0.380070  1.389495  1.372172 -0.472603 -0.593138 -0.594146
I1 -0.258423  1.402873 -0.923191 -2.138440  0.099878  0.148920
I2 -1.618755 -0.459908 -0.803290 -0.267760  0.275084  0.810870
I2 -0.033210  0.523840 -1.028478 -1.300269 -1.516137  0.373555

Edit:

pd.concat([df1.reset_index(drop=True), df2.reset_index(drop=True)],axis=1).set_index(df1.index)

           A         B         C         D         E         F
I1  1.925637  0.082031  0.483414 -0.189940  0.763408 -0.346046
I1 -0.676511  0.482327  1.648381  2.635290 -0.080474  0.558633
I1  0.180004 -0.190909  0.821891 -1.010627  0.774914  0.988356
I2 -0.011089  0.364400 -0.207062 -1.335626  0.036884  1.628115
I2 -1.314910  0.294986  0.334418       NaN       NaN       NaN
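If the goal is a left join that pairs the first I1 row of df1 with the first I1 row of df2, the second with the second, and so on, one possible approach (not part of the answer above) is to number the occurrences of each label and merge on the pair (label, occurrence). This is only a sketch, and it assumes that rows should be matched by their order within each label; the column names key and occ are hypothetical helpers introduced here.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randn(5, 3), columns=['A', 'B', 'C'],
                   index=['I1', 'I1', 'I1', 'I2', 'I2'])
df2 = pd.DataFrame(np.random.randn(4, 3), columns=['D', 'E', 'F'],
                   index=['I1', 'I1', 'I1', 'I2'])

# Move the label into a column (hypothetical name 'key') and number each
# occurrence of a label: I1 -> 0, 1, 2 and I2 -> 0, 1 (hypothetical name 'occ').
left = df1.rename_axis('key').reset_index()
left['occ'] = left.groupby('key').cumcount()

right = df2.rename_axis('key').reset_index()
right['occ'] = right.groupby('key').cumcount()

# (key, occ) is unique on both sides, so the left join is one-to-one and
# keeps exactly the rows of df1.
result = (left.merge(right, on=['key', 'occ'], how='left')
              .set_index('key')
              .drop(columns='occ')
              .rename_axis(None))

print(result)
```

Unlike the positional concat, this does not depend on both frames listing their labels in the same order, and any df1 row without a counterpart in df2 gets NaN in D, E and F.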

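You have the same index labels I1 and I2 repeated, is that correct? The result looks like every possible combination of the rows in the two DataFrames. What about pd.concat([df2, df1], axis=1)?

Yes, that is correct. The result seems to contain the square of each index label's count. I want to keep the same index as the DataFrames being joined. @anky_91

concat is an inner join. Ideally, I need a left join.

This works in this particular case, but it does not work if the count of I2 is [...]. In that case, I want the result to contain all NaN in the last row. Hence the attempted left join.

@Ajit OK, if you can provide some examples and the expected output, I will update my answer. :)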