Python: concatenating two DataFrames with duplicate DateTime indices

python pandas dataframe

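I'm trying to use pandas concat on two DataFrames that contain duplicate indices.

When I try to concatenate my two DataFrames, I get the following error:

Shape of passed values is (12180054), indices imply (121000)

To better understand the problem, I created two DataFrames: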

import pandas as pd

df1 = pd.DataFrame([{'a':"2018-01-01",'b':2},{'a':"2018-01-01",'b':3},{'a':"2018-01-02",'b':4}],
                   columns = ['a','b']).set_index('a')
df1.index = pd.to_datetime(df1.index)

which looks like:

            b
a   
2018-01-01  2
2018-01-01  3
2018-01-02  4

and df2, built the same way but with a column c, which looks like:

            c
a   
2018-01-01  5
2018-01-02  6

These share the relevant properties of my original DataFrames: the indices contain duplicates and are in datetime format.
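A minimal, self-contained sketch of this setup that can be pasted into a session. The question doesn't include the code for df2, so its construction here is reconstructed from the printed table (an assumption), and `make_frame` is a hypothetical helper that builds the DatetimeIndex directly instead of going through set_index/to_datetime. It only checks the two properties mentioned: a DatetimeIndex, with repeated labels in df1.

```python
import pandas as pd


def make_frame(column, dates, values):
    """Hypothetical helper: a one-column frame indexed by a DatetimeIndex named 'a'."""
    index = pd.to_datetime(dates).rename("a")
    return pd.DataFrame({column: values}, index=index)


df1 = make_frame("b", ["2018-01-01", "2018-01-01", "2018-01-02"], [2, 3, 4])
# df2's code is not in the question; reconstructed from its printed table
df2 = make_frame("c", ["2018-01-01", "2018-01-02"], [5, 6])

for name, df in [("df1", df1), ("df2", df2)]:
    # has_duplicates is True when at least one index label repeats
    print(name, type(df.index).__name__, "duplicates:", df.index.has_duplicates)
```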

Concatenating them with pd.concat([df1, df2], axis=1) works fine and creates the following DataFrame:

            b   c
a       
2018-01-01  2   5
2018-01-01  3   5
2018-01-02  4   6

(which is exactly what I expected)

But if I instead use:

df3 = pd.DataFrame([{'a':"2018-01-01",'b':2},{'a':"2018-01-01",'b':3},{'a':"2018-01-03",'b':4}],
                   columns = ['a','b']).set_index('a')
df3.index = pd.to_datetime(df3.index)

which looks like:

            b
a   
2018-01-01  2
2018-01-01  3
2018-01-03  4

then concatenating df3 with df2 in the same way, unlike the df1 case, returns:

Shape of passed values is (2, 6), indices imply (2, 4)

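The only difference between the two is that the last date in df1 is 2018-01-02, whereas in df3 it is 2018-01-03.

Logically (to me, at least) it should return the following: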

            b   c
a       
2018-01-01  2   5
2018-01-01  3   5
2018-01-02  NaN 6
2018-01-03  4   NaN

I don't understand how it can handle one case correctly but not the other: if it couldn't handle duplicate indices, it should fail in both cases.
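A sketch of one way to see the asymmetry, under an assumption the question doesn't state: that concat(axis=1) builds one combined index from all inputs and only has to realign a frame whose own index differs from it, and that realigning a frame with duplicate labels is what pandas refuses to do. Every date in df2 already appears in df1, so df1 can be kept as-is; df2 contributes 2018-01-02, which df3 lacks, so df3 would have to gain a row. The extra date df3 brings is not a problem for df2, whose index is unique. `labels_missing_from` is a hypothetical helper, and the indices are taken from the printed tables.

```python
import pandas as pd

idx1 = pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-02"])  # df1 (duplicates)
idx2 = pd.to_datetime(["2018-01-01", "2018-01-02"])                # df2 (unique)
idx3 = pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-03"])  # df3 (duplicates)


def labels_missing_from(dup_index, other_index):
    """Hypothetical helper: labels of other_index that dup_index does not contain."""
    return other_index[~other_index.isin(dup_index)]


# df1 case: no new label has to be added to the duplicated index
print(labels_missing_from(idx1, idx2))
# df3 case: 2018-01-02 would have to be added to the duplicated index
print(labels_missing_from(idx3, idx2))
```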

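There is an existing question about basically the same problem, but all of its answers say the issue is the duplicate index. That can't be the only cause, though, because concat does work with duplicate indices (as the df1 case shows).

I'd really like to understand what is going wrong, and how to get around it.

Many thanks.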

You need to do an outer join:

df3.join(df2, how='outer')
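A sketch checking that the outer join reproduces the table the question expected, and that it also gives the result concat produced for df1. df2's construction is reconstructed from its printed table (an assumption), and `dated` is a hypothetical helper. Joining on the index matches duplicates many-to-one, so each 2018-01-01 row of df3 picks up c = 5.

```python
import pandas as pd


def dated(column, dates, values):
    # hypothetical helper: one-column frame indexed by a DatetimeIndex named "a"
    return pd.DataFrame({column: values}, index=pd.to_datetime(dates).rename("a"))


df1 = dated("b", ["2018-01-01", "2018-01-01", "2018-01-02"], [2, 3, 4])
df2 = dated("c", ["2018-01-01", "2018-01-02"], [5, 6])  # reconstructed from the printed table
df3 = dated("b", ["2018-01-01", "2018-01-01", "2018-01-03"], [2, 3, 4])

# outer join on the index: union of the dates, duplicates matched many-to-one
result = df3.join(df2, how="outer")
print(result)

# the same call also covers the case concat already handled
print(df1.join(df2, how="outer"))
```

Because df3 has no row for 2018-01-02 (and df2 none for 2018-01-03), the inserted NaN values turn both columns into floats, so b prints as 2.0, 3.0 and so on rather than the integers shown in the question's expected table.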

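ChuHo's answer shows how to do it; I'll try to explain why it doesn't work:

The problem seems to appear when there are duplicated rows and both sides also have unique rows, i.e. labels the other side does not have: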

import pandas as pd

frame_a = pd.DataFrame({'a': ['a1']}, index = [1])
frame_b = pd.DataFrame({'b': ['b1', 'b2', 'b2']}, index = [1,2,2])
frame_c = pd.DataFrame({'c': ['c3', 'c3']}, index = [3,3])

pd.concat([frame_a, frame_b], axis=1)  # works (output as reported; newer pandas versions may behave differently):
#      a   b
# 1   a1  b1
# 2  NaN  b2
# 2  NaN  b2
pd.concat([frame_a, frame_c], axis=1)  # fails:
# ValueError: Shape of passed values is (5, 2), indices imply (3, 2)
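The same idea can be probed with the answer's own frames, assuming (as a simplification of concat's internals) that each input is aligned to the combined labels roughly the way reindex does it. `can_align` is a hypothetical helper, and `combined` is presumably the label set behind "indices imply (3, 2)". A Series with a unique index can be spread onto labels that repeat, but one whose own index has duplicates cannot be aligned to a different set of labels. The exact exception concat raises differs between pandas versions; reindex raises ValueError in this situation.

```python
import pandas as pd

unique_side = pd.Series(["a1"], index=[1])         # like frame_a
dup_side = pd.Series(["c3", "c3"], index=[3, 3])   # like frame_c
combined = [1, 3, 3]  # presumably the combined labels for frame_a and frame_c

# A unique index can be aligned to labels that repeat: "a1" at 1, NaN at both 3s
aligned = unique_side.reindex(combined)
print(aligned)


def can_align(series, labels):
    """Hypothetical helper: True if series.reindex(labels) succeeds."""
    try:
        series.reindex(labels)
        return True
    except ValueError:  # raised when the series' own index has duplicate labels
        return False


print(can_align(unique_side, combined))
print(can_align(dup_side, combined))
```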