Python: joining two DataFrames along the columns with a non-unique index

python pandas dataframe join merge

I have two DataFrames that I want to join along the columns. The index is not unique:

df1 = pd.DataFrame({'A': ['0', '1', '2', '2'], 'B': ['B0', 'B1', 'B2', 'B3'], 'C': ['C0', 'C1', 'C2', 'C3']})
   A   B   C
0  0  B0  C0
1  1  B1  C1
2  2  B2  C2
3  2  B3  C3

df2 = pd.DataFrame({'A': ['0', '2', '3'], 'E': ['E0', 'E1', 'E2']}, index=[0, 2, 3])
   A   E
0  0  E0
2  2  E1
3  3  E2

A should be my index. What I want is:

   A   B   C    E
0  0  B0  C0   E0
1  1  B1  C1  NaN
2  2  B2  C2   E1
3  2  B3  C3   E1

This:

pd.concat([df1, df2], axis=1)

gives me an error:

Reindexing only valid with uniquely valued Index objects
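
The error can be reproduced with a minimal sketch like the one below. It assumes, as the question suggests, that A has been moved into the index of both frames before concatenating; the `set_index` calls and the names `left`, `right`, and `error_message` are illustrative and do not come from the original post. Because the first index contains '2' twice and the two indexes differ, pandas cannot align the rows unambiguously. The exact exception class may vary between pandas versions, so the sketch catches a generic exception.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['0', '1', '2', '2'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3']})
df2 = pd.DataFrame({'A': ['0', '2', '3'],
                    'E': ['E0', 'E1', 'E2']}, index=[0, 2, 3])

# Assumption: 'A' becomes the index, so the index of `left` is non-unique.
left = df1.set_index('A')
right = df2.set_index('A')

error_message = None
try:
    # Column-wise concat has to align rows on the index labels.
    pd.concat([left, right], axis=1)
except Exception as exc:  # exact class depends on the pandas version
    error_message = f"{type(exc).__name__}: {exc}"

print(error_message)
```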

Join along the column axis.

Output:

    A   B   C    D    E
0  A0  B0  C0   D0   E0
1  A1  B1  C1  NaN  NaN
2  A2  B2  C2   D1   E1
2  A3  B3  C3   D1   E1

Use combine_first:

df1.combine_first(df2).dropna(subset=['A'],axis=0)
Out[320]: 
    A   B   C    D    E
0  A0  B0  C0   D0   E0
1  A1  B1  C1  NaN  NaN
2  A2  B2  C2   D1   E1
2  A3  B3  C3   D1   E1

After the question was edited:

df1.combine_first(df2.set_index('A'))
Out[338]: 
   A   B   C    E
0  0  B0  C0   E0
1  1  B1  C1  NaN
2  2  B2  C2   E1
3  2  B3  C3   E2
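
One thing worth noting about the last result: combine_first aligns the two frames on their row labels, not on the values in column A, which is why the last row can end up with E2 where the question asked for E1. The sketch below illustrates this under the assumption that df1 and df2 are exactly the frames from the question and that df2 keeps its row labels 0, 2, 3 (it is called here without `set_index`); `combined` is just an illustrative name.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['0', '1', '2', '2'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3']})
df2 = pd.DataFrame({'A': ['0', '2', '3'],
                    'E': ['E0', 'E1', 'E2']}, index=[0, 2, 3])

# Values from df1 win wherever they are present; gaps are filled from df2,
# matched purely by row label (0, 1, 2, 3) and column name.
combined = df1.combine_first(df2)

print(combined)
# Row label 3 receives 'E2' because df2 also has a row labelled 3,
# even though df1's row 3 has A == '2'.
print(combined.loc[3, 'E'])
```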

Maybe you are looking for a left merge
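
A left merge on column A could look like the sketch below. It assumes the df1 and df2 from the question; the names `merged` and `indexed` are illustrative, and the optional `set_index` step is only one way to make A the index afterwards. Matching happens on the values of A rather than on the row labels, so duplicate keys in df1 are not a problem.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['0', '1', '2', '2'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3']})
df2 = pd.DataFrame({'A': ['0', '2', '3'],
                    'E': ['E0', 'E1', 'E2']}, index=[0, 2, 3])

# Keep every row of df1 and look up E by the value of A.
# Keys missing from df2 ('1') give NaN; keys only in df2 ('3') are dropped.
merged = df1.merge(df2, on='A', how='left')
print(merged)

# Optional: make A the (non-unique) index of the result.
indexed = merged.set_index('A')
```

Both rows of df1 with A equal to '2' pick up E1, which matches the result asked for in the question.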

pd.concat([df1, df2], axis=1)

Error: Reindexing only valid with uniquely valued Index objects

Answer restored… I would say, don't change your question like this…