Pandas: concatenating 3 DataFrames on the index and one column

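I want to concatenate 3 DataFrames on the index and on the "type" column, where some index values are missing (dfb and dfc have incomplete indexes, while dfa's index is complete). When I use concat, some columns disappear, as shown below. (I would like the final DataFrame to have a MultiIndex so that I can extract parts of the concatenated DataFrame by type; df['type2'] should have a sorted index.)
I tried concat with various parameters, but it did not work.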

    import pandas as pd

    dfa = pd.DataFrame({'type': ['type1', 'type1', 'type2'], 'a': [10, 20, 30]}, index=[1, 2, 3])
    dfb = pd.DataFrame({'type': ['type1', 'type2'], 'b': [11, 21]}, index=[2, 3])
    dfc = pd.DataFrame({'type': ['type3'], 'c': [33]}, index=[3])
    dfa
    dfb
    dfc
    pd.concat([dfa, dfb, dfc], axis=0, keys=['type'])  # wrong. columns b and c disappear!

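I want an efficient solution, because I have 5 DataFrames with 2000 "types", and the index of each DataFrame has about 10K entries.
Expected result, an example of the desired DataFrame: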

    import numpy as np

    pd.DataFrame({'a': [10, 20, 30, np.nan],
                  'b': [np.nan, 11, 21, np.nan],
                  'c': [np.nan, np.nan, np.nan, 33],
                  'type': ['type1', 'type1', 'type2', 'type3']},
                 index=[1, 2, 3, 3])

The problem is that you did not define enough keys to match the number of DataFrames being concatenated.
Try this:

    pd.concat([dfa, dfb, dfc], axis=0, keys=['type_a', 'type_b', 'type_c'])

Output:

                 a     b     c   type
    type_a 1  10.0   NaN   NaN  type1
           2  20.0   NaN   NaN  type1
           3  30.0   NaN   NaN  type2
    type_b 2   NaN  11.0   NaN  type1
           3   NaN  21.0   NaN  type2
    type_c 3   NaN   NaN  33.0  type3

Or leave the keys parameter out altogether:

    pd.concat([dfa, dfb, dfc], axis=0)

Output:

          a     b     c   type
    1  10.0   NaN   NaN  type1
    2  20.0   NaN   NaN  type1
    3  30.0   NaN   NaN  type2
    2   NaN  11.0   NaN  type1
    3   NaN  21.0   NaN  type2
    3   NaN   NaN  33.0  type3

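The concat output above still keeps separate rows for the same index and type (index 2 with type1 appears twice, for example). Here is a minimal sketch of one way to collapse such rows after stacking. It assumes that each (index, type) pair has at most one non-null value per column. The names stacked and combined are only illustrative and do not come from the original answer.

```python
import pandas as pd

dfa = pd.DataFrame({'type': ['type1', 'type1', 'type2'], 'a': [10, 20, 30]}, index=[1, 2, 3])
dfb = pd.DataFrame({'type': ['type1', 'type2'], 'b': [11, 21]}, index=[2, 3])
dfc = pd.DataFrame({'type': ['type3'], 'c': [33]}, index=[3])

# Stack the frames, then turn the index into an ordinary column named 'index'.
stacked = (
    pd.concat([dfa, dfb, dfc], axis=0, sort=False)
    .rename_axis('index')
    .reset_index()
)

# For each (index, type) pair, keep the first non-null value of every column.
# Assumption: a pair never carries two conflicting non-null values in one column.
combined = stacked.groupby(['index', 'type']).first()
print(combined)
```

The result is indexed by (index, type), so a single type can be pulled out with combined.xs('type2', level='type').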
After creating the DataFrames:

    dfa = pd.DataFrame({'type': ['type1', 'type1', 'type2'], 'a': [10, 20, 30]}, index=[1, 2, 3])
    dfb = pd.DataFrame({'type': ['type1', 'type2'], 'b': [11, 21]}, index=[2, 3])
    dfc = pd.DataFrame({'type': ['type3'], 'c': [33]}, index=[3])

You can use merge and reset_index like this:

    dfs = [dfa, dfb, dfc]  # ... add as many df as you wish
    res = dfs[0].reset_index()
    for i in range(1, len(dfs)):
        # Outer-merge each further frame on both the former index and 'type'.
        res = res.merge(dfs[i].reset_index(), how='outer',
                        left_on=['index', 'type'], right_on=['index', 'type'])
    res = res.set_index('index')
    print(res)

The result will be:

            type     a     b     c
    index
    1      type1  10.0   NaN   NaN
    2      type1  20.0  11.0   NaN
    3      type2  30.0  21.0   NaN
    3      type3   NaN   NaN  33.0
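The question also asks for a MultiIndex so that one type can be extracted at a time. Here is a hedged sketch of one way to get there without the merge loop: move "type" into the index of every frame, then align the frames side by side with concat along axis=1. It assumes that each (index, type) pair occurs at most once within any single frame. The names frames and wide are illustrative only.

```python
import pandas as pd

dfa = pd.DataFrame({'type': ['type1', 'type1', 'type2'], 'a': [10, 20, 30]}, index=[1, 2, 3])
dfb = pd.DataFrame({'type': ['type1', 'type2'], 'b': [11, 21]}, index=[2, 3])
dfc = pd.DataFrame({'type': ['type3'], 'c': [33]}, index=[3])

# Append 'type' to the existing index so every frame is keyed by (index, type).
# Assumption: these keys are unique inside each individual frame.
frames = [df.set_index('type', append=True) for df in (dfa, dfb, dfc)]

# Aligning on axis=1 takes the union of the keys and fills the gaps with NaN.
wide = pd.concat(frames, axis=1)

# Put 'type' first and sort, so that selecting one type yields a sorted index.
wide = wide.swaplevel(0, 1).sort_index()

print(wide)
print(wide.loc['type2'])
```

With "type" as the outer level, wide.loc['type2'] returns only the rows for that type, indexed by the original index values.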

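Can you provide an example of the desired result?
Added the desired DataFrame.
Added an answer, please check.
See the latest edit for what the desired DataFrame looks like (more or less). For example, in your latest output there should be only 4 rows, because rows 1 and 4 should be combined (they have the same type, type1).
@alexprice I have revised my answer so that it works better with a varying number of DataFrames.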