Python: Join multiple DataFrames on multiple common columns

python python-3.x pandas

I have several DataFrames like these:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 4, 5], 'c': [4, 6, 7]})
df2 = pd.DataFrame({'a': [1, 2, 3], 'd': [66, 24, 55], 'c': [4, 6, 7]})
df3 = pd.DataFrame({'a': [1, 2, 3], 'f': [31, 74, 95], 'c': [4, 6, 7]})

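I want this output:

   a  c
0  1  4
1  2  6
2  3  7

These are the columns common to all 3 datasets. I am looking for a solution that works for many columns without having to name the common columns explicitly, as in the solutions I have seen so far (the actual DataFrames are huge).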

A combination of functools.reduce, Index.intersection, DataFrame.filter and pd.concat can help with your use case:

dfs = (df,df2,df3)
cols = [ent.columns for ent in dfs]
cols

[Index(['a', 'b', 'c'], dtype='object'),
 Index(['a', 'd', 'c'], dtype='object'),
 Index(['a', 'f', 'c'], dtype='object')]

# find the columns common to all DataFrames
from functools import reduce
universal_cols = reduce(lambda x,y : x.intersection(y), cols).tolist()
universal_cols

['a', 'c']

# keep only universal_cols in each DataFrame
updates = [ent.filter(universal_cols) for ent in dfs]
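
The two steps above (intersect the column labels, then filter each DataFrame) can be wrapped in one helper. This is only a minimal sketch: the function name restrict_to_common is hypothetical and not part of the original answer, and it assumes at least one DataFrame is passed in.

```python
from functools import reduce

import pandas as pd


def restrict_to_common(frames):
    """Return each DataFrame in `frames` reduced to the columns they all share."""
    # Intersect the column Index objects pairwise, left to right.
    shared = reduce(lambda x, y: x.intersection(y), [f.columns for f in frames]).tolist()
    # Keep only the shared columns in every frame.
    return [f.filter(items=shared) for f in frames]


df = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 4, 5], 'c': [4, 6, 7]})
df2 = pd.DataFrame({'a': [1, 2, 3], 'd': [66, 24, 55], 'c': [4, 6, 7]})
df3 = pd.DataFrame({'a': [1, 2, 3], 'f': [31, 74, 95], 'c': [4, 6, 7]})

trimmed = restrict_to_common([df, df2, df3])
print([list(t.columns) for t in trimmed])
```

Nothing here requires the common columns to be named by hand, which is what the question asks for.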

If both the columns and their contents are the same, you can skip the list comprehension and filter just one DataFrame:

#let's use the first dataframe
output = df.filter(universal_cols)

If the contents of the columns differ, concatenate and drop duplicates:

#concatenate and drop duplicates
res = pd.concat(updates).drop_duplicates()

res  #output has the same result

    a   c
0   1   4
1   2   6
2   3   7
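
When the shared columns do not hold identical rows, the concatenate-and-deduplicate step keeps every distinct row rather than a single copy. The sketch below illustrates this with two made-up frames (left and right are illustrative names and data, not from the question). Passing ignore_index=True to pd.concat is an addition here, so that the stacked rows get a fresh index before duplicates are dropped.

```python
import pandas as pd

left = pd.DataFrame({'a': [1, 2, 3], 'c': [4, 6, 7]})
right = pd.DataFrame({'a': [1, 2, 9], 'c': [4, 6, 8]})  # last row differs from `left`

# Stack the frames, renumber the rows, then drop rows that repeat exactly.
stacked = pd.concat([left, right], ignore_index=True).drop_duplicates()
print(stacked)
```

The two rows shared by both frames appear once, while the differing rows from each frame are both retained.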

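You can use functools.reduce to apply the function r_common cumulatively, from left to right, to the DataFrames in dfs, so that the list dfs is reduced to a single DataFrame, df_common. Inside r_common, the Index.intersection method finds the columns common to the two DataFrames d1 and d2; the function then keeps only those common columns whose values are equal in both.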

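from functools import reduce

def r_common(d1, d2):
    # column labels present in both frames
    cols = d1.columns.intersection(d2.columns).tolist()
    # True for each shared column whose values are equal in every row
    m = d1[cols].eq(d2[cols]).all()
    return d1[m[m].index]

dfs = [df, df2, df3]
df_common = reduce(r_common, dfs)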

Result:

# print(df_common)
   a  c
0  1  4
1  2  6
2  3  7
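
Unlike a plain intersection of column labels, r_common also compares the values, so a column that has the same name but different contents is dropped. A small sketch of that difference, using two made-up frames x and y (illustrative data, not from the question) that share the same index:

```python
from functools import reduce

import pandas as pd


def r_common(d1, d2):
    cols = d1.columns.intersection(d2.columns).tolist()
    m = d1[cols].eq(d2[cols]).all()
    return d1[m[m].index]


x = pd.DataFrame({'a': [1, 2, 3], 'c': [4, 6, 7]})
y = pd.DataFrame({'a': [1, 2, 3], 'c': [4, 6, 0]})  # 'c' differs in the last row

by_name = x.columns.intersection(y.columns).tolist()   # label match only
by_value = reduce(r_common, [x, y]).columns.tolist()   # label and value match

print(by_name)
print(by_value)
```

Here column c survives the label-only intersection but not r_common, because its last value differs between the two frames.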