Python: How to combine 3 complex DataFrames in pandas

I have 3 DataFrames, named df1, df2 and df3.

df1:
      match_up        result
0   1985_1116_1234      1
1   1985_1120_1345      1
2   1985_1207_1250      1
3   1985_1229_1425      1
4   1985_1242_1325      1

df2:
  team_df2       win_df2  
0  1207           0.700               
2  1116           0.636               
3  1120           0.621               
4  1229           0.615                
5  1242           0.679                

df3:
    team_df3       win_df3  
1   1234           0.667               
7   1250           0.759               
11  1325           0.774               
12  1345           0.742               
15  1425           0.667 

I need a new DataFrame that combines df1, df2 and df3, in the following format:

          match_up        result  team_df2  team_df3  win_df2  win_df3
    0   1985_1116_1234      1      1116       1234    0.636     0.667
    1   1985_1120_1345      1      1120       1345    0.621     0.742
    2   1985_1207_1250      1      1207       1250    0.700     0.759 
    3   1985_1229_1425      1      1229       1425    0.615     0.667
    4   1985_1242_1325      1      1242       1325    0.679     0.774

How can I do this in pandas?

You need to extract the team IDs from the match_up strings and convert them to integers so that the merges match correctly:

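import pandas as pd

# Set up result DataFrame: split 'match_up' into the year and the two team ids
df = df1.copy()
df['year'], df['id2'], df['id3'] = list(zip(*df['match_up'].str.split('_')))
# The split pieces are strings; cast the ids to int to match team_df2 / team_df3
df[['id2', 'id3']] = df[['id2', 'id3']].astype(int)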

# Do merges
df = pd.merge(df, df2, left_on='id2', right_on='team_df2')
df = pd.merge(df, df3, left_on='id3', right_on='team_df3')

# Drop unneeded columns and print
df = df.drop(['id2', 'year', 'id3'], axis=1)
print(df)
which yields:

         match_up  result  team_df2  win_df2  team_df3  win_df3
0  1985_1116_1234       1      1116    0.636      1234    0.667
1  1985_1120_1345       1      1120    0.621      1345    0.742
2  1985_1207_1250       1      1207    0.700      1250    0.759
3  1985_1229_1425       1      1229    0.615      1425    0.667
4  1985_1242_1325       1      1242    0.679      1325    0.774
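
As a variation on the key-extraction step, the split can also be written with str.split(..., expand=True), which returns the pieces as separate columns. The following is only a minimal sketch, not the answerer's code: it rebuilds the three frames from the question so that it runs on its own, and the names parts, keys and combined are illustrative. It also assumes how='left' is wanted, so that every row of df1 is kept even when a team id has no match.

```python
import pandas as pd

# Frames rebuilt from the question so the sketch is self-contained
df1 = pd.DataFrame({
    'match_up': ['1985_1116_1234', '1985_1120_1345', '1985_1207_1250',
                 '1985_1229_1425', '1985_1242_1325'],
    'result': [1, 1, 1, 1, 1],
})
df2 = pd.DataFrame({'team_df2': [1207, 1116, 1120, 1229, 1242],
                    'win_df2': [0.700, 0.636, 0.621, 0.615, 0.679]},
                   index=[0, 2, 3, 4, 5])
df3 = pd.DataFrame({'team_df3': [1234, 1250, 1325, 1345, 1425],
                    'win_df3': [0.667, 0.759, 0.774, 0.742, 0.667]},
                   index=[1, 7, 11, 12, 15])

# Split 'match_up' into three string columns labelled 0, 1, 2
parts = df1['match_up'].str.split('_', expand=True)

# Keep the two team ids and cast them to int so they match the merge keys
keys = parts[[1, 2]].astype(int)
keys.columns = ['id2', 'id3']   # illustrative helper column names

combined = (
    df1.join(keys)                                   # align on the row index
       .merge(df2, left_on='id2', right_on='team_df2', how='left')
       .merge(df3, left_on='id3', right_on='team_df3', how='left')
       .drop(columns=['id2', 'id3'])                 # helper keys no longer needed
)
print(combined)
```

The cast to int matters because the pieces returned by the split are strings, while team_df2 and team_df3 hold integers.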

Output:

In [23]: final
Out[23]: 
         match_up  results  team_df2  win_df2  team_df3  win_df3
0  1985_1116_1234        1      1116    0.636      1234    0.667
1  1985_1120_1345        1      1120    0.621      1345    0.742
2  1985_1207_1250        1      1207    0.700      1250    0.759
3  1985_1229_1425        1      1229    0.615      1425    0.667
4  1985_1242_1325        1      1242    0.679      1325    0.774

left_on=df1['match_up'].apply(lambda x: int(x.split('_')[1])).values
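What does this function do here?
@MJP It breaks the match key apart by splitting it into a list, using '_' as the delimiter. It then takes the second value ([1]), which is the key for the first pd.merge.
Why do we need to specify right_on, given that we already specified how='left', which only considers the keys in the left DataFrame?
As for right_on: "right_on: Columns from the right DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame."
OK, got it! Your solution is very elegant and easy to understand.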
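
The full code of this answer is not shown above, only the left_on fragment. The sketch below is one way that fragment could be used, under the assumption that it was passed to pd.merge together with right_on and how='left' as the comments describe. The intermediate names left_keys, right_keys and step1 are hypothetical, and the frames are rebuilt from the question so the example runs on its own.

```python
import pandas as pd

# Frames rebuilt from the question so the sketch is self-contained
df1 = pd.DataFrame({
    'match_up': ['1985_1116_1234', '1985_1120_1345', '1985_1207_1250',
                 '1985_1229_1425', '1985_1242_1325'],
    'result': [1, 1, 1, 1, 1],
})
df2 = pd.DataFrame({'team_df2': [1207, 1116, 1120, 1229, 1242],
                    'win_df2': [0.700, 0.636, 0.621, 0.615, 0.679]},
                   index=[0, 2, 3, 4, 5])
df3 = pd.DataFrame({'team_df3': [1234, 1250, 1325, 1345, 1425],
                    'win_df3': [0.667, 0.759, 0.774, 0.742, 0.667]},
                   index=[1, 7, 11, 12, 15])

# left_on may be an array as long as the left frame; here it holds the
# integer id taken from the middle part of each 'match_up' string
left_keys = df1['match_up'].apply(lambda x: int(x.split('_')[1])).values
step1 = pd.merge(df1, df2, left_on=left_keys, right_on='team_df2', how='left')

# Same idea for the last part of 'match_up', matched against df3
right_keys = step1['match_up'].apply(lambda x: int(x.split('_')[2])).values
final = pd.merge(step1, df3, left_on=right_keys, right_on='team_df3', how='left')

# Keep only the columns shown in the answer's output
final = final[['match_up', 'result', 'team_df2', 'win_df2', 'team_df3', 'win_df3']]
print(final)
```

Here how='left' decides which rows survive (all rows of the left frame), while right_on names the column of the right frame that the array passed as left_on is compared against. That is why both are needed.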