Python: How to combine 3 complex DataFrames in pandas

python python-2.7 pandas

I have 3 DataFrames, named df1, df2, and df3.

df1:
      match_up        result
0   1985_1116_1234      1
1   1985_1120_1345      1
2   1985_1207_1250      1
3   1985_1229_1425      1
4   1985_1242_1325      1

df2:
  team_df2       win_df2  
0  1207           0.700               
2  1116           0.636               
3  1120           0.621               
4  1229           0.615                
5  1242           0.679                

df3:
    team_df3       win_df3  
1   1234           0.667               
7   1250           0.759               
11  1325           0.774               
12  1345           0.742               
15  1425           0.667

I need a new DataFrame that combines df1, df2, and df3, in the following format:

          match_up        result  team_df2  team_df3  win_df2  win_df3
    0   1985_1116_1234      1      1116       1234    0.636     0.667
    1   1985_1120_1345      1      1120       1345    0.621     0.742
    2   1985_1207_1250      1      1207       1250    0.700     0.759 
    3   1985_1229_1425      1      1229       1425    0.615     0.667
    4   1985_1242_1325      1      1242       1325    0.679     0.774

How can I do this in pandas?

You need to extract the team ids from the match_up strings and convert them to integers so that the merges match correctly.


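import pandas as pd

# Set up result DataFrame
df = df1.copy()
# Split match_up ("year_id2_id3") into its three parts; the pieces are strings
df['year'], df['id2'], df['id3'] = list(zip(*df['match_up'].str.split('_')))
# Convert the team ids to int so they can be matched against team_df2 / team_df3
df[['id2', 'id3']] = df[['id2', 'id3']].astype(int)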

# Do merges
df = pd.merge(df, df2, left_on='id2', right_on='team_df2')
df = pd.merge(df, df3, left_on='id3', right_on='team_df3')

# Drop unneeded columns and print
df = df.drop(['id2', 'year', 'id3'], axis=1)
print(df)

This yields:
         match_up  result  team_df2  win_df2  team_df3  win_df3
0  1985_1116_1234       1      1116    0.636      1234    0.667
1  1985_1120_1345       1      1120    0.621      1345    0.742
2  1985_1207_1250       1      1207    0.700      1250    0.759
3  1985_1229_1425       1      1229    0.615      1425    0.667
4  1985_1242_1325       1      1242    0.679      1325    0.774
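
The same idea can be sketched as a self-contained script. This is only an illustrative variant, not the answer's exact code: the sample frames are rebuilt from the question, the names `parts`, `keys`, and `combined` are made up for the example, and it uses `str.split(..., expand=True)` plus `how="left"` so that every row of df1 is kept even if a team id had no match.

```python
import pandas as pd

# Sample data rebuilt from the question (index values copied as shown there).
df1 = pd.DataFrame({
    "match_up": ["1985_1116_1234", "1985_1120_1345", "1985_1207_1250",
                 "1985_1229_1425", "1985_1242_1325"],
    "result": [1, 1, 1, 1, 1],
})
df2 = pd.DataFrame({"team_df2": [1207, 1116, 1120, 1229, 1242],
                    "win_df2": [0.700, 0.636, 0.621, 0.615, 0.679]},
                   index=[0, 2, 3, 4, 5])
df3 = pd.DataFrame({"team_df3": [1234, 1250, 1325, 1345, 1425],
                    "win_df3": [0.667, 0.759, 0.774, 0.742, 0.667]},
                   index=[1, 7, 11, 12, 15])

# Split "year_id2_id3" into three string columns (0, 1, 2).
parts = df1["match_up"].str.split("_", expand=True)

# Attach the two team ids as integer helper columns (hypothetical names).
keys = df1.assign(id2=parts[1].astype(int), id3=parts[2].astype(int))

# Left merges keep every row of df1, in its original order.
combined = (
    keys.merge(df2, how="left", left_on="id2", right_on="team_df2")
        .merge(df3, how="left", left_on="id3", right_on="team_df3")
        .drop(columns=["id2", "id3"])
)
print(combined)
```

If the column order from the question matters, the result can be reordered afterwards by selecting the columns in the desired order.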

Output:
In [23]: final
Out[23]: 
         match_up  results  team_df2  win_df2  team_df3  win_df3
0  1985_1116_1234        1      1116    0.636      1234    0.667
1  1985_1120_1345        1      1120    0.621      1345    0.742
2  1985_1207_1250        1      1207    0.700      1250    0.759
3  1985_1229_1425        1      1229    0.615      1425    0.667
4  1985_1242_1325        1      1242    0.679      1325    0.774

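left_on=df1['match_up'].apply(lambda x: int(x.split('_')[1])).values
What is this function doing here?

@MJP It breaks up the match key by splitting it into a list, using '_' as the delimiter. It then takes the second value ([1]), which is the key for the first pd.merge.

Why should we specify right_on, given that we have already specified how='left', which only considers the keys from the left DataFrame?

As for right_on: "right_on: columns from the right DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame."

Ok, got it!! Your solution is very elegant and easy to understand.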
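
The full code of the answer these comments refer to is not preserved here; only the `left_on` fragment is. As a rough sketch of how such a call might look, assuming the keys are passed as arrays computed from `match_up` and that `how='left'` is used as the comments suggest, something like the following should work. The sample frames are rebuilt from the question, the second merge (index `[2]`) is an assumption by analogy with the first, and the `results` column name is taken from the output shown above.

```python
import pandas as pd

# Sample data rebuilt from the question; "results" follows the output above.
df1 = pd.DataFrame({
    "match_up": ["1985_1116_1234", "1985_1120_1345", "1985_1207_1250",
                 "1985_1229_1425", "1985_1242_1325"],
    "results": [1, 1, 1, 1, 1],
})
df2 = pd.DataFrame({"team_df2": [1207, 1116, 1120, 1229, 1242],
                    "win_df2": [0.700, 0.636, 0.621, 0.615, 0.679]})
df3 = pd.DataFrame({"team_df3": [1234, 1250, 1325, 1345, 1425],
                    "win_df3": [0.667, 0.759, 0.774, 0.742, 0.667]})

# Key arrays: second and third pieces of "year_id2_id3", as integers.
# Each array has the same length as df1, so it can stand in for a column name.
key2 = df1["match_up"].apply(lambda x: int(x.split("_")[1])).values
key3 = df1["match_up"].apply(lambda x: int(x.split("_")[2])).values

# left_on is an array, right_on names the column to match in the right frame.
# right_on is still needed: how='left' only decides which rows are kept,
# not which column of the right frame holds the key.
step1 = pd.merge(df1, df2, how="left", left_on=key2, right_on="team_df2")
final = pd.merge(step1, df3, how="left", left_on=key3, right_on="team_df3")
print(final)
```

Because both merges are left joins against frames with unique team ids, `step1` keeps the row order and length of `df1`, which is why `key3` (computed from `df1`) still lines up with it in the second merge.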