Python 如何组合熊猫中的3个复杂数据帧
我有3个数据帧,分别命名为df1、df2和df3Python 如何组合熊猫中的3个复杂数据帧,python,python-2.7,pandas,Python,Python 2.7,Pandas,我有3个数据帧,分别命名为df1、df2和df3 df1: match_up result 0 1985_1116_1234 1 1 1985_1120_1345 1 2 1985_1207_1250 1 3 1985_1229_1425 1 4 1985_1242_1325 1 df2: team_df2 win_df2 0 1207 0.700
df1:
match_up result
0 1985_1116_1234 1
1 1985_1120_1345 1
2 1985_1207_1250 1
3 1985_1229_1425 1
4 1985_1242_1325 1
df2:
team_df2 win_df2
0 1207 0.700
2 1116 0.636
3 1120 0.621
4 1229 0.615
5 1242 0.679
df3:
team_df3 win_df3
1 1234 0.667
7 1250 0.759
11 1325 0.774
12 1345 0.742
15 1425 0.667
我需要一个新的数据帧
组合df1
、df2
和df3
,格式如下:
match_up result team_df2 team_df3 win_df2 win_df3
0 1985_1116_1234 1 1116 1234 0.636 0.667
1 1985_1120_1345 1 1120 1345 0.621 0.742
2 1985_1207_1250 1 1207 1250 0.700 0.759
3 1985_1229_1425 1 1229 1425 0.615 0.667
4 1985_1242_1325 1 1242 1325 0.679 0.774
如何在pandas中执行此操作?您需要提取字符串并将其转换为整数,以便正确地合并
# Set up result DataFrame
df = df1.copy()
df['year'], df['id2'], df['id3'] = list(zip(*df['match_up'].str.split('_')))
df[['id2', 'id3']] = df[['id2', 'id3']].astype(int)
# Do merges
df = pd.merge(df, df2, left_on='id2', right_on='team_df2')
df = pd.merge(df, df3, left_on='id3', right_on='team_df3')
# Drop unneeded columns and print
df = df.drop(['id2', 'year', 'id3'], axis=1)
print(df)
屈服
match_up result team_df2 win_df2 team_df3 win_df3
0 1985_1116_1234 1 1116 0.636 1234 0.667
1 1985_1120_1345 1 1120 0.621 1345 0.742
2 1985_1207_1250 1 1207 0.700 1250 0.759
3 1985_1229_1425 1 1229 0.615 1425 0.667
4 1985_1242_1325 1 1242 0.679 1325 0.774
输出:
In [23]: final
Out[23]:
match_up results team_df2 win_df2 team_df3 win_df3
0 1985_1116_1234 1 1116 0.636 1234 0.667
1 1985_1120_1345 1 1120 0.621 1345 0.742
2 1985_1207_1250 1 1207 0.700 1250 0.759
3 1985_1229_1425 1 1229 0.615 1425 0.667
4 1985_1242_1325 1 1242 0.679 1325 0.774
left_on=df1['match_up'].apply(lambda x:int(x.split(''u')[1])).values
这个函数在这里做什么?@MJP它通过使用''uu'作为分隔符将匹配键拆分为一个列表来分解匹配键。然后,它取第二个值([1)],这是第一个PD.MyGe的关键。为什么我们应该指定<代码> Realyon on /CODE >,因为我们已经指定了<代码>如何= Leave/Cuth>,它只考虑左数据帧中的密钥。那么,right\u on
“right\u on:右数据框中的列需要用作键。可以是列名,也可以是长度等于数据框长度的数组”Ok got!!您的解决方案非常优雅且易于理解。
In [23]: final
Out[23]:
match_up results team_df2 win_df2 team_df3 win_df3
0 1985_1116_1234 1 1116 0.636 1234 0.667
1 1985_1120_1345 1 1120 0.621 1345 0.742
2 1985_1207_1250 1 1207 0.700 1250 0.759
3 1985_1229_1425 1 1229 0.615 1425 0.667
4 1985_1242_1325 1 1242 0.679 1325 0.774