Python: duplicate columns when merging dataframes
I want to merge df1 and df2. The problem is that merging df1 and df2 produces duplicate "Fluc" columns. The dataframes must be merged on the 'Horse' column. Dataframe code:
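import pandas as pd

# `data` is the list of scraped rows (not shown in the question)
cols1 = ['Race', 'Horse', 'Fluc 1', 'Fluc 2', 'Bookmaker', 'Odds']
df1 = pd.DataFrame(data=data, columns=cols1)
cols2 = ['Race', 'Horse', 'Fluc 1', 'Fluc 2', 'Bookmaker', 'AvgOdds']
df2 = pd.DataFrame(data=data, columns=cols2)
# Average the numeric columns per horse; numeric_only=True is needed on pandas 2.x,
# which no longer silently drops text columns such as 'Race' and 'Bookmaker'
df3 = df2.groupby(by='Horse', sort=False).mean(numeric_only=True)
df3 = df3.reset_index()
df4 = df3.round(2)
dfmerge = pd.merge(df1, df4, on='Horse', how='inner')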
Output of df1:
   Race        Horse           Fluc 1  Fluc 2  Bookmaker      Odds
0  Ipswich R1  Battle Through  4.2     4.22    BetEasy        4.20
1  Ipswich R1  Battle Through  4.2     4.22    Neds           4.20
2  Ipswich R1  Battle Through  4.2     4.22    Sportsbet      4.20
3  Ipswich R1  Battle Through  4.2     4.22    SportsBetting  4.45
4  Ipswich R1  Battle Through  4.2     4.22    Bet365         4.20
Output of df4:
   Race        Horse           Fluc 1  Fluc 2  Bookmaker      AvgOdds
0  Ipswich R1  Battle Through  4.2     4.22    BetEasy        4.20
1  Ipswich R1  Battle Through  4.2     4.22    Neds           4.20
2  Ipswich R1  Battle Through  4.2     4.22    Sportsbet      4.20
3  Ipswich R1  Battle Through  4.2     4.22    SportsBetting  4.45
4  Ipswich R1  Battle Through  4.2     4.22    Bet365         4.20
Output of dfmerge:
   Race        Horse           Fluc 1_x  Fluc 2_x  Bookmaker      Odds  Fluc 1_y  Fluc 2_y  AvgOdds
0  Ipswich R1  Battle Through  8.34      8.38      Neds           8.5   8.34      8.38      8.65
1  Ipswich R1  Battle Through  8.34      8.38      Sportsbet      8.0   8.34      8.38      8.65
2  Ipswich R1  Battle Through  8.34      8.38      SportsBetting  9.1   8.34      8.38      8.65
3  Ipswich R1  Battle Through  8.34      8.38      Bet365         9.0   8.34      8.38      8.65
4  Ipswich R1  Simply Fly      1.89      1.87      Neds           1.8   1.89      1.87      1.84
Desired output of dfmerge:
   Race        Horse           Fluc 1  Fluc 2  Bookmaker      Odds  AvgOdds
0  Ipswich R1  Battle Through  4.2     4.22    BetEasy        4.20  4.2
1  Ipswich R1  Battle Through  4.2     4.22    Neds           4.20  4.2
2  Ipswich R1  Battle Through  4.2     4.22    Sportsbet      4.20  4.2
3  Ipswich R1  Battle Through  4.2     4.22    SportsBetting  4.45  4.2
4  Ipswich R1  Battle Through  4.2     4.22    Bet365         4.20  4.2
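The _x/_y columns come from the groupby step: averaging df2 per horse keeps every numeric column, so df4 still carries 'Fluc 1' and 'Fluc 2' next to 'AvgOdds'. Because the merge joins on 'Horse' alone, every other column name present in both frames gets a suffix, while the text columns 'Race' and 'Bookmaker' were already dropped by the mean and so are not duplicated. A minimal reproduction, assuming a few made-up rows (the question's real `data` is not shown) and `numeric_only=True` so it also runs on pandas 2.x:

```python
import pandas as pd

# Hypothetical sample rows; the question's real `data` is not shown.
rows = [
    ["Ipswich R1", "Battle Through", 4.2, 4.22, "BetEasy", 4.20],
    ["Ipswich R1", "Battle Through", 4.2, 4.22, "Neds", 4.20],
    ["Ipswich R1", "Simply Fly", 1.89, 1.87, "Neds", 1.80],
]
df1 = pd.DataFrame(rows, columns=["Race", "Horse", "Fluc 1", "Fluc 2", "Bookmaker", "Odds"])
df2 = pd.DataFrame(rows, columns=["Race", "Horse", "Fluc 1", "Fluc 2", "Bookmaker", "AvgOdds"])

# Per-horse mean keeps all numeric columns, including the Fluc columns;
# the text columns 'Race' and 'Bookmaker' are excluded by numeric_only=True.
df4 = df2.groupby("Horse", sort=False).mean(numeric_only=True).reset_index().round(2)
print(list(df4.columns))  # ['Horse', 'Fluc 1', 'Fluc 2', 'AvgOdds']

# Joining on 'Horse' only: 'Fluc 1'/'Fluc 2' exist on both sides, so they get _x/_y.
dfmerge = pd.merge(df1, df4, on="Horse", how="inner")
print(list(dfmerge.columns))
# ['Race', 'Horse', 'Fluc 1_x', 'Fluc 2_x', 'Bookmaker', 'Odds', 'Fluc 1_y', 'Fluc 2_y', 'AvgOdds']
```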
Try this:
dfmerge = pd.merge(df1, df4, on=['Race', 'Horse', 'Fluc 1', 'Fluc 2', 'Bookmaker'], how='inner')
print(dfmerge)
Output:
   Race        Horse           Fluc 1  Fluc 2  Bookmaker      Odds  AvgOdds
0  Ipswich R1  Battle Through  4.2     4.22    BetEasy        4.20  4.20
1  Ipswich R1  Battle Through  4.2     4.22    Neds           4.20  4.20
2  Ipswich R1  Battle Through  4.2     4.22    Sportsbet      4.20  4.20
3  Ipswich R1  Battle Through  4.2     4.22    SportsBetting  4.45  4.45
4  Ipswich R1  Battle Through  4.2     4.22    Bet365         4.20  4.20
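The answer above joins on every column the two frames share, which works when the right-hand frame still has 'Race' and 'Bookmaker'. With the grouped df4 from the question those columns are gone, so listing them in `on=` would raise a KeyError. One alternative, sketched here with the same kind of made-up rows, is to pass only the join key and the new column from df4 so nothing else can collide:

```python
import pandas as pd

# Hypothetical sample rows; the question's real `data` is not shown.
rows = [
    ["Ipswich R1", "Battle Through", 4.2, 4.22, "BetEasy", 4.20],
    ["Ipswich R1", "Battle Through", 4.2, 4.22, "Neds", 4.20],
    ["Ipswich R1", "Simply Fly", 1.89, 1.87, "Neds", 1.80],
]
df1 = pd.DataFrame(rows, columns=["Race", "Horse", "Fluc 1", "Fluc 2", "Bookmaker", "Odds"])
df2 = pd.DataFrame(rows, columns=["Race", "Horse", "Fluc 1", "Fluc 2", "Bookmaker", "AvgOdds"])

df4 = df2.groupby("Horse", sort=False).mean(numeric_only=True).reset_index().round(2)

# Select only the key and the column we actually want to add before merging,
# so the Fluc columns of df4 never enter the merge and no suffixes are created.
dfmerge = pd.merge(df1, df4[["Horse", "AvgOdds"]], on="Horse", how="inner")
print(list(dfmerge.columns))
# ['Race', 'Horse', 'Fluc 1', 'Fluc 2', 'Bookmaker', 'Odds', 'AvgOdds']
```

This keeps df1's own 'Fluc 1'/'Fluc 2' values and simply attaches the per-horse average.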
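Use the suffixes parameter to give the duplicated columns a suffix, then drop the columns by that suffix.
When you merge df1 and df4, you should show us the output of df4, not df2. The usual behavior of merge is to add suffixes (_x and _y) to columns that appear in both dataframes. Why doesn't it produce _x and _y columns for 'Bookmaker', 'Horse', etc.?
So you just want to bring the AvgOdds column from df2 into df1? If so, have you tried how='left'?
I need to merge the Fluc columns so that there are no duplicate columns. That is the main problem.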
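The suffixes idea from the first comment can be sketched as follows: give the left frame's overlapping columns an empty suffix so they keep their names, give the right frame's copies a marker suffix, and drop everything carrying that marker. The "_dup" marker and the sample rows are illustrative assumptions, not from the thread:

```python
import pandas as pd

# Hypothetical sample rows; the question's real `data` is not shown.
rows = [
    ["Ipswich R1", "Battle Through", 4.2, 4.22, "BetEasy", 4.20],
    ["Ipswich R1", "Battle Through", 4.2, 4.22, "Neds", 4.20],
    ["Ipswich R1", "Simply Fly", 1.89, 1.87, "Neds", 1.80],
]
df1 = pd.DataFrame(rows, columns=["Race", "Horse", "Fluc 1", "Fluc 2", "Bookmaker", "Odds"])
df2 = pd.DataFrame(rows, columns=["Race", "Horse", "Fluc 1", "Fluc 2", "Bookmaker", "AvgOdds"])
df4 = df2.groupby("Horse", sort=False).mean(numeric_only=True).reset_index().round(2)

# Left-side overlapping columns keep their names (empty suffix);
# right-side copies get the hypothetical marker "_dup".
dfmerge = pd.merge(df1, df4, on="Horse", how="inner", suffixes=("", "_dup"))

# Drop the right-hand duplicates, keeping df1's Fluc values.
dfmerge = dfmerge.drop(columns=[c for c in dfmerge.columns if c.endswith("_dup")])
print(list(dfmerge.columns))
# ['Race', 'Horse', 'Fluc 1', 'Fluc 2', 'Bookmaker', 'Odds', 'AvgOdds']
```

Note that how='left' on its own would not remove the duplicates: it only changes which rows are kept (every row of df1, matched or not), not how overlapping column names are handled.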