Python: duplicate columns when merging DataFrames

python pandas dataframe

I want to merge df1 and df2. The problem is that the merged result contains duplicated 'Fluc' columns. The DataFrames must be merged on the 'Horse' column.

DataFrame code:

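import pandas as pd

# `data` holds the raw rows; it is not shown in the question.
cols1 = ['Race', 'Horse', 'Fluc 1', 'Fluc 2', 'Bookmaker', 'Odds']
df1 = pd.DataFrame(data=data, columns=cols1)
cols2 = ['Race', 'Horse', 'Fluc 1', 'Fluc 2', 'Bookmaker', 'AvgOdds']
df2 = pd.DataFrame(data=data, columns=cols2)
# Average the numeric columns per horse; text columns are excluded from the mean.
df3 = df2.groupby(by='Horse', sort=False).mean(numeric_only=True)
df3 = df3.reset_index()
df4 = df3.round(2)
# Merging on 'Horse' alone makes pandas suffix the overlapping 'Fluc' columns.
dfmerge = pd.merge(df1, df4, on='Horse', how='inner')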

Output of df1:

              Race           Horse  Fluc 1  Fluc 2      Bookmaker   Odds
0       Ipswich R1  Battle Through     4.2    4.22        BetEasy   4.20
1       Ipswich R1  Battle Through     4.2    4.22           Neds   4.20
2       Ipswich R1  Battle Through     4.2    4.22      Sportsbet   4.20
3       Ipswich R1  Battle Through     4.2    4.22  SportsBetting   4.45
4       Ipswich R1  Battle Through     4.2    4.22         Bet365   4.20

Output of df4:

              Race           Horse  Fluc 1  Fluc 2      Bookmaker  AvgOdds
0       Ipswich R1  Battle Through     4.2    4.22        BetEasy     4.20
1       Ipswich R1  Battle Through     4.2    4.22           Neds     4.20
2       Ipswich R1  Battle Through     4.2    4.22      Sportsbet     4.20
3       Ipswich R1  Battle Through     4.2    4.22  SportsBetting     4.45
4       Ipswich R1  Battle Through     4.2    4.22         Bet365     4.20

Output of dfmerge:

              Race           Horse  Fluc 1_x  Fluc 2_x      Bookmaker  Odds  Fluc 1_y  Fluc 2_y  AvgOdds
0       Ipswich R1  Battle Through      8.34      8.38           Neds   8.5      8.34      8.38     8.65
1       Ipswich R1  Battle Through      8.34      8.38      Sportsbet   8.0      8.34      8.38     8.65
2       Ipswich R1  Battle Through      8.34      8.38  SportsBetting   9.1      8.34      8.38     8.65
3       Ipswich R1  Battle Through      8.34      8.38         Bet365   9.0      8.34      8.38     8.65
4       Ipswich R1      Simply Fly      1.89      1.87           Neds   1.8      1.89      1.87     1.84

Desired output of dfmerge:

              Race           Horse  Fluc 1  Fluc 2      Bookmaker   Odds    AvgOdds
0       Ipswich R1  Battle Through     4.2    4.22        BetEasy   4.20    4.2
1       Ipswich R1  Battle Through     4.2    4.22           Neds   4.20    4.2
2       Ipswich R1  Battle Through     4.2    4.22      Sportsbet   4.20    4.2
3       Ipswich R1  Battle Through     4.2    4.22  SportsBetting   4.45    4.2
4       Ipswich R1  Battle Through     4.2    4.22         Bet365   4.20    4.2

Try this:

dfmerge = pd.merge(df1, df4, on=['Race', 'Horse', 'Fluc 1', 'Fluc 2', 'Bookmaker'], how='inner')
print(dfmerge)

Output:

         Race           Horse  Fluc 1  Fluc 2      Bookmaker  Odds  AvgOdds
0  Ipswich R1  Battle Through     4.2    4.22        BetEasy  4.20     4.20
1  Ipswich R1  Battle Through     4.2    4.22           Neds  4.20     4.20
2  Ipswich R1  Battle Through     4.2    4.22      Sportsbet  4.20     4.20
3  Ipswich R1  Battle Through     4.2    4.22  SportsBetting  4.45     4.45
4  Ipswich R1  Battle Through     4.2    4.22         Bet365  4.20     4.20
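The merge above avoids the suffixes because every column that exists in both frames is used as a join key, so pandas has no overlapping non-key column left to rename. The sketch below illustrates this with toy rows (the values are made up for illustration and are not the asker's data), and it assumes df4 is built from a per-horse mean, so it only contains 'Horse', the 'Fluc' columns and 'AvgOdds'.

```python
import pandas as pd

# Toy rows standing in for the asker's data (hypothetical values).
df1 = pd.DataFrame({
    "Race": ["Ipswich R1"] * 3,
    "Horse": ["Battle Through", "Battle Through", "Simply Fly"],
    "Fluc 1": [4.2, 4.2, 1.89],
    "Fluc 2": [4.22, 4.22, 1.87],
    "Bookmaker": ["Neds", "Bet365", "Neds"],
    "Odds": [4.20, 4.45, 1.80],
})

# Per-horse averages; only the numeric columns survive the mean.
df4 = (
    df1.rename(columns={"Odds": "AvgOdds"})
       .groupby("Horse", sort=False)
       .mean(numeric_only=True)
       .reset_index()
       .round(2)
)

# Joining on 'Horse' alone: 'Fluc 1' and 'Fluc 2' overlap, so they get _x/_y.
suffixed = pd.merge(df1, df4, on="Horse", how="inner")

# Joining on every column the two frames share: nothing is left to suffix.
shared = [c for c in df1.columns if c in df4.columns]
clean = pd.merge(df1, df4, on=shared, how="inner")

print(suffixed.columns.tolist())
print(clean.columns.tolist())
```

Note that this joins on floating-point columns, which only match when the values are exactly equal in both frames; rounding on one side can silently drop rows from an inner merge.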

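Use the suffixes parameter to add a suffix to the duplicated columns, then drop the columns based on that suffix.

Since you are merging df1 and df4, you should show us the output of df4, not df2.

The usual behaviour of merge is to add suffixes (_x and _y) to columns that appear in both DataFrames. Why doesn't it produce _x and _y columns for 'Bookmaker', 'Horse' and so on?

So are you just trying to bring the AvgOdds column from df2 into df1? If so, have you tried how='left'?

I need the Fluc columns merged so that there are no duplicated columns. That is the main problem.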
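The two suggestions in the comments can be sketched as follows. This is a minimal illustration, not the asker's actual pipeline: the rows are toy values, and the "_dup" suffix is an arbitrary name chosen here.

```python
import pandas as pd

# Toy frames (hypothetical values) shaped like df1 and the averaged df4.
df1 = pd.DataFrame({
    "Race": ["Ipswich R1"] * 3,
    "Horse": ["Battle Through", "Battle Through", "Simply Fly"],
    "Fluc 1": [4.2, 4.2, 1.89],
    "Fluc 2": [4.22, 4.22, 1.87],
    "Bookmaker": ["Neds", "Bet365", "Neds"],
    "Odds": [4.20, 4.45, 1.80],
})
df4 = pd.DataFrame({
    "Horse": ["Battle Through", "Simply Fly"],
    "Fluc 1": [4.2, 1.89],
    "Fluc 2": [4.22, 1.87],
    "AvgOdds": [4.33, 1.80],
})

# Option 1: take only the key and the new column from the right-hand frame,
# so there are no overlapping columns at all.
left_only = df1.merge(df4[["Horse", "AvgOdds"]], on="Horse", how="left")

# Option 2: keep the left-hand names, mark right-hand duplicates with a
# suffix, then drop every column carrying that suffix.
merged = df1.merge(df4, on="Horse", how="inner", suffixes=("", "_dup"))
deduped = merged.drop(columns=[c for c in merged.columns if c.endswith("_dup")])

print(left_only)
print(deduped)
```

Option 1 is usually the simpler choice when only one column is needed from the second frame. Option 2 is handy when the right-hand frame is wide and most of its columns should be kept.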