Python 试图匹配两个df';s基于三个公共列,但它们都不相同
我有两个df df1 df2 我想要的结果是df2中与df1接近匹配的所有行Python 试图匹配两个df';s基于三个公共列,但它们都不相同,python,pandas,dataframe,join,sequencematcher,Python,Pandas,Dataframe,Join,Sequencematcher,我有两个df df1 df2 我想要的结果是df2中与df1接近匹配的所有行 date league teams near_match 0 201902280000 womens international usa womenjapan women 1 1 201902280000 brazil amazonense sul
date league teams near_match
0 201902280000 womens international usa womenjapan women 1
1 201902280000 brazil amazonense sul america ecrio negro am 0
2 201902280030 bolivia apertura real potosibolivar 1
3 201902280030 brazil campeonato paulista palmeirasituano 0
4 201902280030 copa sudamericana racing clubcorinthians 0
我尝试使用SequenceMatcher
使用for循环的一个变体,并将阈值设置为大于0.8的匹配,但没有成功
df_1['merge_teams'] = df_1['teams'] # we will use these as the merge keys
df_1['merge_date'] = df_1['date']
# df_1['merge_league'] = df_1['league']
for teams_1, date_1, league_1 in df_1[['teams','date']].values:
for ixb, (teams_1, teams_2) in enumerate(df_2[['teams','date']].values):
if difflib.SequenceMatcher(None,teams_1,teams_2).ratio() > .8:
df_2.ix[ixb,'merge_teams'] = teams_1 # creates a merge key in df_2
if difflib.SequenceMatcher(None,date_1, date_2).ratio() > .8:
df_2.ix[ixb,'merge_date'] = date_1 # creates a merge key in df_2
# This should rturn all rows where teams,date and league all match by over 80%
# This is just for teams and date, I want to include league as well
任何建议或指导都将不胜感激。看看我的回答,我将如何跨多个栏目进行回答?
date league teams near_match
0 201902280000 womens international usa womenjapan women 1
1 201902280000 brazil amazonense sul america ecrio negro am 0
2 201902280030 bolivia apertura real potosibolivar 1
3 201902280030 brazil campeonato paulista palmeirasituano 0
4 201902280030 copa sudamericana racing clubcorinthians 0
df_1['merge_teams'] = df_1['teams'] # we will use these as the merge keys
df_1['merge_date'] = df_1['date']
# df_1['merge_league'] = df_1['league']
for teams_1, date_1, league_1 in df_1[['teams','date']].values:
for ixb, (teams_1, teams_2) in enumerate(df_2[['teams','date']].values):
if difflib.SequenceMatcher(None,teams_1,teams_2).ratio() > .8:
df_2.ix[ixb,'merge_teams'] = teams_1 # creates a merge key in df_2
if difflib.SequenceMatcher(None,date_1, date_2).ratio() > .8:
df_2.ix[ixb,'merge_date'] = date_1 # creates a merge key in df_2
# This should rturn all rows where teams,date and league all match by over 80%
# This is just for teams and date, I want to include league as well