Warning: file_get_contents(/data/phpspider/zhask/data//catemap/2/python/319.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
Python 试图匹配两个df';s基于三个公共列,但它们都不相同_Python_Pandas_Dataframe_Join_Sequencematcher - Fatal编程技术网

Python 试图匹配两个df';s基于三个公共列,但它们都不相同

Python 试图匹配两个df';s基于三个公共列,但它们都不相同,python,pandas,dataframe,join,sequencematcher,Python,Pandas,Dataframe,Join,Sequencematcher,我有两个df df1 df2 我想要的结果是df2中与df1接近匹配的所有行 date league teams near_match 0 201902280000 womens international usa womenjapan women 1 1 201902280000 brazil amazonense sul

我有两个df

df1

df2

我想要的结果是df2中与df1接近匹配的所有行

    date            league                     teams                      near_match
0   201902280000    womens international       usa womenjapan women       1
1   201902280000    brazil amazonense          sul america ecrio negro am 0
2   201902280030    bolivia apertura           real potosibolivar         1
3   201902280030    brazil campeonato paulista palmeirasituano            0
4   201902280030    copa sudamericana          racing clubcorinthians     0 
我尝试使用
SequenceMatcher
使用for循环的一个变体,并将阈值设置为大于0.8的匹配,但没有成功

df_1['merge_teams'] = df_1['teams'] # we will use these as the merge keys
df_1['merge_date'] = df_1['date']
# df_1['merge_league'] = df_1['league']


for teams_1, date_1, league_1 in df_1[['teams','date']].values:
    for ixb, (teams_1, teams_2) in enumerate(df_2[['teams','date']].values):
        if difflib.SequenceMatcher(None,teams_1,teams_2).ratio() > .8:
            df_2.ix[ixb,'merge_teams'] = teams_1 # creates a merge key in df_2
        if difflib.SequenceMatcher(None,date_1, date_2).ratio() > .8:
            df_2.ix[ixb,'merge_date'] = date_1 # creates a merge key in df_2

# This should rturn all rows where teams,date and league all match by over 80%
# This is just for teams and date, I want to include league as well

任何建议或指导都将不胜感激。

看看我的回答,我将如何跨多个栏目进行回答?
    date            league                     teams                      near_match
0   201902280000    womens international       usa womenjapan women       1
1   201902280000    brazil amazonense          sul america ecrio negro am 0
2   201902280030    bolivia apertura           real potosibolivar         1
3   201902280030    brazil campeonato paulista palmeirasituano            0
4   201902280030    copa sudamericana          racing clubcorinthians     0 
df_1['merge_teams'] = df_1['teams'] # we will use these as the merge keys
df_1['merge_date'] = df_1['date']
# df_1['merge_league'] = df_1['league']


for teams_1, date_1, league_1 in df_1[['teams','date']].values:
    for ixb, (teams_1, teams_2) in enumerate(df_2[['teams','date']].values):
        if difflib.SequenceMatcher(None,teams_1,teams_2).ratio() > .8:
            df_2.ix[ixb,'merge_teams'] = teams_1 # creates a merge key in df_2
        if difflib.SequenceMatcher(None,date_1, date_2).ratio() > .8:
            df_2.ix[ixb,'merge_date'] = date_1 # creates a merge key in df_2

# This should rturn all rows where teams,date and league all match by over 80%
# This is just for teams and date, I want to include league as well