Fuzzy matching when merging DataFrames of different lengths


This question was marked as a duplicate, but the linked question has no answer, so I am asking again.

I have two datasets. df2:

>                                             Page Title  ...    dummy
>     383  India Companies Act 2013: Five Key Points Abou...  ...        1
>     384  Seven Things Every Company Should Know about A...  ...        1
>     385  What Is a Low-Carbon Lifestyle, and How Can I ...  ...        1
>     386             Top 10 CSR Events of 2010 | Blog | BSR  ...        1
>     387  10 Social Media Rules for Social Responsibilit...  ...        1

df1

They have different lengths.

I tried this approach:

df2['Page Title'] = df2['Page Title'].apply(lambda x: difflib.get_close_matches(x, df1.title)[0])

But I get the following error, possibly because the lengths differ:

IndexError: list index out of range

How can I fix it?

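This should work:

from fuzzywuzzy import fuzz  # third-party package

matched_titles = []

# Compare every title in df1 with every "Page Title" in df2
# (column names follow the question: df1 has "title", df2 has "Page Title")
for row in df1.index:
    title_name = df1.at[row, "title"]  # .at replaces get_value(), removed in pandas 1.0
    for other in df2.index:
        title = df2.at[other, "Page Title"]
        matched_token = fuzz.partial_ratio(title_name, title)
        if matched_token > 80:  # keep only pairs scoring above 80 out of 100
            matched_titles.append([title_name, title, matched_token])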
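
As a side note on the error in the question: `difflib.get_close_matches` returns an empty list when no candidate reaches its `cutoff` (0.6 by default), so indexing the result with `[0]` raises `IndexError` whether or not the two frames have the same length. A minimal sketch of a guard follows, using only the standard library. The helper name `closest_title` and the sample titles are made up for illustration and are not from the original post.

```python
import difflib

def closest_title(title, candidates, cutoff=0.6):
    """Return the candidate most similar to `title`, or None if none reaches `cutoff`."""
    matches = difflib.get_close_matches(title, candidates, n=1, cutoff=cutoff)
    # An empty list means no candidate was similar enough, so avoid matches[0]
    return matches[0] if matches else None

# Hypothetical sample data standing in for df2['Page Title'] and df1['title']
page_titles = ["Top 10 CSR Events of 2010 | Blog | BSR", "Contact"]
titles = ["Top 10 CSR Events of 2010", "India Companies Act 2013"]

matched = [closest_title(t, titles) for t in page_titles]
print(matched)  # ['Top 10 CSR Events of 2010', None]
```

With pandas, the same helper could presumably be applied as `df2["match"] = df2["Page Title"].apply(lambda x: closest_title(x, df1["title"].tolist()))`, followed by `df2.merge(df1, left_on="match", right_on="title", how="left")`, where `match` is a hypothetical column name. Rows without a close match keep `None` and simply get no partner in the merge.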
