Merge: fuzzy matching when DataFrames have different lengths
They marked this question as a duplicate, but the linked question has no answer, so I am asking again.
I have two datasets.
df2
> Page Title ... dummy
> 383 India Companies Act 2013: Five Key Points Abou... ... 1
> 384 Seven Things Every Company Should Know about A... ... 1
> 385 What Is a Low-Carbon Lifestyle, and How Can I ... ... 1
> 386 Top 10 CSR Events of 2010 | Blog | BSR ... 1
> 387 10 Social Media Rules for Social Responsibilit... ... 1
df1
They have different lengths.
I tried this approach:
df2['Page Title'] = df2['Page Title'].apply(lambda x: difflib.get_close_matches(x, df1.title)[0])
But I get the following error, possibly because the lengths differ:
IndexError: list index out of range
How can I fix it?

This should work:
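from fuzzywuzzy import fuzz  # assumption: fuzz comes from the fuzzywuzzy package

matched_titles = []
# Column names follow the question: df2 has "Page Title", df1 has "title".
for row in df2.index:
    # DataFrame.get_value was removed in pandas 1.0; .at is the scalar accessor
    title_name = df2.at[row, "Page Title"]
    for other in df1.index:
        title = df1.at[other, "title"]
        matched_token = fuzz.partial_ratio(title_name, title)
        # Keep only pairs whose partial-ratio score is above 80
        if matched_token > 80:
            matched_titles.append([title_name, title, matched_token])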
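As a side note on the original error: `difflib.get_close_matches` returns an empty list when no candidate reaches its `cutoff` (0.6 by default), so indexing `[0]` raises `IndexError` whether or not the frames have the same length. A minimal sketch of a guarded version follows. The helper name `closest_or_none` and the sample lists are made up for illustration and stand in for `df2['Page Title']` and `df1.title`.

```python
import difflib

def closest_or_none(value, candidates, cutoff=0.6):
    """Return the best fuzzy match for value, or None if nothing passes cutoff."""
    matches = difflib.get_close_matches(value, candidates, n=1, cutoff=cutoff)
    # get_close_matches returns [] when no candidate scores at least cutoff,
    # so an unconditional [0] is what raises IndexError.
    return matches[0] if matches else None

# Hypothetical sample data (not from the question's frames)
page_titles = ["Top 10 CSR Events of 2010 | Blog | BSR", "Completely unrelated text"]
titles = ["Top 10 CSR Events of 2010", "10 Social Media Rules"]

resolved = [closest_or_none(t, titles) for t in page_titles]
print(resolved)
```

With pandas, the same helper could probably be applied as `df2['Page Title'].apply(lambda x: closest_or_none(x, df1['title'].tolist()))`. Rows without a close match would then hold `None` instead of raising.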