Merge: fuzzy matching when DataFrames have different lengths


This question was marked as a duplicate, but the linked question has no answer, so I am asking again.

I have two datasets. df2:

>                                             Page Title  ...    dummy
>     383  India Companies Act 2013: Five Key Points Abou...  ...        1
>     384  Seven Things Every Company Should Know about A...  ...        1
>     385  What Is a Low-Carbon Lifestyle, and How Can I ...  ...        1
>     386             Top 10 CSR Events of 2010 | Blog | BSR  ...        1
>     387  10 Social Media Rules for Social Responsibilit...  ...        1
df1

They have different lengths.

I tried this approach:

df2['Page Title'] = df2['Page Title'].apply(lambda x: difflib.get_close_matches(x, df1.title)[0])
But I get the following error, possibly because the lengths differ:

IndexError: list index out of range

How can I fix it?

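This should work:

from fuzzywuzzy import fuzz  # third-party package: pip install fuzzywuzzy

matched_titles = []

# Compare every "Page Title" in df2 with every "title" in df1
for row in df2.index:
    title_name = df2.at[row, "Page Title"]  # get_value() was removed in pandas 1.0
    for other in df1.index:
        title = df1.at[other, "title"]
        matched_token = fuzz.partial_ratio(title_name, title)
        # Keep only pairs whose similarity score (0-100) is above 80
        if matched_token > 80:
            matched_titles.append([title_name, title, matched_token])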
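
As for the original error: `difflib.get_close_matches` returns an empty list when no candidate reaches its similarity cutoff (0.6 by default), so indexing the result with `[0]` raises `IndexError` for any title without a close match. The differing lengths of the two frames are not the cause. Below is a minimal sketch of a guard that stays with `difflib` rather than `fuzzywuzzy`; the helper name `closest_match` and the sample list `df1_titles` are made up for illustration and are not from the question.

```python
import difflib


def closest_match(title, candidates, cutoff=0.6):
    """Return the best fuzzy match for title, or None if nothing is close enough."""
    matches = difflib.get_close_matches(title, candidates, n=1, cutoff=cutoff)
    # An empty list means no candidate reached the cutoff; return None
    # instead of indexing [0] unconditionally.
    return matches[0] if matches else None


# Hypothetical sample data standing in for df1.title
df1_titles = [
    "Top 10 CSR Events of 2010",
    "10 Social Media Rules for Social Responsibility",
]

print(closest_match("Top 10 CSR Events of 2010 | Blog | BSR", df1_titles))
print(closest_match("Completely unrelated text", df1_titles))
```

With pandas, the same helper could presumably be applied as `df2['Page Title'].apply(lambda x: closest_match(x, df1.title))`, which leaves `None` in rows where no title is close enough instead of failing.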