Python text similarity (Python, NLP, Levenshtein Distance)




I have some strings that need to be grouped together:

['Aus Super', 'Aus super', 'Aust Super', 'Australian Super', 'LG Super', 'Korean LG Super', 'Q Super']

I want to split them into three groups:

['Aus Super', 'Aus super', 'Aust Super', 'Australian Super']
['LG Super', 'Korean LG Super']
['Q Super']

I tried the following code:

from fuzzywuzzy import fuzz

listItem = ['Aus Super', 'Aus super', 'Aust Super', 'Australian Super', 'LG Super', 'Korean LG Super', 'Q Super']

# Score every item against a single reference string, 'LG Super'
for l in listItem:
    print(l, fuzz.token_set_ratio(l, 'LG Super'))
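The loop above compares each item with one reference string only; grouping needs a score for every pair. A stdlib-only sketch of that, where `difflib.SequenceMatcher` is swapped in for fuzzywuzzy purely so it runs without extra packages (it is a different scorer, so its numbers will not match `fuzz.token_set_ratio`). It shows the same kind of problem: "Aus Super" scores higher against "Q Super" (0.75) than against "Australian Super" (0.72), because the shared " super" dominates the match.

```python
from difflib import SequenceMatcher
from itertools import combinations

names = ['Aus Super', 'Aus super', 'Aust Super', 'Australian Super',
         'LG Super', 'Korean LG Super', 'Q Super']


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] based on difflib's matching blocks."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Score every pair instead of each item against one reference string
for a, b in combinations(names, 2):
    print(f"{a!r:20} {b!r:20} {similarity(a, b):.2f}")
```

A full pairwise table like this is the input any grouping step needs, but character-level similarity alone still does not separate the abbreviations from "Q Super".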

The results were as follows:

Based on the scores, "Q Super" and "LG Super" would be grouped with "Aus Super" instead of "Australian Super".
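A likely reason the scores mislead is that every name contains the token "Super", which inflates the similarity between unrelated names. Below is a minimal heuristic sketch, not a general solution, assuming (1) "super" can be treated as a stopword, and (2) two names belong together when a token of one is a prefix of a token of the other, which covers "Aus"/"Aust"/"Australian" and "LG"/"Korean LG". `group_names` is a hypothetical helper; a small union-find merges linked names transitively.

```python
def group_names(names, stopwords=frozenset({"super"})):
    """Hypothetical heuristic: link names whose non-stopword tokens share a prefix."""
    # Lower-case, split on whitespace, drop tokens every name shares
    token_sets = [
        {t for t in name.lower().split() if t not in stopwords}
        for name in names
    ]

    parent = list(range(len(names)))  # union-find forest, one node per name

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def related(s, t):
        # True if any token is a prefix of a token in the other set ("aus" -> "australian")
        return any(x.startswith(y) or y.startswith(x) for x in s for y in t)

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if related(token_sets[i], token_sets[j]):
                parent[find(i)] = find(j)

    # Collect groups, keeping the original order of names and of groups
    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


names = ['Aus Super', 'Aus super', 'Aust Super', 'Australian Super',
         'LG Super', 'Korean LG Super', 'Q Super']
print(group_names(names))
```

The prefix rule is crude: a one-letter token such as "A" would link to every name with a token starting with "a", so real data may need a minimum prefix length or a fuzzy comparison per token.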

I also tried Levenshtein distance, but the results are still poor: the distance between "Aust" and "Australian" is 6, while the distance between "Aust" and "Q" is only 4.
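A plain dynamic-programming Levenshtein implementation (stdlib only, unit cost per insertion, deletion and substitution) makes the comparison concrete. Because "Aust" is a prefix of "Australian", the distance is just the six appended letters, while turning "Aust" into "Q" takes one substitution and three deletions, so unweighted edit distance ranks "Q" as the closer string.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance with unit cost for insert, delete and substitute."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("Aust", "Australian"))  # 6
print(levenshtein("Aust", "Q"))           # 4
```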

Can anyone help? Thanks a lot.

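Please post the results as text (use code formatting). What exactly are you trying to do? Put all the strings into one list, or append them to different lists based on keywords?

You should use a weighted Levenshtein algorithm for your purpose (this library is from Google; I have never tried it myself).

@ThenNewGuy I tried to use text format here, but kept getting errors, and I was not allowed to post until I fixed the formatting above. I am trying to group similar strings into the three lists shown above.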
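The weighted Levenshtein suggestion can be illustrated without the library mentioned in the comment (its name and link were not preserved). Below is a hand-rolled sketch with a separate cost per operation. The cost values are assumptions chosen for this example, not tuned: insertions cost 0.1 because expanding an abbreviation mostly inserts characters, and the hypothetical `abbrev_distance` wrapper tries both directions because it is not known which string is the short form. Under these costs "Aust Super" ends up closer to "Australian Super" (about 0.6) than to "Q Super" (about 1.3), the reverse of the unweighted ranking.

```python
def weighted_levenshtein(a: str, b: str, ins_cost: float = 1.0,
                         del_cost: float = 1.0, sub_cost: float = 1.0) -> float:
    """Edit distance turning a into b, with a separate cost per operation."""
    prev = [j * ins_cost for j in range(len(b) + 1)]  # "" -> b[:j]: j insertions
    for i, ca in enumerate(a, start=1):
        curr = [i * del_cost]  # a[:i] -> "": i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + del_cost,                             # delete ca
                curr[j - 1] + ins_cost,                         # insert cb
                prev[j - 1] + (0.0 if ca == cb else sub_cost),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def abbrev_distance(a: str, b: str, ins_cost: float = 0.1) -> float:
    """Hypothetical symmetric wrapper: cheap insertions model expanding an
    abbreviation; both directions are tried since either string may be the short form."""
    a, b = a.lower(), b.lower()
    return min(weighted_levenshtein(a, b, ins_cost=ins_cost),
               weighted_levenshtein(b, a, ins_cost=ins_cost))


print(abbrev_distance("Aust Super", "Australian Super"))  # about 0.6
print(abbrev_distance("Aust Super", "Q Super"))           # about 1.3
```

With unit costs `weighted_levenshtein` reduces to the ordinary distance. The weights only change the ranking; a grouping threshold still has to be chosen by hand for the data at hand.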