Python: how to find the similarity between two questions even when the words are different


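Is there a way to tell whether two strings are similar in meaning, even when the words in them differ?

So far I have tried fuzzywuzzy, Levenshtein distance, and cosine similarity to match the strings, but all of them match the words themselves, not the meaning of the words.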

from fuzzywuzzy import fuzz

# Note: levenshtein_ratio_and_distance is a helper whose definition is not shown here.

Str1 = "what are types of negotiation"
Str2 = "what are advantages of negotiation"
Str3 = "what are categories of negotiation"

# Str1 vs Str2 (different meaning)
Ratio = fuzz.ratio(Str1.lower(),Str2.lower())
Partial_Ratio = fuzz.partial_ratio(Str1.lower(),Str2.lower())
Token_Sort_Ratio = fuzz.token_sort_ratio(Str1,Str2)

# Str1 vs Str3 (same meaning)
Ratio1 = fuzz.ratio(Str1.lower(),Str3.lower())
Partial_Ratio1 = fuzz.partial_ratio(Str1.lower(),Str3.lower())
Token_Sort_Ratio1 = fuzz.token_sort_ratio(Str1,Str3)

print("fuzzywuzzy")
print(Str1," ",Str2," ",Ratio)
print(Str1," ",Str2," ",Partial_Ratio)
print(Str1," ",Str2," ",Token_Sort_Ratio)
print(Str1," ",Str3," ",Ratio1)
print(Str1," ",Str3," ",Partial_Ratio1)
print(Str1," ",Str3," ",Token_Sort_Ratio1)

print("levenshtein ratio")
Ratio = levenshtein_ratio_and_distance(Str1,Str2,ratio_calc = True)
Ratio1 = levenshtein_ratio_and_distance(Str1,Str3,ratio_calc = True)
print(Str1," ",Str2," ",Ratio)
# Fixed: this line previously printed Ratio instead of Ratio1
print(Str1," ",Str3," ",Ratio1)

output:
fuzzywuzzy
what are types of negotiation   what are advantages of negotiation   86
what are types of negotiation   what are advantages of negotiation   76
what are types of negotiation   what are advantages of negotiation   73
what are types of negotiation   what are categories of negotiation   86
what are types of negotiation   what are categories of negotiation   76
what are types of negotiation   what are categories of negotiation   73
levenshtein ratio
what are types of negotiation   what are advantages of negotiation   0.8571428571428571
what are types of negotiation   what are categories of negotiation   0.8571428571428571



expected output:
"what are the types of negotiation skill?"
"what are the categories in negotiation skill?"
output:similar
"what are the types of negotiation skill?"
"what are the advantages of negotiation skill?"
output:not similar

You want to score the semantic similarity of two strings.

fuzzywuzzy and Levenshtein distance only score character-level distance.
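To make the point about character-level distance concrete, here is a minimal pure-Python sketch of Levenshtein edit distance. The function name `levenshtein` is chosen for illustration and is not the helper used in the question. The score depends only on which characters must be inserted, deleted, or substituted, so nothing in it can tell a synonym from an unrelated word.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


# Only the spelling is compared; the meaning of the words plays no role.
print(levenshtein("types", "categories"))
print(levenshtein("types", "advantages"))
```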

You need to take semantic information into account, so you need a semantic representation of the strings.

A simple but effective approach could be:

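  • Compute a vector for each of the two strings, using pre-trained word embeddings for your language (for example, FastText's get_sentence_vector)
  • Compute the cosine similarity between the two vectors (1: identical strings; 0: completely different strings)

Of course, there are better and more sophisticated approaches. For a deeper understanding of the topic, I recommend this article, which contains detailed explanations and code implementations.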
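The two steps above can be sketched as follows. This is only an illustration under loud assumptions: `TOY_VECTORS` is a hypothetical, hand-picked 3-dimensional embedding table invented for this example, and `sentence_vector` simply averages word vectors. In practice you would replace both with a real pre-trained model, for instance a fastText model's `get_sentence_vector`.

```python
import math

# Hypothetical toy embeddings, hand-picked for illustration only.
# Real word vectors come from a pre-trained model and have hundreds of dimensions.
TOY_VECTORS = {
    "types": [0.9, 0.1, 0.0],
    "categories": [0.8, 0.2, 0.0],
    "advantages": [0.0, 0.1, 0.9],
    "negotiation": [0.1, 0.9, 0.1],
}


def sentence_vector(sentence):
    """Step 1: average the vectors of the words that have an embedding."""
    words = [w for w in sentence.lower().split() if w in TOY_VECTORS]
    if not words:
        raise ValueError("no known words in sentence")
    dim = len(TOY_VECTORS[words[0]])
    total = [0.0] * dim
    for w in words:
        for k, value in enumerate(TOY_VECTORS[w]):
            total[k] += value
    return [t / len(words) for t in total]


def cosine_similarity(u, v):
    """Step 2: cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


s1 = sentence_vector("what are types of negotiation")
s2 = sentence_vector("what are advantages of negotiation")
s3 = sentence_vector("what are categories of negotiation")

# With these toy vectors, "types" vs "categories" scores higher
# than "types" vs "advantages".
print(cosine_similarity(s1, s3))
print(cosine_similarity(s1, s2))
```

To get a similar / not similar decision like the expected output in the question, you would compare the score against a threshold tuned on your own data.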

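In short: is there a way to find the similarity in meaning between two strings? This helped me a lot in understanding the core of the topic. You may get better results if you use the Universal Sentence Encoder in step 1.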