Fuzzy matching of words and phrases in Ruby

I want to match a set of data against a small number of services.

My data looks like this:

{"title" : "blorb",
"category" : "zurb",
"description" : "Massage is the manipulation of superficial and deeper layers of muscle and connective tissue using various techniques, to enhance function, aid in the healing process, decrease muscle reflex activity..."
}
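and I have to match it against:

["Swedish Massage", "Haircut"]

Obviously "Swedish Massage" should be the winner, but running a benchmark shows that "Haircut" scores higher: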

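require 'amatch'

# Assumption: the similarity methods are called on String, so String is
# extended with Amatch::StringMethods here.
class String
  include Amatch::StringMethods
end

# `description` is assumed to hold the "description" string of the record above.
arr = [:levenshtein_similar, :hamming_similar, :pair_distance_similar, :longest_subsequence_similar, :longest_substring_similar, :jaro_similar, :jarowinkler_similar]

arr.each do |method|
  ["Swedish Massage", "Haircut"].each do |sh|
    pp ">>> #{sh} matched with #{method}"
    pp sh.send(method, description)
  end
end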
Results:

">>> Swedish Massage matched with jaro_similar"
# 0.5246896118183247
">>> Haircut matched with jaro_similar"
# 0.5353606789250354
">>> Swedish Massage matched with jarowinkler_similar"
# 0.5246896118183247
">>> Haircut matched with jarowinkler_similar"
# 0.5353606789250354
All the other metrics scored well below 0.1.

What is a better way to approach this problem?

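Search is a constant battle between precision and recall. One thing you could try is splitting your input by word. This produces a much stronger match on "Massage", but it also widens the result set: you will now get back sentences that merely contain a word close to "Swedish". You can then try to rein in that widening by averaging the scores across several words, using a stop list to skip common words, boosting tokens that are found close to each other, and so on, but you will never get truly perfect results. If you are really interested in fine-tuning, I recommend ElasticSearch: it is relatively easy to learn and very powerful.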
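
As a rough illustration of the word-splitting idea, here is a minimal sketch in plain Ruby. It does not use amatch: the similarity measure is a hand-written normalized Levenshtein distance, and the stop list, the helper names (`tokens`, `similarity`, `phrase_score`) and the scoring rule (average of each word's best match) are all assumptions made for the example, not a recommended configuration.

```ruby
# Hypothetical stop list; tune it for your own data.
STOP_WORDS = %w[a an and in is of the to using].freeze

# Classic dynamic-programming Levenshtein edit distance.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Similarity in 0.0..1.0, where 1.0 means the two words are identical.
def similarity(a, b)
  longest = [a.length, b.length].max
  return 1.0 if longest.zero?
  1.0 - levenshtein(a, b).to_f / longest
end

# Lowercased words of the text, with stop words removed.
def tokens(text)
  text.downcase.scan(/[a-z]+/) - STOP_WORDS
end

# Average, over the words of the phrase, of each word's best match in the text.
def phrase_score(phrase, text)
  text_tokens = tokens(text)
  words = tokens(phrase)
  return 0.0 if words.empty? || text_tokens.empty?
  words.sum { |w| text_tokens.map { |t| similarity(w, t) }.max } / words.length
end

# The (truncated) description from the question.
description = "Massage is the manipulation of superficial and deeper layers " \
              "of muscle and connective tissue using various techniques, to " \
              "enhance function, aid in the healing process, decrease muscle " \
              "reflex activity..."

services = ["Swedish Massage", "Haircut"]
scores = services.map { |s| [s, phrase_score(s, description)] }.to_h
best = scores.max_by { |_, score| score }.first

scores.each { |service, score| puts format("%-16s %.3f", service, score) }
puts "Best match: #{best}"
```

Because "massage" matches a token in the description exactly, "Swedish Massage" now scores higher than "Haircut" on this record. The same exact-word effect is what widens the result set on a larger corpus, so a cutoff threshold or proximity boosting would still be needed there.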