Elasticsearch exact match and fuzziness: what is a good approach?
I have spent many hours trying to find the best way to build an autocomplete feature for city search that supports multiple languages (ES/EN), tolerates typos through fuzziness, and gives exact matches priority so they show up at the top of the results. So far I have not found a good way to do this.
My current solution works well in many cases, but when I search for "roma" the first results are Iasi-East Romania and La Romana, while Rome, Italy only shows up much further down (around position 30) even though it is an exact match.
Result JSON:
[
{"_index":"destinations","_type":"doc","_id":"_X80XWcBn2nzTu98N7_F","_score":75.50012,"_source":{"destination_name_en":"Iasi-East Romania","destination_name_es":"Iasi-East Romania","destination_name_pt":"Iasi-East Romania","country_code":"RO","country_name":"ROMANIA","destination_id":7953,"popularity":"0"}},
{"_index":"destinations","_type":"doc","_id":"7380XWcBn2nzTu98OMZl","_score":73.116455,"_source":{"destination_name_en":"La Romana","destination_name_es":"La Romana","destination_name_pt":"La Romana","country_code":"DO","country_name":"DOMINICAN REPUBLIC","destination_id":2816,"popularity":"0"}},
{"_index":"destinations","_type":"doc","_id":"1X80XWcBn2nzTu98OMZl","_score":71.4391,"_source":{...}},
...
{"_index":"destinations","_type":"doc","_id":"8H80XWcBn2nzTu98OMZl","_score":52.018818,"_source":{"destination_name_en":"Rome","destination_name_es":"Roma","destination_name_pt":"Roma","country_code":"IT","country_name":"ITALY","destination_id":6338,"popularity":"0"}}
]
This is my best solution so far.
Mapping:
'settings' => [
    'analysis' => [
        'filter' => [
            'autocomplete_filter' => [
                "type" => "edge_ngram",
                "min_gram" => 1,
                "max_gram" => 20,
            ]
        ],
        'analyzer' => [
            'autocomplete' => [
                "type" => "custom",
                'tokenizer' => "standard",
                'filter' => ['lowercase', 'asciifolding', 'autocomplete_filter'],
            ]
        ],
    ],
],
'mappings' => [
    'doc' => [
        "properties" => [
            "destination_name_en" => [
                "type" => "text",
                "analyzer" => "autocomplete",
                "search_analyzer" => "standard",
            ],
            "destination_name_es" => [
                "type" => "text",
                "analyzer" => "autocomplete",
                "search_analyzer" => "standard",
            ],
            "destination_name_pt" => [
                "type" => "text",
                "analyzer" => "autocomplete",
                "search_analyzer" => "standard",
            ],
            "popularity" => [
                "type" => "integer",
            ]
        ]
    ]
]
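To make it easier to see what the autocomplete analyzer in this mapping puts in the index, here is a rough sketch in plain PHP. It is not Elasticsearch code: the helper name edgeNgrams is made up for illustration, it only handles ASCII input, and it skips asciifolding. It approximates the pipeline by splitting on non-alphanumeric characters, lowercasing, and emitting prefixes of length 1 to 20 for each word.

```php
<?php
// Hypothetical helper, for illustration only: approximates the
// 'autocomplete' analyzer (word split + lowercase + edge n-grams)
// for plain ASCII input. Real analysis happens inside Elasticsearch.
function edgeNgrams(string $text, int $min = 1, int $max = 20): array
{
    $tokens = [];
    // Split into lowercase words on anything that is not a letter or digit.
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        // Emit every prefix of the word from $min up to $max characters.
        $limit = min($max, strlen($word));
        for ($len = $min; $len <= $limit; $len++) {
            $tokens[] = substr($word, 0, $len);
        }
    }
    return $tokens;
}

print_r(edgeNgrams('Rome'));
print_r(edgeNgrams('Iasi-East Romania'));
```

Under these assumptions, "Rome" produces the prefixes r, ro, rom and rome, while "Romania" produces roma among its prefixes. Because the search analyzer is standard, a search for "roma" is looked up as the single term roma, which is present for "Romania" but not for the English name "Rome".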
Search:
'query' => [
    "bool" => [
        "should" => [
            [
                "multi_match" => [
                    "query" => $text,
                    "fields" => [
                        "destination_name_*"
                    ],
                    "type" => "most_fields",
                    "boost" => 2
                ]
            ],
            [
                "multi_match" => [
                    "query" => $text,
                    "fields" => [
                        "destination_name_*"
                    ],
                    "fuzziness" => "1",
                    "prefix_length" => 2
                ]
            ]
        ]
    ]
]
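I would also like to use the popularity value to boost specific destinations.
I hope someone can give me an example or point me in the right direction.
I would really appreciate it.

The problem is that when you search for roma, Iasi-East Romania is the first result because it contains roma in all languages, whereas Rome matches roma only in ES/PT/IT and not in EN. With most_fields, the scores of all matching fields are added together, so a document that matches in every language field outranks one that matches in fewer.
So if you want to boost exact matches, you need to index your city names in an additional field without the autocomplete analyzer, for every language, and add a new clause on those fields to your should.
Mapping example:
"properties" => [
    "destination_name_en" => [
        "type" => "text",
        "analyzer" => "autocomplete",
        "search_analyzer" => "standard",
        "fields" => [
            "exact" => [
                "type" => "text",
                "analyzer" => "standard", // you could use a fancier analyzer here
            ]
        ]
    ],
    ....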
In the query:
'query' => [
    "bool" => [
        "should" => [
            [
                "multi_match" => [
                    "query" => $text,
                    "fields" => [
                        "destination_name_*"
                    ],
                    "type" => "most_fields",
                    "boost" => 2
                ]
            ],
            [
                "multi_match" => [
                    "query" => $text,
                    "fields" => [
                        "destination_name_*"
                    ],
                    "fuzziness" => "1",
                    "prefix_length" => 2
                ]
            ],
            [
                "multi_match" => [
                    "query" => $text,
                    "type" => "most_fields",
                    "fields" => [
                        "destination_name_*.exact"
                    ],
                    "boost" => 2
                ]
            ]
        ]
    ]
]
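Can you try something like that and keep us posted?

This works like a charm! Now I get Rome as the first result, and typos at the end of the word are tolerated too: "Romi" also returns Rome as the first match.
I have two Romes, one in Italy and one in Australia, and I would like to boost some of the world's popular cities. I am using function_score, but it gives very strange results.
This is my current code: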
'query' => [
    'function_score' => [
        'field_value_factor' => [
            'field' => 'popularity',
        ],
        "score_mode" => "multiply",
        'query' => [
            "bool" => [
                "should" => [
                    [
                        "multi_match" => [
                            "query" => $text,
                            "fields" => [
                                "destination_name_*"
                            ],
                            "type" => "most_fields",
                            "boost" => 2
                        ]
                    ],
                    [
                        "multi_match" => [
                            "query" => $text,
                            "fields" => [
                                "destination_name_*"
                            ],
                            "fuzziness" => "1",
                            "prefix_length" => 2
                        ]
                    ],
                    [
                        "multi_match" => [
                            "query" => $text,
                            "fields" => [
                                "destination_name_*.exact"
                            ],
                            "boost" => 2
                        ]
                    ]
                ]
            ]
        ]
    ],
],
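Any suggestions?
Thank you very much for your help. I am marking yours as the best answer, since you solved the main problem.

Could you open a new question for your function_score problem? SO's policy is to keep questions focused, which improves readability. I will be glad to help you there. By the way, could you add the values of the popularity field and the strange results you get to your question?

Hi Pierre, I created a new question at the link below: Thank you all!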
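
On the function_score follow-up raised in the comments, one detail is visible in this thread: every hit in the result JSON carries popularity "0". With field_value_factor and the default boost_mode of multiply, a popularity of 0 multiplies the text score by 0. Also, score_mode only controls how several functions are combined with each other, not how the function result is combined with the query score (that is boost_mode). Whether this explains the strange results is not confirmed here. The following is only a minimal sketch under those assumptions: the inner query is simplified to the exact-match clause, and the sample value of $text is made up.

```php
<?php
// Minimal sketch, not the accepted solution from the thread.
// $text and the simplified inner query are assumptions for illustration.
$text = 'roma';

$query = [
    'function_score' => [
        'query' => [
            'multi_match' => [
                'query'  => $text,
                'fields' => ['destination_name_*.exact'],
            ],
        ],
        'field_value_factor' => [
            'field'    => 'popularity',
            'modifier' => 'log1p', // log10(1 + popularity): dampens large values
            'missing'  => 0,       // value used when a document has no popularity
        ],
        // Add the popularity term to the text score instead of multiplying,
        // so documents with popularity 0 keep their text score.
        'boost_mode' => 'sum',
    ],
];

// Print the request body as JSON so it can be inspected or pasted into a console.
echo json_encode(['query' => $query], JSON_PRETTY_PRINT), PHP_EOL;
```

With boost_mode set to sum, a popular city gets a small additive bonus while an unpopular exact match still ranks by its text score. The right modifier and any factor would have to be tuned against the real popularity values.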