
Autocomplete: special characters in the Elasticsearch completion type


I am new to Elasticsearch, and I have this mapping:

curl -X PUT localhost:9200/vee_trade -d '
{
 "mappings": {
  "sDocument" : {
   "properties" : {
    "id" : { "type" : "long" },
    "docId" : { "type" : "string" },
    "documentType" : { "type" : "string" },
    "rating"  : { "type" : "float" },
    "suggestion" : { "type" :     "completion"}
    }
   }
  }
}'
A sample document looks like this:

 _index: "test"
 _type: "sDocument"
 _id: "CATEGORY7"
 _score: 1
 _source{}
 docId: "CATEGORY7"
 documentType: "CATEGORY"
 id: 7
 suggestion[]
 "Kids's wear"
 rating: null
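Basically, my goal is to enable auto-suggest. That works for querying, but each auto-suggest entry only gives me the term and the score, whereas I also need the other field values. So I take the term that auto-suggest produced and run a match query with it against the suggestion field: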

{
  "query" : {
   "match" : {
    "suggestion" : "Men's"  
    }
   }
}
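But I get no data back, because Elasticsearch strips the special characters from the term. I am not sure how the value is stored and indexed, so please tell me:

How can I retrieve the other field values along with the search term in auto-suggest? Or how can I make the match query work?

Thanks in advance.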

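Warning: long answer. It is hard to tell exactly what the problem is from what you posted, so I will give you a few options to explore that may help you solve it.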

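There are a few different ways to do what you want. I have written about two different approaches to autocomplete: one built on the completion suggester, and one that uses a more involved setup.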

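I find the completion suggester a bit clunky in practice, because you have to tell it explicitly what to respond with, so I tend to rely more on custom analysis. One way to experiment with analyzers is to set up multiple fields for a property (this used to be called multi_field). I will show a few examples below.

I will set up a field with two sub-fields that analyze the text in different ways, then run a match query against each sub-field to show how it behaves.

Take a look at this: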

PUT /test_index
{
   "settings": {
      "number_of_shards": 1,
      "analysis": {
         "filter": {
            "nGram_filter": {
               "type": "nGram",
               "min_gram": 2,
               "max_gram": 20,
               "token_chars": [
                  "letter",
                  "digit",
                  "punctuation",
                  "symbol"
               ]
            }
         },
         "analyzer": {
            "nGram_analyzer": {
               "type": "custom",
               "tokenizer": "whitespace",
               "filter": [
                  "lowercase",
                  "asciifolding",
                  "nGram_filter"
               ]
            },
            "whitespace_analyzer": {
               "type": "custom",
               "tokenizer": "whitespace",
               "filter": [
                  "lowercase",
                  "asciifolding"
               ]
            }
         }
      }
   },
   "mappings": {
      "doc": {
         "properties": {
            "text_field": {
               "type": "string",
               "index_analyzer": "standard",
               "search_analyzer": "standard",
               "fields": {
                  "raw": {
                     "type": "string",
                     "index": "not_analyzed"
                  },
                  "ngram": {
                     "type": "string",
                     "index_analyzer": "nGram_analyzer",
                     "search_analyzer": "whitespace_analyzer"
                  }
               }
            }
         }
      }
   }
}
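There is a lot going on here, and I encourage you to read up on it. I took part of this code from my own write-up, so reading through that may give you a fuller explanation.

But basically, I have a single field, text_field, which is analyzed with the standard analyzer both for indexing (the terms generated for a given document and field when the inverted index is built) and for searching (how the search phrase is broken into terms to match against the terms in the inverted index). This field then has two different sub-fields. The first, text_field.raw, is not analyzed at all, so the only term we can match against is the original text of the document's field. The second, text_field.ngram, uses nGram_analyzer for index analysis and whitespace_analyzer for search analysis, both of which are defined in the index settings.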
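To make the difference between the two custom analyzers more concrete, here is a rough Python sketch that imitates them outside of Elasticsearch. It is only an approximation under a few assumptions: the function names are made up for illustration, fold_ascii merely strips accents (the real asciifolding filter covers more cases), and the order of the emitted n-grams is not meant to match Lucene's.

```python
import unicodedata


def fold_ascii(text):
    # Rough stand-in for the asciifolding filter: strips accents only.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def whitespace_analyzer(text):
    # Whitespace tokenizer, then lowercase, then approximate ASCII folding.
    # Punctuation such as the apostrophe stays inside the token.
    return [fold_ascii(token.lower()) for token in text.split()]


def ngram_analyzer(text, min_gram=2, max_gram=20):
    # Same chain as whitespace_analyzer, followed by an n-gram step that
    # emits every substring of each token between min_gram and max_gram long.
    terms = []
    for token in whitespace_analyzer(text):
        longest = min(max_gram, len(token))
        for size in range(min_gram, longest + 1):
            for start in range(len(token) - size + 1):
                terms.append(token[start:start + size])
    return terms


search_terms = whitespace_analyzer("Men's wear")
index_terms = ngram_analyzer("Men's wear")

print(search_terms)
print(sorted(set(index_terms)))
```

The index side ends up with many short terms such as me, men, n's and we, while the search side keeps whole words. That asymmetry is what lets a partially typed word find the document later on.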

Now, if we index a couple of documents:

PUT /test_index/doc/1
{
    "text_field": "Kid's wear"
}

PUT /test_index/doc/2
{
    "text_field": "Men's wear"
}
we can search for them in various ways.

Querying text_field.raw requires the exact, complete text to get a match:

POST /test_index/doc/_search
{
   "query": {
      "match": {
         "text_field.raw": "Men's wear"
      }
   }
}
...
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": 1,
            "_source": {
               "text_field": "Men's wear"
            }
         }
      ]
   }
}
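A standard match query against text_field works as expected, because the standard analyzer turns Men's into the same term (men's) at both index time and search time: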

POST /test_index/doc/_search
{
   "query": {
      "match": {
         "text_field": "Men's"
      }
   }
}
...
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.625,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": 0.625,
            "_source": {
               "text_field": "Men's wear"
            }
         }
      ]
   }
}
But if we add a second term, the results may not be what we want:

POST /test_index/doc/_search
{
   "query": {
      "match": {
         "text_field": "Men's wear"
      }
   }
}
...
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0.72711754,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": 0.72711754,
            "_source": {
               "text_field": "Men's wear"
            }
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "1",
            "_score": 0.09494676,
            "_source": {
               "text_field": "Kid's wear"
            }
         }
      ]
   }
}
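This happens because of the way the terms are generated, and because the default operator of the match query is or: the shared term wear is enough for "Kid's wear" to match. We can restrict the results by telling the match query to use the and operator.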
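As an illustration of that restriction, the sketch below builds a match request body with "operator": "and" and imitates the or/and difference on the two sample documents. The helper names are hypothetical, and simple_terms is only a simplified stand-in for the standard analyzer that happens to be good enough for this sample data.

```python
import json


def match_body(field, text, operator="or"):
    # Long form of the match query, which accepts an explicit operator.
    return {"query": {"match": {field: {"query": text, "operator": operator}}}}


def simple_terms(text):
    # Simplified stand-in for the standard analyzer (sample data only).
    return set(text.lower().split())


docs = {"1": "Kid's wear", "2": "Men's wear"}


def matching_ids(text, operator):
    # "or" needs any query term in the document, "and" needs all of them.
    wanted = simple_terms(text)
    combine = all if operator == "and" else any
    return sorted(
        doc_id
        for doc_id, body in docs.items()
        if combine(term in simple_terms(body) for term in wanted)
    )


print(json.dumps(match_body("text_field", "Men's wear", "and"), indent=2))
print(matching_ids("Men's wear", "or"))   # both documents share "wear"
print(matching_ids("Men's wear", "and"))  # only the document with both terms
```

The printed JSON could be sent as the body of POST /test_index/doc/_search in the same way as the earlier queries.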

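We can also use the text_field.ngram field to match partial words, including symbols and punctuation. The whitespace tokenizer keeps those characters inside each token, which is also the intent of the token_chars list in the nGram_filter definition in the index settings (note that token_chars is documented as an option of the nGram tokenizer rather than of the token filter):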

POST /test_index/doc/_search
{
   "query": {
      "match": {
         "text_field.ngram": {
             "query":  "men's we",
             "operator": "and"
         }
      }
   }
}
...
{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.72711754,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": 0.72711754,
            "_source": {
               "text_field": "Men's wear"
            }
         }
      ]
   }
}
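The reason "men's we" finds only the second document can be imitated in a few lines of Python. This is a sketch under simplifying assumptions, not what Elasticsearch actually runs: the helper names are invented, asciifolding is left out, and scoring is ignored entirely.

```python
def whitespace_terms(text):
    # Search-time analysis: split on whitespace and lowercase.
    return text.lower().split()


def ngram_terms(text, min_gram=2, max_gram=20):
    # Index-time analysis: every substring of each token within the size range.
    terms = set()
    for token in whitespace_terms(text):
        longest = min(max_gram, len(token))
        for size in range(min_gram, longest + 1):
            for start in range(len(token) - size + 1):
                terms.add(token[start:start + size])
    return terms


docs = {"1": "Kid's wear", "2": "Men's wear"}
index = {doc_id: ngram_terms(text) for doc_id, text in docs.items()}


def search_and(text):
    # Operator "and": every search term must be among the indexed n-grams.
    wanted = whitespace_terms(text)
    return sorted(
        doc_id for doc_id, terms in index.items()
        if all(term in terms for term in wanted)
    )


print(search_and("men's we"))  # partial second word still matches
print(search_and("'s we"))     # punctuation-only fragment matches both
print(search_and("m"))         # shorter than min_gram, so nothing matches
```

Because the search side is not split into n-grams, a query term only matches when the whole typed fragment was indexed as an n-gram, which keeps the results tighter than analyzing both sides the same way.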
Hopefully this gives you some ideas about how to proceed.