Elasticsearch wildcard query with hyphen and lowercase filter

search autocomplete elasticsearch

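I want to run a wildcard query for QNMZ-1900.
As I read in the documentation and confirmed by trying it myself, Elasticsearch's standard tokenizer splits words on hyphens; for example, QNMZ-1900 is split into QNMZ and 1900.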
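To make the splitting concrete, here is a rough sketch in plain Python (not Elasticsearch code). The function `approx_standard_tokens` is a hypothetical stand-in that only approximates the standard tokenizer by splitting on anything that is not an ASCII letter or digit, and `fnmatchcase` stands in for wildcard matching against individual terms. Under those assumptions, no single token can match a pattern that contains the hyphen:

```python
import re
from fnmatch import fnmatchcase


def approx_standard_tokens(text):
    """Hypothetical approximation of the standard tokenizer:
    split on every character that is not an ASCII letter or digit."""
    return re.findall(r"[A-Za-z0-9]+", text)


tokens = approx_standard_tokens("QNMZ-1900")
print(tokens)  # the hyphen is gone, two separate tokens remain

# A wildcard query is matched term by term, so a pattern that spans
# the hyphen cannot match either of the two tokens.
print(any(fnmatchcase(token, "QNMZ-19*") for token in tokens))
```

The real tokenizer follows Unicode word-boundary rules, so this only illustrates the hyphen case discussed here.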

To prevent this behavior, I use the not_analyzed feature:
curl -XPUT 'localhost:9200/test-idx' -d '{
"mappings": {
    "doc": {
        "properties": {
            "foo" : {
                "type": "string",
                "index": "not_analyzed"
            }
        }
    }
}
}'

I put a document into the index:
curl -XPUT 'localhost:9200/test-idx/doc/1' -d '{"foo": "QNMZ-1900"}'

Refresh it:
curl -XPOST 'localhost:9200/test-idx/_refresh'

Now I can use a wildcard query and find QNMZ-1900:
curl 'localhost:9200/test-idx/doc/_search?pretty=true' -d '{
"query": {
     "wildcard" : { "foo" : "QNMZ-19*" }
}
}'

My question:
How can I run a wildcard query with a lowercase search term?
I tried:
curl -XDELETE 'localhost:9200/test-idx'
curl -XPUT 'localhost:9200/test-idx' -d '{
"mappings": {
    "doc": {
        "properties": {
            "foo" : {
                "type": "string",
                "index": "not_analyzed",
                "filter": "lowercase"
            }
        }
    }
}
}'
curl -XPUT 'localhost:9200/test-idx/doc/1' -d '{"foo": "QNMZ-1900"}'
curl -XPOST 'localhost:9200/test-idx/_refresh'

But my lowercase query:
curl 'localhost:9200/test-idx/doc/_search?pretty=true' -d '{
"query": {
     "wildcard" : { "foo" : "qnmz-19*" }
}
}'

finds nothing.
How can I fix it?
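One solution is to define a custom analyzer that uses:

- a keyword tokenizer (it keeps the input value intact, as if it were not_analyzed)
- a lowercase token filter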

I tried this:
POST test-idx
{
  "index":{
    "analysis":{
      "analyzer":{
        "lowercase_hyphen":{
          "type":"custom",
          "tokenizer":"keyword",
          "filter":["lowercase"]
        }
      }
    }
  }
}

PUT test-idx/doc/_mapping
{
  "doc":{
    "properties": {
        "foo" : {
          "type": "string",
          "analyzer": "lowercase_hyphen"
        }
    }      
  }
}

POST test-idx/doc
{
  "foo":"QNMZ-1900"
}

As you can see, calling the _analyze endpoint like this:
GET test-idx/_analyze?analyzer=lowercase_hyphen&text=QNMZ-1900

outputs a single lowercase token that is not split on the hyphen:
{
   "tokens": [
      {
         "token": "qnmz-1900",
         "start_offset": 0,
         "end_offset": 9,
         "type": "word",
         "position": 1
      }
   ]
}
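As a mental model only (plain Python, not Elasticsearch code), the keyword tokenizer followed by the lowercase filter behaves roughly like the hypothetical function `keyword_lowercase_analyze` below. It returns the whole input as one lowercased token, with offsets covering the full string. The `type` and `position` fields of the real response are left out because they are not needed for the illustration.

```python
def keyword_lowercase_analyze(text):
    """Hypothetical model of a custom analyzer made of a keyword tokenizer
    and a lowercase filter: one token, lowercased, spanning the whole input."""
    return [
        {
            "token": text.lower(),     # lowercase filter
            "start_offset": 0,         # keyword tokenizer: token starts at 0...
            "end_offset": len(text),   # ...and ends at the end of the input
        }
    ]


for token in keyword_lowercase_analyze("QNMZ-1900"):
    print(token)
```

Because the hyphen stays inside the single token, a wildcard pattern that contains the hyphen has something to match against.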

Then, using the same query:
POST test-idx/doc/_search
{
  "query": {
    "wildcard" : { "foo" : "qnmz-19*" }    
  }
}

I get one result, which is what you want:
{
   "took": 66,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "test-idx",
            "_type": "doc",
            "_id": "wo1yanIjQGmvgfScMg4hyg",
            "_score": 1,
            "_source": {
               "foo": "QNMZ-1900"
            }
         }
      ]
   }
}

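However, note that this only lets you query with lowercase values.
As Andrei stated in the comments, the same query with the value QNMZ-19* returns nothing.
The reason can be found in the documentation: at search time, the value is not analyzed.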
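The asymmetry can be sketched in plain Python (a model, not Elasticsearch code). The indexed term is lowercased at index time, while the wildcard pattern is compared as typed. `fnmatchcase` is used here as a stand-in for case-sensitive wildcard matching, and `wildcard_hits` is a hypothetical helper. The last call shows one possible workaround, which is my own assumption and not something stated above: lowercase the pattern in the client before sending it.

```python
from fnmatch import fnmatchcase

# What the lowercase_hyphen analyzer stores for the document at index time.
indexed_term = "QNMZ-1900".lower()


def wildcard_hits(pattern):
    """Hypothetical helper: the pattern is NOT analyzed,
    so it is compared to the indexed term exactly as typed."""
    return fnmatchcase(indexed_term, pattern)


print(wildcard_hits("qnmz-19*"))          # lowercase pattern
print(wildcard_hits("QNMZ-19*"))          # uppercase pattern, compared as typed
print(wildcard_hits("QNMZ-19*".lower()))  # assumed workaround: lowercase it client-side
```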
I have checked this approach in a pet project based on ES 6.1. The data model shown below allows searching as expected:
PUT test-idx
{
    "settings": {
        "analysis": {
            "analyzer": {
                "keylower": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["lowercase"]
                }
            }
        }
    }
}

POST /test-idx/doc/_mapping
{
    "properties": {
        "foo": {
            "type": "text",
            "fields": {
                "raw": {
                    "type": "keyword"
                },
                "lowercase_foo": {
                    "type": "text",
                    "analyzer": "keylower"
                }
            }
        }
    }
}

PUT /test-idx/doc/1
{"foo": "QNMZ-1900"}

Check the results of these two searches. The first returns one hit. The second returns 0 hits.
GET /test-idx/doc/_search
{
  "query": {
     "wildcard" : { "foo.lowercase_foo" : "qnmz-19*" }
  }
}

GET /test-idx/doc/_search
{
  "query": {
     "wildcard" : { "foo" : "qnmz-19*" }
  }
}
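Why the two searches differ can be modelled in plain Python (again a sketch, not Elasticsearch code). The hypothetical `index_doc` function mimics the three views of the same value that the multi-field mapping produces. It assumes the main `foo` field uses the default standard analyzer, approximated here by splitting on non-alphanumeric characters and lowercasing. `fnmatchcase` stands in for wildcard matching.

```python
import re
from fnmatch import fnmatchcase


def index_doc(value):
    """Hypothetical model of the terms stored for each field of the mapping."""
    return {
        # text field, default analyzer (approximation): split and lowercase
        "foo": [t.lower() for t in re.findall(r"[A-Za-z0-9]+", value)],
        # keyword sub-field: the value exactly as given
        "foo.raw": [value],
        # keylower analyzer: one token, lowercased
        "foo.lowercase_foo": [value.lower()],
    }


def wildcard(index, field, pattern):
    """True if any term stored for the field matches the pattern."""
    return any(fnmatchcase(term, pattern) for term in index[field])


idx = index_doc("QNMZ-1900")
print(wildcard(idx, "foo.lowercase_foo", "qnmz-19*"))  # one hit in the first search
print(wildcard(idx, "foo", "qnmz-19*"))                # no hit in the second search
print(wildcard(idx, "foo.raw", "QNMZ-19*"))            # exact-case search on the raw value
```

In this model the `foo.raw` sub-field still serves exact-case patterns, so both spellings can be supported by picking the sub-field that matches the case of the pattern.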

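Thanks @ThomasC for the input. Please treat my answer with care. I am only learning Elasticsearch and I am not an expert in this database. I don't know whether this is production-ready advice.

It doesn't seem to work for POST test-idx/doc/_search {"query": {"wildcard": {"foo": "QNMZ-19*"}}}

Indeed, the wildcard query input is not analyzed, but this way it is always possible to search with lowercase values. I have updated my answer, though.

@ThomasC Is this answer still valid for Elasticsearch 6.1? Is there a newer, more convenient feature for this kind of search?

What do you think about a different way of modelling this? In this example we could copy "foo" into a "foo_lowercase" field in the JSON, which would contain "qnmz-1900". Then we could search with curl 'localhost:9200/test-idx/doc/_search?pretty=true' -d '{"query": {"wildcard": {"foo": "qnmz-19*"}}}'. Or do you think this kind of modelling is an anti-pattern for Elasticsearch?

@RadosławOsiński I haven't dug into ES 6.1 yet. However, since ES 5.x, the string datatype has been replaced by text (for analyzed strings) and keyword (for not_analyzed strings). So in my answer, string should be replaced by text in the mapping. As for your suggestion, that is actually how I would implement it: use a multi-field mapping to store the raw value in the main field and each analyzed version (one per analyzer) in a sub-field. It is good practice; the only drawbacks are that these fields take up some space and indexing takes longer. Hope this helps :)

Thanks for the good explanation; this answer should be accepted.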