Wildcard search on a not_analyzed field in Elasticsearch

search lucene elasticsearch tokenize

I have an index with the following settings and mappings:

{
  "settings":{
     "index":{
        "analysis":{
           "analyzer":{
              "analyzer_keyword":{
                 "tokenizer":"keyword",
                 "filter":"lowercase"
              }
           }
        }
     }
  },
  "mappings":{
     "product":{
        "properties":{
           "name":{
              "analyzer":"analyzer_keyword",
              "type":"string",
              "index": "not_analyzed"
           }
        }
     }
  }
}

I am trying to implement a wildcard search on the name field. My sample data looks like this:

[
{"name": "SVF-123"},
{"name": "SVF-234"}
]

When I run the following query:

curl -XPOST "http://localhost:9200/my_index/product/_search" -d '
{
    "query": {
        "filtered" : {
            "query" : {
                "query_string" : {
                    "query": "*SVF-1*"
                }
            }
        }

    }
}'

it returns both SVF-123 and SVF-234. I think the data is still being tokenized. It should return only SVF-123.

Could you help? Thanks in advance.

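A few things are going wrong here.

First, you are saying that you do not want terms to be analyzed at index time. Then you configure an analyzer (used at search time) that produces incompatible terms, because it lowercases them.

By default, all terms also end up in the _all field, which is analyzed with the standard analyzer. That is the field you end up searching. Since it tokenizes on "-", you end up with an OR of "*SVF" and "1*".

Try running a terms facet on _all and on name to see what is going on.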

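You need to make sure that the terms you index are compatible with what you search for. You probably want to disable _all, since it can muddy what is going on.

Here is a runnable example:
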
#!/bin/bash

export ELASTICSEARCH_ENDPOINT="http://localhost:9200"

# Create indexes

curl -XPUT "$ELASTICSEARCH_ENDPOINT/play" -d '{
    "settings": {
        "analysis": {
            "text": [
                "SVF-123",
                "SVF-234"
            ],
            "analyzer": {
                "analyzer_keyword": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": [
                        "lowercase"
                    ]
                }
            }
        }
    },
    "mappings": {
        "type": {
            "properties": {
                "name": {
                    "type": "string",
                    "index": "not_analyzed",
                    "analyzer": "analyzer_keyword"
                }
            }
        }
    }
}'


# Index documents
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"play","_type":"type"}}
{"name":"SVF-123"}
{"index":{"_index":"play","_type":"type"}}
{"name":"SVF-234"}
'

# Do searches

# See all the generated terms.
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
    "facets": {
        "name": {
            "terms": {
                "field": "name"
            }
        },
        "_all": {
            "terms": {
                "field": "_all"
            }
        }
    }
}
'

# Analyzed, so no match
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
    "query": {
        "match": {
            "name": {
                "query": "SVF-123"
            }
        }
    }
}
'

# Not analyzed according to `analyzer_keyword`, so matches. (Note: term, not match)
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
    "query": {
        "term": {
            "name": {
                "value": "SVF-123"
            }
        }
    }
}
'


# _all is analyzed with the standard analyzer, so the lowercased token matches.
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
    "query": {
        "term": {
            "_all": {
                "value": "svf"
            }
        }
    }
}
'
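
For readers without a cluster at hand, the term mismatch the script demonstrates can be mimicked in a few lines of Python. This is only a sketch under stated assumptions: `keyword_lowercase` and `standard_like` are hypothetical helpers, and `standard_like` is a rough stand-in for the standard analyzer (lowercase, split on non-alphanumeric characters), not its real implementation.

```python
import re

def keyword_lowercase(text):
    """Mimic analyzer_keyword: keep the whole input as one token, lowercased."""
    return [text.lower()]

def standard_like(text):
    """Rough stand-in for the standard analyzer (an assumption, not the real thing)."""
    return re.findall(r"[a-z0-9]+", text.lower())

docs = ["SVF-123", "SVF-234"]

# "index": "not_analyzed" stores each value verbatim as a single term.
name_terms = set(docs)
# _all is analyzed, so it holds lowercased fragments.
all_terms = {tok for d in docs for tok in standard_like(d)}

# match query: the query text goes through analyzer_keyword before lookup.
match_hit = keyword_lowercase("SVF-123")[0] in name_terms
# term query: the query text is looked up as-is.
term_hit = "SVF-123" in name_terms
# term query on _all with a lowercased fragment.
all_hit = "svf" in all_terms

print(sorted(name_terms))            # ['SVF-123', 'SVF-234']
print(sorted(all_terms))             # ['123', '234', 'svf']
print(match_hit, term_hit, all_hit)  # False True True
```

The three booleans line up with the last three searches in the script: the match query misses because "svf-123" was never indexed, while both term queries hit.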

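My solution adventure

As you can see in my question, I started from my own case. Whenever I changed one part of the settings, one thing started working but another stopped. Here is the history of my solution:

1.) I indexed my data with the defaults, which means my data was analyzed. This caused trouble for me. For example, when a user searches for a keyword like SVF-1, the system runs the following query: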
{
    "query": {
        "filtered" : {
            "query" : {
                "query_string" : {
                    "analyze_wildcard": true,
                    "query": "*SVF-1*"
                }
            }
        }

    }
}

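The results are:
SVF-123
SVF-234

This is expected, because the name field of my documents is analyzed. The query is split into the tokens SVF and 1, and SVF matches my documents even though 1 does not. I abandoned this approach and created a mapping that makes my fields not_analyzed: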

{
  "mappings":{
     "product":{
        "properties":{
           "name":{
              "type":"string",
              "index": "not_analyzed"
           },
           "site":{
              "type":"string",
              "index": "not_analyzed"
           } 
        }
     }
  }
}

But my problem continued.

2.) After a lot of research, I decided to try another approach and use a wildcard query. My query is:
{
    "query": {
        "wildcard": {
            "name": {
                "value": "*SVF-1*"
            }
        }
    },
    "filter": {
        "term": {"site": "pro_en_GB"}
    }
}

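This query works, but there is one problem. My field is no longer analyzed and I am running a wildcard query, so case sensitivity becomes the issue. If I search for svf-1, it returns nothing, and users may well type the lowercase version of the query.

3.) I changed my document structure to: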
{
  "mappings":{
     "product":{
        "properties":{
           "name":{
              "type":"string",
              "index": "not_analyzed"
           },
           "nameLowerCase":{
              "type":"string",
              "index": "not_analyzed"
           },
           "site":{
              "type":"string",
              "index": "not_analyzed"
           } 
        }
     }
  }
}

I added a companion field for name called nameLowerCase. When I index a document, I set it up like this:
{
    "name": "SVF-123",
    "nameLowerCase": "svf-123",
    "site": "pro_en_GB"
}

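Here, I convert the query keyword to lowercase and run the search against the new nameLowerCase field, while displaying the name field.

The final version of my query is: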
{
    "query": {
        "wildcard": {
            "nameLowerCase": {
                "value": "*svf-1*"
            }
        }
    },
    "filter": {
        "term": {"site": "pro_en_GB"}
    }
}
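
The client-side part of this approach (storing a lowercased copy at index time and lowercasing the keyword at search time) could look roughly like the Python sketch below. The helper names `build_document` and `build_wildcard_query` are hypothetical, and the request body simply mirrors the query shown in this answer. Sending it to Elasticsearch is left out.

```python
import json

def build_document(name, site):
    """Return the document to index, with a lowercased copy of name."""
    return {"name": name, "nameLowerCase": name.lower(), "site": site}

def build_wildcard_query(keyword, site):
    """Return a search body that matches the lowercased keyword anywhere
    in nameLowerCase and restricts the hits to one site."""
    return {
        "query": {
            "wildcard": {
                "nameLowerCase": {"value": "*" + keyword.lower() + "*"}
            }
        },
        "filter": {"term": {"site": site}},
    }

doc = build_document("SVF-123", "pro_en_GB")
query = build_wildcard_query("SVF-1", "pro_en_GB")

print(json.dumps(doc))
print(json.dumps(query, indent=2))
```

Because the keyword is lowercased before the query is built, SVF-1 and svf-1 produce the same request. Note that a keyword containing * or ? would be interpreted as a wildcard, so real user input may need escaping.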

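Now it works. There is also another way to solve this problem, but my query contains a dash (-) and I ran into some problems with it.

Many thanks to @Alex Brasetvik for the detailed explanation and effort.

Adding to Hüseyin's answer: we can use AND as the default operator. SVF and 1* are then combined with the AND operator, which gives the correct result: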
"query": {
    "filtered" : {
        "query" : {
            "query_string" : {
                "default_operator": "AND",
                "analyze_wildcard": true,
                "query": "*SVF-1*"
            }
        }
    }
}
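
The difference between the two operators can be illustrated offline with a small Python sketch. It assumes the query text is split into the lowercased fragments "*svf" and "1*", and it uses a rough stand-in for the standard analyzer. Both `standard_like` and `matches` are hypothetical helpers, not Elasticsearch APIs.

```python
import re
from fnmatch import fnmatchcase

def standard_like(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches(doc, patterns, operator):
    """Check each wildcard fragment against the document's tokens, then
    combine the per-fragment results with OR or AND."""
    tokens = standard_like(doc)
    hits = [any(fnmatchcase(tok, p) for tok in tokens) for p in patterns]
    return all(hits) if operator == "AND" else any(hits)

docs = ["SVF-123", "SVF-234"]
patterns = ["*svf", "1*"]  # assumed fragments of "*SVF-1*"

or_hits = [d for d in docs if matches(d, patterns, "OR")]
and_hits = [d for d in docs if matches(d, patterns, "AND")]

print(or_hits)   # ['SVF-123', 'SVF-234']
print(and_hits)  # ['SVF-123']
```

With OR, the "*svf" fragment alone is enough to pull in both documents. With AND, SVF-234 drops out because none of its tokens start with 1.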

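@Viduranga Wijesooriya, as you said, "default_operator": "AND" checks that both SVF and 1 are present, but on its own it still does not give an exact match. It does filter the results more appropriately: it keeps every combination of SVF and 1 and sorts the results by relevance, which moves SVF-1 up the ranking.

To get exact results, use the following settings and mapping: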

"settings": {
    "analysis": {
        "analyzer": {
            "analyzer_keyword": {
                "type": "custom",
                "tokenizer": "keyword",
                "filter": [
                    "lowercase"
                ]
            }
        }
    }
},
"mappings": {
    "type": {
        "properties": {
            "name": {
                "type": "string",
                "analyzer": "analyzer_keyword"
            }
        }
    }
}

The query is:

{
    "query": {
        "bool": {
            "must": [
               {
                    "query_string" : {
                        "fields": ["name"],
                        "query" : "*svf-1*",
                        "analyze_wildcard": true
                    }
               }
            ]
        }
    }
}

The result is:

{
   "took": 4,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "play",
            "_type": "type",
            "_id": "AVfXzn3oIKphDu1OoMtF",
            "_score": 1,
            "_source": {
               "name": "SVF-123"
            }
         }
      ]
   }
}
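
Why this combination is exact can be sketched in Python: with a keyword tokenizer and a lowercase filter, each name is indexed as one lowercased term, so the wildcard pattern is compared against the whole value rather than against fragments. The document "1-SVF" below is a hypothetical extra that contains both SVF and 1 but not the sequence SVF-1. `keyword_lowercase` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

def keyword_lowercase(text):
    """Mimic analyzer_keyword: the whole value becomes one lowercased term."""
    return text.lower()

# "1-SVF" is a hypothetical document added for illustration only.
docs = ["SVF-123", "SVF-234", "1-SVF"]
index = {d: keyword_lowercase(d) for d in docs}

pattern = "*svf-1*"
hits = [d for d, term in index.items() if fnmatchcase(term, pattern)]

print(hits)  # ['SVF-123']
```

Only SVF-123 contains the full sequence svf-1, so it is the only hit. A fragment-based AND would also have accepted the hypothetical 1-SVF.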

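FYI, you really do not want a leading wildcard. I think that if you use one, it will look at every document.

I know about the performance drawback, but I need to perform wildcard searches, even for SVF-*.

I think a trailing wildcard is fine; you just do not want a leading wildcard.

When the user types VF, it should return SVF-... That is why I use a leading wildcard.

I think the best approach is to index the terms in a form (for example, reversed) in which the leading wildcard becomes a trailing wildcard.

First of all, thank you for your answer. I tried your example, but it did not work for my case, which is a bit more complicated. In the end I found a solution, and I will describe the whole situation in an answer.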
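
One common way to turn a leading wildcard into a trailing one, which the comments above seem to allude to, is to index a reversed copy of each term and run a prefix match against it. The Python sketch below illustrates the idea only. `reversed_term` and `ends_with` are hypothetical helpers, and the trick covers patterns like *VF-123, not double-ended ones like *SVF-1*.

```python
def reversed_term(text):
    """Lowercase and reverse a value, as a reversed copy of the field would store it."""
    return text.lower()[::-1]

docs = ["SVF-123", "SVF-234"]
reversed_index = {d: reversed_term(d) for d in docs}

def ends_with(suffix):
    """Emulate the leading-wildcard query *suffix with a prefix match
    on the reversed terms."""
    prefix = reversed_term(suffix)
    return [d for d, term in reversed_index.items() if term.startswith(prefix)]

print(ends_with("-123"))    # ['SVF-123']
print(ends_with("VF-234"))  # ['SVF-234']
```

A prefix match can seek directly to the matching part of a sorted term dictionary, whereas a leading wildcard has to scan every term.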