Elasticsearch: finding substring matches

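I want to perform both exact word matching and partial word/substring matching. For example, if I search for "men's shaver", I should find "men's shaver" in the results. But if I search for "en's shaver", I should also find "men's shaver" in the results. I am using the following settings and mappings: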

Index settings:

PUT /my_index
{
    "settings": {
        "number_of_shards": 1, 
        "analysis": {
            "filter": {
                "autocomplete_filter": { 
                    "type":     "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type":      "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "autocomplete_filter" 
                    ]
                }
            }
        }
    }
}

Mapping:

PUT /my_index/my_type/_mapping
{
    "my_type": {
        "properties": {
            "name": {
                "type":            "string",
                "index_analyzer":  "autocomplete", 
                "search_analyzer": "standard" 
            }
        }
    }
}

Insert records:

POST /my_index/my_type/_bulk
{ "index": { "_id": 1            }}
{ "name": "men's shaver" }
{ "index": { "_id": 2            }}
{ "name": "women's shaver" }

Queries:

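1. Search by exact phrase match --> "men's"

This query returns "men's shaver" in the results.

2. Search by partial word match --> "en's"

This query returns nothing.

I also tried the following query: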

POST /my_index/my_type/_search
{
    "query": {
        "wildcard": {
           "name": {
              "value": "%en's%"
           }
        }
    }
}

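Still nothing. I think this is because the "edge_ngram" filter on the index only produces prefixes of each word, so it cannot find a partial word/substring match. I also tried an "n-gram" filter, but it slows the search down considerably.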

Please suggest how I can achieve both exact phrase matching and partial phrase matching with the same index settings.
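To make the suspicion above concrete, here is a minimal Python sketch (not Elasticsearch code) of which terms an edge n-gram filter and a full n-gram filter would roughly produce for "men's shaver". The helper names `edge_ngrams` and `ngrams` are made up for illustration, and whitespace splitting is only an assumed stand-in for the standard tokenizer.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    """Prefixes of `token`, roughly what an edge_ngram token filter emits."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def ngrams(token, min_gram=1, max_gram=20):
    """All substrings of `token` whose length is within [min_gram, max_gram]."""
    grams = []
    for size in range(min_gram, min(max_gram, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


# Assumption: whitespace splitting stands in for the standard tokenizer.
indexed_tokens = "men's shaver".lower().split()

edge_terms = {g for t in indexed_tokens for g in edge_ngrams(t)}
full_terms = {g for t in indexed_tokens for g in ngrams(t)}

print("men's" in edge_terms)  # the whole word is its own longest prefix
print("en's" in edge_terms)   # "en's" is not a prefix of any token
print("en's" in full_terms)   # present once every substring is indexed
print(len(edge_terms), len(full_terms))  # the full n-gram set is much larger
```

The larger term set in the last line is one plausible reason the n-gram variant feels slower and heavier than the edge n-gram one.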

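To search for both partial field matches and exact matches, it works better to define the field as "not_analyzed" or as keyword (rather than text) and then use a wildcard query.

To use a wildcard query, append * to both ends of the string you are searching for: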

POST /my_index/my_type/_search
{
    "query": {
        "wildcard": {
            "name": {
                "value": "*en's*"
            }
        }
    }
}
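As a rough illustration of why this pattern matches where the earlier "%en's%" attempt did not, the Python sketch below imitates whole-term wildcard matching with the standard library's `fnmatchcase`. This is only an approximation of the wildcard query (it assumes each document's field is stored as one untokenized term, and `fnmatchcase` also understands `[...]` sets); `wildcard_hits` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Assumption: each value is indexed as a single, untokenized term.
indexed_terms = ["men's shaver", "women's shaver"]


def wildcard_hits(pattern, terms):
    """Return the terms matched by a whole-term, case-sensitive pattern."""
    return [t for t in terms if fnmatchcase(t, pattern)]


print(wildcard_hits("*en's*", indexed_terms))  # both documents match
print(wildcard_hits("%en's%", indexed_terms))  # % is a literal character: no hits
print(wildcard_hits("*EN'S*", indexed_terms))  # matching is case-sensitive: no hits
```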

To make the search case-insensitive, use a custom analyzer with the lowercase filter and the keyword tokenizer.

Custom analyzer:

"custom_analyzer": {
    "tokenizer": "keyword",
    "filter": ["lowercase"]
}
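Since an analyzer like this lowercases the indexed values, the wildcard pattern has to be lowercased as well. A minimal client-side sketch in Python follows; `build_wildcard_query` is a hypothetical helper, the field name is just a parameter, and escaping of `*` or `?` inside the user input is deliberately not handled.

```python
import json


def build_wildcard_query(field, user_input):
    """Build a wildcard query body with a lowercased, *-wrapped pattern."""
    # Lowercase on the client, assuming the indexed terms were lowercased too.
    pattern = "*" + user_input.strip().lower() + "*"
    return {"query": {"wildcard": {field: {"value": pattern}}}}


body = build_wildcard_query("name", "AsD")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request like the one above.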

Make the search string lowercase.

If the search string is AsD, change it to *asd*.

To search with any string or substring:

query: {
    or: [{
        match_phrase_prefix: {
            name: str
        }
    }, {
        match_phrase_prefix: {
            surname: str
        }
    }]
}

Happy Elasticsearch coding!

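The answer given by @BlackPOP works, but it relies on wildcard queries, which are not the preferred approach: they have performance problems and, if abused, can set off a huge domino effect of performance issues across the Elasticsearch cluster.

I have written a detailed article on partial search/autocomplete that covers the latest options available in Elasticsearch as of today (December 2020), with performance in mind. For more on the trade-offs, see the answer.

IMHO a better approach is to use an n-gram tokenizer customized for the use case. The index then already contains the tokens needed for the search term, so searching is faster. The index will be larger, but storage is not that expensive, and you get better speed along with more control over exactly how substring search works.

In addition, you can keep the index size under control by being conservative with the min and max gram values in the tokenizer settings.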
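To give a feel for how much the gram bounds matter, here is a small Python sketch that counts the grams emitted for "men's shaver" under a wide and a narrow range, followed by an index settings body with conservative bounds. The names `substring_tokenizer` and `substring_analyzer` are hypothetical, the bounds 3 and 4 are only an example, and whitespace splitting is an assumed simplification of real tokenization.

```python
def ngram_count(text, min_gram, max_gram):
    """Count the n-grams emitted for each whitespace-separated word."""
    total = 0
    for token in text.lower().split():
        for size in range(min_gram, min(max_gram, len(token)) + 1):
            total += len(token) - size + 1  # substrings of this length
    return total


wide = ngram_count("men's shaver", 1, 20)
narrow = ngram_count("men's shaver", 3, 4)
print(wide, narrow)  # far fewer grams with the narrow range

# Sketch of an index settings body; the tokenizer and analyzer names are made up.
ngram_settings = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "substring_tokenizer": {
                    "type": "ngram",
                    "min_gram": 3,
                    "max_gram": 4,
                }
            },
            "analyzer": {
                "substring_analyzer": {
                    "type": "custom",
                    "tokenizer": "substring_tokenizer",
                    "filter": ["lowercase"],
                }
            },
        }
    }
}
```

The trade-off with a narrow range is that search terms shorter than `min_gram` have no indexed gram to match, so the bounds should follow the shortest and longest substrings users are expected to type.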

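Thanks. I can search now.

Just quoting the Elasticsearch documentation: "Warning: allowing a wildcard at the beginning of a word (e.g. "*ing") is particularly heavy, because all terms in the index need to be examined."

Thanks for the warning, @david_p.

@david_p's link is broken, but as he said, Elasticsearch recommends to "avoid using a pattern that starts with a wildcard (e.g. *foo or, as a regexp, .*foo)".

It is case-sensitive. How can we make it case-insensitive?

He doesn't want to match a prefix, though.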