elasticsearch: how to use an ngram analyzer with multi_match

I have an ngram_analyzer:
"analysis": {
"tokenizer": {
"ngram_tokenizer": {
"type": "ngram",
"min_gram": 2,
"max_gram": 10,
"token_chars": []
}
},
"analyzer": {
"ngram_analyzer": {
"type": "custom",
"tokenizer": "ngram_tokenizer",
"filter": [
"lowercase",
]
}
}
}
and I try to search across all fields:
"query": {
"multi_match" : {
"query": "jan teach",
"analyzer": "ngram_analyzer",
"operator": "and",
"type": "cross_fields",
"fields": [ "name", "occupation", "surname", ... ]
}
}
Unfortunately this returns no results.
I expect it to match name="Jane", occupation="teacher".
Or is there a better way to achieve this?

First, what you need is not the ngram tokenizer (it generates many more tokens, so it is expensive in index space) but the edge_ngram tokenizer, since you are doing prefix searches on tokens ("jan" in "Jane" and "teach" in "teacher"). Second, at search time you should use the standard analyzer, because the tokens (jan and teach) already exist in the index.

Working example:

Index definition
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"edgengram_analyzer": {
"type": "custom",
"filter": [
"lowercase"
],
"tokenizer": "edgeNGramTokenizer"
}
},
"tokenizer": {
"edgeNGramTokenizer": {
"token_chars": [
"letter",
"digit"
],
"min_gram": "2",
"type": "edgeNGram",
"max_gram": "10"
}
}
},
"max_ngram_diff": "10"
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer" : "edgengram_analyzer",
"search_analyzer" : "standard"
},
"occupation" :{
"type" : "text",
"analyzer" : "edgengram_analyzer",
"search_analyzer" : "standard"
}
}
}
}
Index a sample document
{
"name" : "Jane",
"occupation" : "teacher"
}
Tokens generated for "Jane":
POST yourindexname/_analyze
{
"text" : "Jane",
"analyzer": "edgengram_analyzer"
}
{
"tokens": [
{
"token": "ja",
"start_offset": 0,
"end_offset": 2,
"type": "word",
"position": 0
},
{
"token": "jan",
"start_offset": 0,
"end_offset": 3,
"type": "word",
"position": 1
},
{
"token": "jane",
"start_offset": 0,
"end_offset": 4,
"type": "word",
"position": 2
}
]
}
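The token lists above can be reproduced outside Elasticsearch. A minimal Python sketch contrasting prefix-only edge n-grams with full n-grams (the function names are illustrative, not part of any Elasticsearch API), which shows why the full ngram tokenizer inflates the index:

```python
def edge_ngrams(token, min_gram=2, max_gram=10):
    """Prefix-only n-grams, like the edge_ngram tokenizer."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

def all_ngrams(token, min_gram=2, max_gram=10):
    """Every substring n-gram, like the ngram tokenizer (many more tokens)."""
    return [token[i:i + n]
            for n in range(min_gram, min(max_gram, len(token)) + 1)
            for i in range(len(token) - n + 1)]

print(edge_ngrams("jane"))      # ['ja', 'jan', 'jane']
print(len(all_ngrams("jane")))  # 6 tokens for a 4-char word, twice as many
```

For prefix matching ("jan" should find "Jane"), the three edge n-grams are enough; the full n-gram set only adds substrings you never query.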
The search query is the same as yours (but without the analyzer parameter).
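For completeness, a sketch of that query against the mapping above (index name assumed; only the two mapped fields are listed):

```json
GET yourindexname/_search
{
  "query": {
    "multi_match": {
      "query": "jan teach",
      "operator": "and",
      "type": "cross_fields",
      "fields": [ "name", "occupation" ]
    }
  }
}
```

Note that cross_fields only groups fields that share the same search analyzer, which both fields do here (standard).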
And the search result:
"hits": [
{
"_index": "ngram",
"_type": "_doc",
"_id": "1",
"_score": 0.5753642,
"_source": {
"name": "Jane",
"occupation": "teacher"
}
}
]
I get an error: "The difference between max_gram and min_gram in NGram Tokenizer must be less than or equal to: [1] but was [8]. This limit can be set by changing the [index.max_ngram_diff] index level setting." when running these settings on ES v7.6.2, so maybe that is the root cause? The analyzer still works despite this error. You may need to set index.max_ngram_diff to 10.

How are the fields name and occupation analyzed at index time? Do they get the ngram_analyzer?

They are not analyzed at index time; that is done at search time.

You need to analyze these fields at index time with ngram_analyzer. If you are not explicit in the properties, they are handled as keyword and text (standard analyzer), so there is no token "jan" to match. FWIW, cross_fields only groups fields together when they share the same analyzer.
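If you do keep the full ngram tokenizer from the question, the comment above implies raising index.max_ngram_diff when the index is created, since min_gram 2 and max_gram 10 differ by 8. A sketch of the question's settings with that one addition (index name assumed):

```json
PUT yourindexname
{
  "settings": {
    "index": {
      "max_ngram_diff": "10",
      "analysis": {
        "tokenizer": {
          "ngram_tokenizer": {
            "type": "ngram",
            "min_gram": 2,
            "max_gram": 10,
            "token_chars": []
          }
        },
        "analyzer": {
          "ngram_analyzer": {
            "type": "custom",
            "tokenizer": "ngram_tokenizer",
            "filter": [ "lowercase" ]
          }
        }
      }
    }
  }
}
```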