Elasticsearch synonyms for Sankt and St
I am trying to get synonyms working in my existing setup. Currently I have the following settings:
PUT city
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase",
            "my_synonym_filter",
            "german_normalization",
            "my_ascii_folding"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "lowercase",
          "filter": [
            "lowercase",
            "my_synonym_filter",
            "german_normalization",
            "my_ascii_folding"
          ]
        }
      },
      "filter": {
        "my_ascii_folding": {
          "type": "asciifolding",
          "preserve_original": true
        },
        "my_synonym_filter": {
          "type": "synonym",
          "ignore_case": "true",
          "synonyms": [
            "sankt, st => sankt"
          ]
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 15,
          "token_chars": [
            "letter",
            "digit",
            "symbol"
          ]
        }
      }
    }
  },
  "mappings": {
    "city": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}
In this city index I have documents like these:
"St. Wolfgang" or "Sankt Wolfgang", etc. For me, "Sankt" and "St." are synonyms, so if I search for Sankt, both documents should show up.
I created a new filter and added that filter to my autocomplete analyzer:
So far, so good. But the problems I am facing are the following:
Apparently the dot after "st" is currently not analyzed and not searchable, but for the synonym the dot matters.
The second problem is that if I search for "sankt", whose synonym is "st", I get back all documents starting with "st", such as "Stuttgart". This again happens because the dot is not used.
Do you know how I can achieve this? If you need any more information, please let me know.
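Why the second problem happens can be traced in a few lines of plain Python. This is only a sketch of the mechanics, not Elasticsearch code (the SYNONYMS map and helper names are made up for illustration): the edge_ngram tokenizer runs before the synonym filter, so the prefix gram "st" produced from "stuttgart" is itself rewritten to "sankt".

```python
# Plain-Python sketch of the index-time chain above (not Elasticsearch code):
# edge_ngram tokenizer first, then the synonym filter "sankt, st => sankt".

SYNONYMS = {"sankt": "sankt", "st": "sankt"}  # models "sankt, st => sankt"

def edge_ngrams(token, min_gram=1, max_gram=15):
    """Every leading substring of `token`, as the edge_ngram tokenizer emits."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

# Index-time analysis of "Stuttgart": grams first, synonyms second.
grams = edge_ngrams("stuttgart")
indexed = [SYNONYMS.get(g, g) for g in grams]

print(indexed)
# The prefix gram "st" has been rewritten to "sankt", so the query "sankt"
# now matches "Stuttgart" as well.
```

Once the grams exist, the synonym filter cannot tell a real "st" apart from the prefix of "stuttgart", which is why running the edge n-grams after the synonym filter helps.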
UPDATE:
After the discussion I made the following changes to my settings:
- Changed the edge_ngram tokenizer to a standard tokenizer
- Added an edgeNGram filter and added this filter to my analyzer
- Removed the german_normalization and my_ascii_folding filters from my analyzers to simplify testing
PUT city
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase",
            "my_synonym_filter",
            "edge_filter"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "autocomplete",
          "filter": [
            "my_synonym_filter",
            "lowercase"
          ]
        }
      },
      "filter": {
        "edge_filter": {
          "type": "edgeNGram",
          "min_gram": 1,
          "max_gram": 15
        },
        "my_synonym_filter": {
          "type": "synonym",
          "ignore_case": "true",
          "synonyms": [
            "sankt, st => sankt"
          ]
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "standard"
        }
      }
    }
  },
  "mappings": {
    "city": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}
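The idea behind these changes can be sketched in plain Python (illustrative names; a simple whitespace split stands in for the tokenizer, and the example uses "st" without the dot, matching the synonym rule at this stage). Once the edge n-gram filter runs last, the grams are built from the synonym-normalized token "sankt" instead of from "st":

```python
SYNONYMS = {"sankt": "sankt", "st": "sankt"}  # models "sankt, st => sankt"

def edge_ngrams(token, min_gram=1, max_gram=15):
    """Every leading substring of `token` between min_gram and max_gram chars."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

def index_analyze(text):
    """Tokenize + lowercase, apply synonyms, then edge n-grams last."""
    tokens = [SYNONYMS.get(t, t) for t in text.lower().split()]
    return [g for t in tokens for g in edge_ngrams(t)]

print(index_analyze("St Wolfgang"))   # grams of "sankt" and "wolfgang"
print(index_analyze("Stuttgart"))     # still contains the gram "st"
```

Note that "stuttgart" still legitimately produces the prefix gram "st", so a prefix search for "st" keeps matching Stuttgart; only the synonym token no longer leaks stray "st" grams.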
I added these 3 documents to the index:
"name":"Sankt Wolfgang",
"name":"Stuttgart",
"name":"St. Wolfgang"
PUT city
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter",
            "edge_filter"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter"
          ]
        }
      },
      "filter": {
        "edge_filter": {
          "type": "edgeNGram",
          "min_gram": 1,
          "max_gram": 15
        },
        "my_synonym_filter": {
          "type": "synonym",
          "ignore_case": "true",
          "synonyms": [
            "sankt, st. => sankt"
          ]
        }
      }
    }
  },
  "mappings": {
    "city": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}
Query string -> results:
st -> "St. Wolfgang", "Stuttgart"
st. -> "St. Wolfgang", "Sankt Wolfgang"
sankt -> "St. Wolfgang", "Sankt Wolfgang"
This works for me. The key points here are to make sure that:
- the synonym filter comes after the lowercase filter
- the edge-n-gram filter is placed at the very end
- the edge-n-gram filter is used only at index time
So we create the index:
"name":"Sankt Wolfgang",
"name":"Stuttgart",
"name":"St. Wolfgang"
PUT city
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter",
            "edge_filter"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter"
          ]
        }
      },
      "filter": {
        "edge_filter": {
          "type": "edgeNGram",
          "min_gram": 1,
          "max_gram": 15
        },
        "my_synonym_filter": {
          "type": "synonym",
          "ignore_case": "true",
          "synonyms": [
            "sankt, st. => sankt"
          ]
        }
      }
    }
  },
  "mappings": {
    "city": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}
Then we index the data:
PUT city/city/1
{
  "name": "St. Wolfgang"
}

PUT city/city/2
{
  "name": "Stuttgart"
}

PUT city/city/3
{
  "name": "Sankt Wolfgang"
}
Finally, searching for st or sankt will return only documents 1 and 3, but not 2:
POST city/_search?q=name:st
POST city/_search?q=name:sankt
Could you try changing the synonym to sankt, st => sankt, i.e. st will be indexed as sankt, so searching for sankt will return sankt, and searching for st should also only match sankt. Can you give it a try?

@Val With that synonym change I don't get any documents back at all. Really strange. Do you have any other idea how to make it work?

Can you update your settings with the synonym token filter so that I can reproduce it?

Oh, you actually also need to add the synonym token filter to the search-time analyzer, so that someone typing st is also searching for sankt under the hood.

@Val I edited the settings. That is exactly what I used. Yes, I added the synonym filter to the search analyzer. If you need the index or anything else, let me know.

Thanks for your answer. I reproduced your index. But for me, when I change the synonym to sankt, st => sankt, i.e. without the dot, sankt only returns doc 1 and doc 3. Whereas for the query st, Stuttgart should also be a match, while for sankt it should not be one.

Let's turn this around: could you spell out the output you want for each search input?

Sure. I updated my question with the query strings and results.

Could you use the whitespace tokenizer instead of the standard one?

Awesome, glad we figured it out!
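The fix that the discussion converges on, swapping the standard tokenizer for the whitespace tokenizer so that the trailing dot of "st." survives tokenization and the rule "sankt, st. => sankt" can fire, can be modeled in plain Python. This is a rough sketch under that assumption (a whitespace split stands in for the tokenizer, and the helper names are made up), not Elasticsearch itself:

```python
SYNONYMS = {"sankt": "sankt", "st.": "sankt"}  # models "sankt, st. => sankt"

def edge_ngrams(token, min_gram=1, max_gram=15):
    """Every leading substring of `token` between min_gram and max_gram chars."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

def index_analyze(text):
    """Index time: whitespace split + lowercase, synonyms, edge n-grams last."""
    tokens = [SYNONYMS.get(t, t) for t in text.lower().split()]
    return {g for t in tokens for g in edge_ngrams(t)}

def search_analyze(text):
    """Search time: same chain, but without the edge n-gram step."""
    return [SYNONYMS.get(t, t) for t in text.lower().split()]

docs = ["St. Wolfgang", "Stuttgart", "Sankt Wolfgang"]
index = {name: index_analyze(name) for name in docs}

def search(query):
    terms = search_analyze(query)
    return [name for name in docs if any(t in index[name] for t in terms)]

print(search("sankt"))  # ['St. Wolfgang', 'Sankt Wolfgang']
print(search("st."))    # ['St. Wolfgang', 'Sankt Wolfgang']
```

Both sankt and st. return only "St. Wolfgang" and "Sankt Wolfgang", matching the result table in the question: "Stuttgart" no longer leaks in through the synonym, because its edge n-grams are built from the untouched token "stuttgart".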