Different position increment behavior of the Elasticsearch synonym filter
My use case is searching with edge_ngram support combined with synonyms, where the tokens to match must appear in order.

While testing the analysis, I observed two different behaviors of the filter chain with respect to position increments:
When the filter chain is lowercase followed by synonym, the synonym filter causes no position increment.
When the filter chain is lowercase, edge_ngram, then synonym, the synonym filter causes a position increment.

Here are the requests I ran for each case:

Case 1. No position increment
PUT synonym_test
{
  "index": {
    "analysis": {
      "analyzer": {
        "by_smart": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "custom_synonym"
          ]
        }
      },
      "filter": {
        "custom_synonym": {
          "type": "synonym",
          "synonyms": [
            "begin => start"
          ]
        }
      }
    }
  }
}

GET synonym_test/_analyze
{
  "text": "begin working",
  "analyzer": "by_smart"
}
Output:
Case 2. Position increment
PUT synonym_test
{
  "index": {
    "analysis": {
      "analyzer": {
        "by_smart": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "custom_edge_ngram",
            "custom_synonym"
          ]
        }
      },
      "filter": {
        "custom_synonym": {
          "type": "synonym",
          "synonyms": [
            "begin => start"
          ]
        },
        "custom_edge_ngram": {
          "type": "edge_ngram",
          "min_gram": "2",
          "max_gram": "60"
        }
      }
    }
  }
}

GET synonym_test/_analyze
{
  "text": "begin working",
  "analyzer": "by_smart"
}
Output:
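Note that in case 1, when begin is replaced, the token start takes the same position that begin had, and there is no position increment. In case 2, however, when the begin token is replaced by start, the positions of all subsequent tokens in the stream are incremented.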
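For comparison, here is a simplified Python model of the two token streams. It assumes the usual semantics of the whitespace tokenizer, the lowercase filter and the edge_ngram filter (every gram of a word is stacked at that word's position, and offsets are ignored). It reproduces the case 1 result described above and shows the stacked stream that custom_synonym receives in case 2; it does not try to model how the synonym filter then assigns positions. The function names are hypothetical, not Elasticsearch or Lucene APIs.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, int]  # (term, position)


def whitespace_lowercase(text: str) -> List[Token]:
    # Hypothetical model of the whitespace tokenizer + lowercase filter:
    # each word takes the next position.
    return [(word.lower(), pos) for pos, word in enumerate(text.split())]


def replace_synonyms(tokens: List[Token], rules: Dict[str, str]) -> List[Token]:
    # Hypothetical model of single-word "a => b" replacement: the replacement
    # inherits the original token's position (the case 1 behavior in the question).
    return [(rules.get(term, term), pos) for term, pos in tokens]


def edge_ngrams(tokens: List[Token], min_gram: int = 2, max_gram: int = 60) -> List[Token]:
    # Simplified edge_ngram token filter: all prefixes of a token are stacked at
    # that token's own position; tokens shorter than min_gram produce no grams.
    out: List[Token] = []
    for term, pos in tokens:
        for n in range(min_gram, min(max_gram, len(term)) + 1):
            out.append((term[:n], pos))
    return out


# Case 1: lowercase -> synonym ("begin => start")
case1 = replace_synonyms(whitespace_lowercase("begin working"), {"begin": "start"})
print(case1)  # [('start', 0), ('working', 1)]

# Case 2: the stream that reaches custom_synonym, before any synonym handling
case2_input = edge_ngrams(whitespace_lowercase("begin working"))
for term, pos in case2_input:
    print(pos, term)
```

In this model, begi sits at position 0 and wor at position 1 before the synonym filter runs, so they are adjacent; the extra gap described in case 2 would have to be introduced afterwards.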
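Here is my problem:
When I search for begi wor with a match_phrase query (default slop of 0), it does not match begin working. This is because begi and wor end up two positions apart. Any suggestions on how to avoid this position increment without breaking my use case? I am using Elasticsearch 5.6.8 with Lucene 6.6.1.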
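I have read several documentation pages and articles, but I could not find anything that explains why this happens, or whether some setting gives the behavior I want.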
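A minimal Python model of exact phrase matching (slop 0) illustrates why the two-position gap breaks the query: each query term must sit exactly one position after the previous one. The postings below are hypothetical. Only the gap of two between begi and wor comes from the question, the absolute positions are assumed, and the model ignores the other stacked grams (be, beg, wo) that the analyzer also emits.

```python
from typing import Dict, List


def exact_phrase_match(postings: Dict[str, List[int]], query_terms: List[str]) -> bool:
    # Slop-0 phrase check: query term i must appear exactly i positions
    # after the position where query term 0 was found.
    if not query_terms:
        return False
    for start in postings.get(query_terms[0], []):
        if all(start + i in postings.get(term, []) for i, term in enumerate(query_terms)):
            return True
    return False


# Case 2 as described in the question: begi and wor are two positions apart
# (absolute positions are an assumption).
case2_postings = {"begi": [0], "wor": [2]}
# Hypothetical layout without the extra increment: the grams are adjacent.
adjacent_postings = {"begi": [0], "wor": [1]}

query = ["begi", "wor"]
print(exact_phrase_match(case2_postings, query))     # False
print(exact_phrase_match(adjacent_postings, query))  # True
```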