Solr synonym replacement fails?

I have a SynonymFilterFactory that uses a synonyms file. From the Solr docs:

#Explicit mappings match any token sequence on the LHS of "=>"
#and replace with all alternatives on the RHS.  These types of mappings
#ignore the expand parameter in the schema.
#Examples:
i-pod, i pod => ipod,
sea biscuit, sea biscit => seabiscuit
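However, when querying "sea biscit", I end up with results related to "sea", "biscit", and "seabiscuit".

It is as if I had the following configuration (with expand="true"):

I don't understand this behavior, because in the Solr analysis tool, the query "sea biscit" is correctly replaced with "seabiscuit" only.

In other words, the explicit synonym mapping with => does not work.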
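To make the two rule forms from the quoted documentation concrete, here is a minimal Python sketch of how such a file could be interpreted. It is not Solr's parser: parse_synonym_rules and its expand argument are hypothetical names, and the sketch only mimics the documented behavior (explicit "=>" mappings ignore expand, while comma-only lists either expand to every alternative or collapse to the first one).

```python
def parse_synonym_rules(lines, expand=True):
    """Map each input token sequence to its list of replacement phrases.

    Simplified illustration only; duplicate rules overwrite each other here.
    """
    rules = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # Explicit mapping: the expand setting is ignored.
            lhs, rhs = line.split("=>", 1)
            sources = [s.split() for s in lhs.split(",") if s.strip()]
            targets = [t.strip() for t in rhs.split(",") if t.strip()]
        else:
            # Equivalent synonyms: expand decides what each entry maps to.
            group = [g.strip() for g in line.split(",") if g.strip()]
            sources = [g.split() for g in group]
            targets = group if expand else group[:1]
        for src in sources:
            rules[tuple(src)] = list(targets)
    return rules


explicit = parse_synonym_rules([
    "#Examples:",
    "i-pod, i pod => ipod,",
    "sea biscuit, sea biscit => seabiscuit",
])
print(explicit[("sea", "biscit")])
```

With the explicit rule, the two-token sequence maps to the single replacement only. A comma-only line such as "sea biscuit, sea biscit, seabiscuit" parsed with expand=True would instead map each entry to all three alternatives.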


EDIT: field configuration

Tokenized:
true

Class Name:
org.apache.solr.schema.TextField

Index Analyzer:
org.apache.solr.analysis.TokenizerChain

  • Tokenizer Class:
    org.apache.solr.analysis.WhitespaceTokenizerFactory
Filters:

org.apache.solr.analysis.StopFilterFactory args:{enablePositionIncrements: true words: stopwords.txt ignoreCase: true }
org.apache.solr.analysis.WordDelimiterFilterFactory args:{preserveOriginal: 1 catenateWords: 1 catenateNumbers: 1 splitOnCaseChange: 1 catenateAll: 0 generateNumberParts: 1 generateWordParts: 1 }
org.apache.solr.analysis.LowerCaseFilterFactory args:{}
org.apache.solr.analysis.SnowballPorterFilterFactory args:{protected: protwords.txt }
org.apache.solr.analysis.LengthFilterFactory args:{min: 2 max: 500 }
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{}
org.apache.solr.analysis.ASCIIFoldingFilterFactory args:{}
org.apache.solr.analysis.LowerCaseFilterFactory args:{}
org.apache.solr.analysis.SynonymFilterFactory args:{expand: true ignoreCase: true synonyms: synonyms.txt }
org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt ignoreCase: true }
org.apache.solr.analysis.WordDelimiterFilterFactory args:{preserveOriginal: 1 catenateWords: 0 catenateNumbers: 0 splitOnCaseChange: 1 catenateAll: 0 generateNumberParts: 1 generateWordParts: 1 }
org.apache.solr.analysis.SnowballPorterFilterFactory args:{protected: protwords.txt }
org.apache.solr.analysis.LengthFilterFactory args:{min: 2 max: 500 }
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{}
org.apache.solr.analysis.ASCIIFoldingFilterFactory args:{}
Query Analyzer:
org.apache.solr.analysis.TokenizerChain

  • Tokenizer Class:
    org.apache.solr.analysis.WhitespaceTokenizerFactory
Filters:

org.apache.solr.analysis.StopFilterFactory args:{enablePositionIncrements: true words: stopwords.txt ignoreCase: true }
org.apache.solr.analysis.WordDelimiterFilterFactory args:{preserveOriginal: 1 catenateWords: 1 catenateNumbers: 1 splitOnCaseChange: 1 catenateAll: 0 generateNumberParts: 1 generateWordParts: 1 }
org.apache.solr.analysis.LowerCaseFilterFactory args:{}
org.apache.solr.analysis.SnowballPorterFilterFactory args:{protected: protwords.txt }
org.apache.solr.analysis.LengthFilterFactory args:{min: 2 max: 500 }
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{}
org.apache.solr.analysis.ASCIIFoldingFilterFactory args:{}
org.apache.solr.analysis.LowerCaseFilterFactory args:{}
org.apache.solr.analysis.SynonymFilterFactory args:{expand: true ignoreCase: true synonyms: synonyms.txt }
org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt ignoreCase: true }
org.apache.solr.analysis.WordDelimiterFilterFactory args:{preserveOriginal: 1 catenateWords: 0 catenateNumbers: 0 splitOnCaseChange: 1 catenateAll: 0 generateNumberParts: 1 generateWordParts: 1 }
org.apache.solr.analysis.SnowballPorterFilterFactory args:{protected: protwords.txt }
org.apache.solr.analysis.LengthFilterFactory args:{min: 2 max: 500 }
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{}
org.apache.solr.analysis.ASCIIFoldingFilterFactory args:{}

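Are you running a phrase query (with double quotes)? If not, you are handing the synonym filter two separate tokens ("sea" and "biscit"), and in that case no matching synonym is found.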


By the way, handling synonyms at index time is almost always a better idea.
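The point about phrase queries can be sketched with a small simulation. This is only an illustration, assuming the query parser splits an unquoted query on whitespace and runs the analyzer on each term separately, while a quoted phrase reaches the analyzer as one token stream. RULES, apply_synonyms, analyze_unquoted and analyze_phrase are hypothetical names, not Solr APIs.

```python
# Hypothetical in-memory form of the two-word rule from the question.
RULES = {
    ("sea", "biscuit"): ["seabiscuit"],
    ("sea", "biscit"): ["seabiscuit"],
}


def apply_synonyms(tokens, rules):
    """Greedy longest-match replacement over a single token stream."""
    max_len = max(len(key) for key in rules)
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in rules:
                out.extend(rules[key])
                i += n
                break
        else:
            out.append(tokens[i])  # no rule starts at this token
            i += 1
    return out


def analyze_unquoted(query, rules):
    # Assumption: each whitespace-separated term is analyzed on its own,
    # so the filter never sees "sea" and "biscit" together.
    out = []
    for term in query.split():
        out.extend(apply_synonyms([term], rules))
    return out


def analyze_phrase(query, rules):
    # Assumption: a quoted phrase is analyzed as one stream of tokens.
    return apply_synonyms(query.strip('"').split(), rules)


print(analyze_unquoted("sea biscit", RULES))
print(analyze_phrase('"sea biscit"', RULES))
```

In this sketch the unquoted query leaves both terms untouched, because no rule has a single-token left-hand side, while the quoted phrase is rewritten to the single replacement token.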

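SynonymFilterFactory is deprecated and should now be replaced by its successor. It squashes tokens when several tokens occupy the same position, and the replacement fixes the problems with multi-word synonyms.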

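Can you post your field configuration?

@rohk Done, I posted the configuration of the field type I use. I am not using double quotes, but it does work when I have, for example, "this chocolate, that chocolate" (without =>)... so why doesn't it work here? As for using synonyms at index time, I have read about it before and you are right, but (unfortunately) I cannot change that right now.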