Java Lucene: generateWordParts vs splitOnCaseChange
Tags: java, solr, lucene, full-text-search, hibernate-search

I am investigating WordDelimiterFilterFactory and I am confused by the generateWordParts and splitOnCaseChange parameters. From the Javadoc for generateWordParts:
/**
* Causes parts of words to be generated:
* <p>
* "PowerShot" => "Power" "Shot"
*/
public static final int GENERATE_WORD_PARTS = 1;
Could you give an example that illustrates the difference?

P.S.
Also, I don't understand the mention of SUBWORD_DELIM.

WordDelimiterFilter has since been superseded by WordDelimiterGraphFilter (which properly supports phrase queries).

generateWordParts considers more than just case changes. For example, foo-bar is split into foo and bar; splitting on case changes would do nothing here and would leave a single foo-bar token.
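To make the interaction of the two flags concrete, here is a toy model (NOT the real Lucene implementation; the class and method names are made up for illustration). A non-alphanumeric character is always a split point once word parts are generated at all, while a lower-to-upper case transition is a split point only when splitOnCaseChange is enabled:

```java
import java.util.ArrayList;
import java.util.List;

public class WordDelimiterSketch {

    // Split a token into parts. Any non-alphanumeric character is treated as
    // a delimiter; a lower->UPPER transition is a delimiter only when
    // splitOnCaseChange is set. generateWordParts decides whether the
    // resulting parts are emitted at all.
    static List<String> split(String token,
                              boolean generateWordParts,
                              boolean splitOnCaseChange) {
        if (!generateWordParts) {
            return List.of(token); // parts are never emitted
        }
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char prev = 0;
        for (char c : token.toCharArray()) {
            boolean delim = !Character.isLetterOrDigit(c);
            boolean caseChange = splitOnCaseChange
                    && Character.isLowerCase(prev)
                    && Character.isUpperCase(c);
            if (delim || caseChange) {
                if (current.length() > 0) {
                    parts.add(current.toString());
                    current.setLength(0);
                }
                if (delim) { prev = 0; continue; } // drop the delimiter itself
            }
            current.append(c);
            prev = c;
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    public static void main(String[] args) {
        // "foo-bar" splits on the hyphen even without splitOnCaseChange:
        System.out.println(split("foo-bar", true, false));   // [foo, bar]
        // "PowerShot" only splits when splitOnCaseChange is also on:
        System.out.println(split("PowerShot", true, false)); // [PowerShot]
        System.out.println(split("PowerShot", true, true));  // [Power, Shot]
    }
}
```

So generateWordParts is the master switch for emitting subwords at all, and splitOnCaseChange adds case transitions to the set of split points. (The real filter has many more flags, e.g. for digits and catenation.)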
SUBWORD_DELIM refers to the types parameter, in which you can supply a file that defines which characters should be assumed to split a token into subwords:
types
(optional) The pathname of a file that contains character => type mappings, which enable customization of this filter’s splitting behavior. Recognized character types: LOWER, UPPER, ALPHA, DIGIT, ALPHANUM, and SUBWORD_DELIM.
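Following the mapping format shown in the Solr reference (one `char => TYPE` entry per line, `#` for comments), a minimal sketch of such a file — the filename wdfftypes.txt is just a convention — that forces `|` and `.` to act as subword delimiters might look like:

```text
# wdfftypes.txt: treat '|' and '.' as characters that split a token
| => SUBWORD_DELIM
. => SUBWORD_DELIM
```

Characters can also be given as Unicode escapes (e.g. `\u002C` for a comma) per the Solr documentation.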
I assume that by declaring a character such as "|" or "." as SUBWORD_DELIM, a word containing either of them would be split into two tokens. Could you give the class name of the word delimiter graph filter? I could not find it.

Solr's factories are named in the usual way: solr.WordDelimiterGraphFilterFactory.

It looks like WordDelimiterGraphFilterFactory is not available in Hibernate Search. It behaves essentially the same as WordDelimiterFilter, but with better (actual) support for phrase queries.

Do you really think the Javadoc is clear, though? There is more documentation than just the constant comments in the Java source. From the docs:

generateWordParts: (integer, default 1) If non-zero, splits words at delimiters. For example: "CamelCase", "hot spot" -> "Camel", "Case", "hot", "spot"
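For reference, a sketch of how the graph factory is typically wired into a Solr field type (the field-type name and types filename here are illustrative, not from the original question):

```xml
<fieldType name="text_wdgf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1"
            splitOnCaseChange="1"
            types="wdfftypes.txt"/>
    <!-- Solr's docs recommend flattening the graph at index time
         after any graph-producing filter -->
    <filter class="solr.FlattenGraphFilterFactory"/>
  </analyzer>
</fieldType>
```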