EdgeNGramFilterFactory not working as expected in Solr
I am trying out solr.EdgeNGramFilterFactory in Solr. I added it to the index analyzer of a field type in schema.xml.
As far as I know, solr.EdgeNGramFilterFactory creates tokens such as:
whether - wh, whe, whet, wheth, whethe, whether
So when I search for the query "whether", it returns every document that contains any token of the word "whether":
"suggestion":["wether","ether","heather","walther","weather","wheeler","fletcher","shepherd","together","whenever","wherever","another","blather","bother","brother","chothe","eiher","either","farther","father","feather","further","gather","goethe","günther","higher","hucher","leather","mother","neither","nyheter","other","rather","whence","where","shepherds","weathered","altogether","breathed","brothers","feathers","hitherto","northern","preacher","southern","withered"]
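I need only the relevant documents. For the word "whether", I want weather, wether, ether and heather,
and not unnecessary documents such as brother, shepherd, etc.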
schema.xml:
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
solrconfig.xml:
<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">term</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.jaspell.JaspellLookupFactory</str>
    <str name="buildOnCommit">true</str>
    <str name="queryAnalyzerFieldType">textSpell</str>
  </lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">term</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">500</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
Try setting minGramSize to a larger value, such as 4 or 5, to reduce the number of irrelevant matches. Also, check the documentation for more details.
As I can see in your schema.xml, you are using NGramFilterFactory, not EdgeNGramFilterFactory. That means your tokens are not created the way you describe:
whether - wh, whe, whet, wheth, whethe, whether
With NGramFilterFactory, every substring of 2 to 15 characters becomes a token, not only the prefixes, so the tokens look like this:
whether - wh, whe, whet, wheth, whethe, whether, he, het, heth, hethe, hether, et, eth, ethe, ether, th, the, ther, he, her, er
Words such as "brother" and "shepherd" share grams like "he" and "her" with "whether", which would explain the unrelated matches.
Depending on your use case, you should also consider using different tokenizers and filters at index time and at query time. A good way to see what Solr does to your data at index and query time is the Analysis tool.
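To make the difference concrete, here is a minimal sketch of a field type that uses EdgeNGramFilterFactory at index time only. The field type name `textEdge` and the gram sizes are assumptions chosen for illustration, not taken from the question. Whether prefix matching fits the suggestion use case depends on how the field is queried.

```xml
<!-- Sketch only: "textEdge" is a hypothetical name; adjust the gram sizes to your data. -->
<fieldType name="textEdge" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emits prefixes only: whether -> wh, whe, whet, wheth, whethe, whether -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <!-- No n-gram filter at query time: the typed text is compared as a whole
         against the prefixes stored in the index. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a field type like this, pasting a sample value into the Analysis tool for both the index and the query side should show the prefix tokens on the index side and a single lowercased token on the query side.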
- First: the component is declared with class="solr.SpellCheckComponent". solr.EdgeNGramFilterFactory works on the response/results of the query we request.
- Second: maxResultsForSuggest=value, where value is an integer such as 10. If the query returns more results than maxResultsForSuggest, Solr reports correctlySpelled=true and gives no suggestions for the query; if it returns fewer results than maxResultsForSuggest, it reports correctlySpelled=false and gives suggestions for the query.
<str name="spellcheck.maxResultsForSuggest">10</str>
- Third: alternativeTermCount=20
- Fourth: onlyMorePopular=false
These two settings build good suggestions for both correctly spelled and misspelled words.
<str name="spellcheck.alternativeTermCount">20</str>
<str name="spellcheck.onlyMorePopular">false</str>
Schema.xml:
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Solrconfig.xml:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">term</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">0</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">2</int>
    <float name="maxQueryFrequency">0.01</float>
  </lst>
</searchComponent>
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">term</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.maxResultsForSuggest">10</str>
    <str name="spellcheck.alternativeTermCount">30</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>