Can I clear the stopword list in Lucene.NET so that exact matches work better?


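When dealing with exact matches, I am given a real-world query like this:

"not in education, employment, or training"

Converted to a Lucene query with stopwords removed, this gives: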

+Content:"? ? education employment ? training"

Here is a more contrived example:

"there is no such thing"

Converted to a Lucene query with stopwords removed, this gives:

+Content:"? ? ? ? thing"

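My goal is for searches like these to match only the exact phrase as the user entered it.

Could one solution be to clear the stopword list? Would this have adverse effects? If so, what? My google-fu has failed me.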

It all depends on the analyzer you are using. StandardAnalyzer uses stop words and strips them out; in fact, StopAnalyzer is where StandardAnalyzer gets its stop words from.

Use WhitespaceAnalyzer, or create your own analyzer by inheriting from the one that best fits your needs and modifying it as required.
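As a rough sketch of the "create your own" route, assuming the Lucene.NET 3.0.x API: the class below chains the same tokenizer and filters that StandardAnalyzer uses, but leaves out the stop filter, so every word stays in the token stream. The name `NoStopWordAnalyzer` is made up for illustration, and the `Version` value is an assumption that should match your installation.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Version = Lucene.Net.Util.Version;

// Hypothetical analyzer: StandardAnalyzer's chain without the StopFilter.
public class NoStopWordAnalyzer : Analyzer
{
    public override TokenStream TokenStream(string fieldName, TextReader reader)
    {
        // Split the text into tokens using the standard grammar
        TokenStream result = new StandardTokenizer(Version.LUCENE_29, reader);
        // Normalize tokens (for example, strip possessives and acronym dots)
        result = new StandardFilter(result);
        // Lowercase so matching is case-insensitive
        result = new LowerCaseFilter(result);
        // No StopFilter here, so "not", "in", "or" and friends are kept
        return result;
    }
}
```

WhitespaceAnalyzer is simpler still, but it does not lowercase or strip punctuation, so "training," and "Training" would be indexed as different terms from "training".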

Alternatively, if you like StandardAnalyzer, you can create a new one with a custom stop word list:

using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Version = Lucene.Net.Util.Version;

// The default stop word set, in case you want to reuse or filter it
var defaultStopWords = StopAnalyzer.ENGLISH_STOP_WORDS_SET;

// Create a new StandardAnalyzer with a custom stop word list
var sa = new StandardAnalyzer(
    Version.LUCENE_29, // depends on your version
    new HashSet<string> // pass in your own stop word list
    {
        "hello",
        "world"
    });
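
To address the original question directly: the same constructor should also accept an empty set, which effectively clears the stop word list. Here is a minimal sketch, assuming Lucene.NET 3.0.x and the `Content` field from the question. The class and method names (`ExactPhraseSearch`, `BuildQuery`) are hypothetical.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Version = Lucene.Net.Util.Version;

public static class ExactPhraseSearch
{
    // Hypothetical helper: parses a quoted phrase without dropping any words.
    public static Query BuildQuery(string phrase)
    {
        // An empty stop word set means nothing is removed during analysis
        var analyzer = new StandardAnalyzer(Version.LUCENE_29, new HashSet<string>());

        // The query side must use the same analyzer as the index side
        var parser = new QueryParser(Version.LUCENE_29, "Content", analyzer);

        // Wrap in quotes so the parser builds a phrase query
        return parser.Parse("\"" + phrase + "\"");
    }
}
```

Calling `ExactPhraseSearch.BuildQuery("there is no such thing")` should then yield a phrase query containing all five words rather than placeholders. Two caveats: documents indexed with the old analyzer need to be re-indexed, because their stop words were never stored, and keeping very common words generally makes the index larger and can slow down queries that contain them.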