Searching for two-letter words in Lucene


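I'm trying to find documents that contain the acronym "IT".

I have tried searching with StandardAnalyzer, SimpleAnalyzer, and KeywordAnalyzer, all with the same result (no hits).

As far as I can see, "it" is not one of the default stop words.

I can find the documents with a wildcard search, so I know they are in the index.

Any help is much appreciated! Cheers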

I tried re-indexing without any stop words:

new IndexWriter(directory,
                new StandardAnalyzer(Version.LUCENE_30, new HashSet<string>()), // No stop words
                true,
                IndexWriter.MaxFieldLength.UNLIMITED);


…After that, I can search for "it", as long as I search with the same kind of analyzer (one without any stop words):

new StandardAnalyzer(Version.LUCENE_30, new HashSet<string>())


The default stop word set does contain the word "it". It is defined in StopAnalyzer, and it is:

final List<String> stopWords = Arrays.asList(
   "a", "an", "and", "are", "as", "at", "be", "but", "by",
   "for", "if", "in", "into", "is", "it",
   "no", "not", "of", "on", "or", "such",
   "that", "the", "their", "then", "there", "these",
   "they", "this", "to", "was", "will", "with"
 );
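To make the effect of that list concrete, here is a minimal sketch in plain Java of what a stop filter does to the text. It does not use Lucene: the class `StopWordSketch` and the method `analyze` are hypothetical names made up for illustration, and the tokenization (split on non-letters, then lowercase) is a simplification of what StandardAnalyzer really does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration class; it only imitates the idea of a stop filter.
class StopWordSketch {
    // Same words as the default list quoted above.
    static final Set<String> DEFAULT_STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "an", "and", "are", "as", "at", "be", "but", "by",
        "for", "if", "in", "into", "is", "it",
        "no", "not", "of", "on", "or", "such",
        "that", "the", "their", "then", "there", "these",
        "they", "this", "to", "was", "will", "with"));

    // Split on non-letter characters, lowercase each token, then drop stop words.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}]+")) {
            if (raw.isEmpty()) {
                continue; // split can yield an empty leading element
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // With the default list, "IT" is lowercased to "it" and then removed.
        System.out.println(analyze("The IT department", DEFAULT_STOP_WORDS)); // [department]
        // With an empty stop set, "it" survives and can be matched later.
        System.out.println(analyze("The IT department", new HashSet<>()));    // [the, it, department]
    }
}
```

Because the acronym is lowercased before the stop filter runs, "IT" is treated exactly like the pronoun "it", which is why passing an empty stop set at both index and query time makes it searchable.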


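Neither SimpleAnalyzer nor KeywordAnalyzer uses stop words, so if searches with them also fail, that is due to some other problem: possibly a misunderstanding of how they tokenize, or a mismatch between the index-time and query-time analyzers.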
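As a rough illustration of the analyzer mismatch mentioned above, the sketch below imitates two tokenization styles in plain Java, without Lucene. The class and method names (`TokenizationSketch`, `keywordStyle`, `simpleStyle`) are hypothetical; they only approximate the documented behaviour that KeywordAnalyzer emits the whole input as a single unchanged token, while SimpleAnalyzer splits on non-letters and lowercases.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; not part of Lucene.
class TokenizationSketch {
    // Keyword-style: the entire input becomes one token, case preserved.
    static List<String> keywordStyle(String text) {
        return Collections.singletonList(text);
    }

    // Simple-style: split on non-letter characters and lowercase each token.
    static List<String> simpleStyle(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}]+")) {
            if (!raw.isEmpty()) {
                tokens.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Terms stored at index time by a lowercasing analyzer.
        List<String> indexed = simpleStyle("Our IT team");

        // Query analyzed keyword-style keeps "IT" in upper case: no match.
        System.out.println(indexed.containsAll(keywordStyle("IT"))); // false
        // Query analyzed the same way as the index yields "it": match.
        System.out.println(indexed.containsAll(simpleStyle("IT")));  // true
    }
}
```

The point is only that the query term must come out of analysis in the same form as the indexed term. If the index was built with an analyzer that dropped "it" entirely, no choice of query analyzer will find it.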


Odd, I didn't see "it" among the stop words in Lucene 3.0. I probably just missed it. Thanks!