
Which term vector option should I use in Lucene?

lucene indexing


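I am indexing with Lucene and am only interested in getting the IDs of the relevant documents back from Lucene (i.e., not field values or any highlighting information). Given these requirements, which term vector option should I use without hurting search performance (speed) or quality (results)? I will also be using MoreLikeThis, so I don't want to turn term vectors off (TermVector.NO). That leaves these options: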

TermVector.YES: Records the unique terms that occurred, and their counts, in each document, but doesn't store any position or offset information

TermVector.WITH_POSITIONS: Records the unique terms and their counts, and also the positions of each occurrence of every term, but no offsets

TermVector.WITH_OFFSETS: Records the unique terms and their counts, with the offsets (start and end character position) of each occurrence of every term, but no positions

TermVector.WITH_POSITIONS_OFFSETS: Stores unique terms and their counts, along with positions and offsets
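
To make the difference between these options concrete, here is a minimal plain-Java sketch of the three kinds of information involved: per-term counts, token positions, and character offsets. It does not use the Lucene API. The class `TermVectorSketch`, its `Occurrence` type, and the split-on-spaces tokenization are all assumptions made for illustration, and a real analyzer would tokenize differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of what the term vector options record; this is not Lucene API.
class TermVectorSketch {

    // One occurrence of a term in the text.
    static final class Occurrence {
        final int position;     // token index (the "positions" part)
        final int startOffset;  // index of the first character (the "offsets" part)
        final int endOffset;    // index just past the last character

        Occurrence(int position, int startOffset, int endOffset) {
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    // Splits on single spaces and records every occurrence of every term.
    // The size of each list is the term's count (what YES alone would keep).
    static Map<String, List<Occurrence>> analyze(String text) {
        Map<String, List<Occurrence>> vector = new LinkedHashMap<>();
        int position = 0;
        int start = 0;
        for (String token : text.split(" ")) {
            int end = start + token.length();
            if (!token.isEmpty()) {
                vector.computeIfAbsent(token, k -> new ArrayList<>())
                      .add(new Occurrence(position++, start, end));
            }
            start = end + 1; // skip the separating space
        }
        return vector;
    }

    public static void main(String[] args) {
        analyze("blah date blah id").forEach((term, occurrences) -> {
            StringBuilder line = new StringBuilder(term + " count=" + occurrences.size());
            for (Occurrence o : occurrences) {
                line.append(" [pos=").append(o.position)
                    .append(", offsets=").append(o.startOffset)
                    .append('-').append(o.endOffset).append(']');
            }
            System.out.println(line);
        });
    }
}
```

In this model, YES corresponds to keeping only the list sizes, WITH_POSITIONS adds the `position` field, WITH_OFFSETS adds the two offset fields, and WITH_POSITIONS_OFFSETS keeps everything.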

Thanks.

It depends on the type of queries you run. If there is any related data alongside your IDs, then you will want the positions and/or offsets.

If you have a document like this: "blah blah blah date blah blah ID blah blah name blah blah"

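and you only need to find that particular ID, then TermVector.YES is fine. However, if you want to find an ID based on how close it is to the date or the name (using advanced queries), then you will need the additional term positions.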
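
The proximity case can be illustrated with a small plain-Java sketch. It only shows why per-term counts are not enough to answer "is the ID near the date?", and it is not how Lucene evaluates such queries. The class `ProximitySketch` and its methods `positions` and `near` are hypothetical names, and whitespace tokenization is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of a proximity check over token positions; not Lucene API.
class ProximitySketch {

    // Maps each term to the token positions where it occurs (split on whitespace).
    static Map<String, List<Integer>> positions(String text) {
        Map<String, List<Integer>> result = new HashMap<>();
        String[] tokens = text.trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            result.computeIfAbsent(tokens[i], k -> new ArrayList<>()).add(i);
        }
        return result;
    }

    // True if some occurrence of a lies within maxDistance tokens of some occurrence of b.
    static boolean near(Map<String, List<Integer>> pos, String a, String b, int maxDistance) {
        for (int pa : pos.getOrDefault(a, List.of())) {
            for (int pb : pos.getOrDefault(b, List.of())) {
                if (Math.abs(pa - pb) <= maxDistance) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With the sample document above, `near(pos, "ID", "date", 3)` holds because the two tokens are three positions apart. With counts alone, the only thing that could be said is that both terms occur once.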

You can always just try it. It is a simple change, assuming you don't have to unit test against a 1-billion-record index or something :)

By the way, check out "Lucene in Action"; the book covers all of this information.

Do you want the internal Lucene document number, or some ID stored in the document?