How to get the term frequency for each document in Lucene 4.2
I originally used Lucene 3.2 to get the terms and their frequencies, with the following code:
for (int docNum = 0; docNum < ir.numDocs(); docNum++) {
    TermFreqVector tfv = ir.getTermFreqVector(docNum, "TERJEMAHAN");
    if (tfv == null) {
        // ignore empty fields
        continue;
    }
    String terms[] = tfv.getTerms();
    int termCount = terms.length;
    int freqs[] = tfv.getTermFrequencies();
    for (int t = 0; t < termCount; t++) {
        int freqn = ir.docFreq(new Term("TERJEMAHAN", terms[t]));
    }
}
How can I get the term frequency for each document in Lucene 4.2?
I managed to compute the term frequency using the following lines:
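Term term = ...;
IndexReader reader = ...;
// Postings for the term in the "contents" field, skipping deleted documents
DocsEnum docEnum = MultiFields.getTermDocsEnum(reader, MultiFields.getLiveDocs(reader), "contents", term.bytes());
int termFreq = 0;
// getTermDocsEnum returns null if the field or the term does not exist
if (docEnum != null) {
    while (docEnum.nextDoc() != DocsEnum.NO_MORE_DOCS) {
        // freq() is the term's frequency in the current document;
        // summing it yields the total over all documents
        termFreq += docEnum.freq();
    }
}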
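Note that the loop above adds up the frequencies, so it ends with one total for the whole index rather than one value per document. The following is a minimal, library-free sketch of that difference. It does not call Lucene: the `Posting` class and the `perDocument` and `total` methods are hypothetical stand-ins for the (document id, frequency) pairs that the enumeration yields, and the sample numbers are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TermFrequencySketch {

    /** Hypothetical stand-in for one postings entry: a document id and the term's frequency in it. */
    static final class Posting {
        final int docId;
        final int freq;

        Posting(int docId, int freq) {
            this.docId = docId;
            this.freq = freq;
        }
    }

    /** Keeps one frequency per document id instead of summing them. */
    static Map<Integer, Integer> perDocument(Posting[] postings) {
        Map<Integer, Integer> result = new LinkedHashMap<Integer, Integer>();
        for (Posting p : postings) {
            result.put(p.docId, p.freq);
        }
        return result;
    }

    /** Sums the frequencies over all documents, as the loop above does. */
    static int total(Posting[] postings) {
        int sum = 0;
        for (Posting p : postings) {
            sum += p.freq;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up sample data: the term occurs in documents 0, 2 and 5
        Posting[] postings = { new Posting(0, 3), new Posting(2, 1), new Posting(5, 4) };
        System.out.println(perDocument(postings)); // {0=3, 2=1, 5=4}
        System.out.println(total(postings));       // 8
    }
}
```

In the Lucene loop, the same idea would presumably mean storing `docEnum.freq()` under the document id returned by `nextDoc()` rather than adding it to `termFreq`.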