How can I find how many times each term occurs in each document with Lucene in Java?
I have indexed about a thousand Lucene documents, and I want to retrieve the per-document term frequency for every term across all documents. Here is how I build the index:
HashMap<Integer, String> documentList = getEachDocumentSeparated();
Analyzer analyzer = new StandardAnalyzer();
Directory index = FSDirectory.open(Paths.get(RESULT_ADDRESS));
IndexWriterConfig config = new IndexWriterConfig(analyzer);
// CREATE discards any existing index at this location
config.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
IndexWriter w = new IndexWriter(index, config);
// Stored text field that also records frequencies, positions and offsets
FieldType fieldType = new FieldType(TextField.TYPE_STORED);
IndexOptions indexOptions = IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS;
fieldType.setIndexOptions(indexOptions);
for (Map.Entry<Integer, String> pair : documentList.entrySet())
{
    Document doc = new Document();
    Field bodyField = new Field("body", pair.getValue(), fieldType);
    // StringField takes a String value, so the Integer key is converted
    doc.add(new StringField("id", String.valueOf(pair.getKey()), Field.Store.YES));
    doc.add(bodyField);
    w.addDocument(doc);
}
// Commit the documents and release the write lock
w.close();
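I would like to get a vector like the following for each term:
sterm, 1(5), 2(10), 330(2), 500(1), 1001(3)
This means that sterm occurs 5 times in document 1, 10 times in document 2, and so on.
... should be what you are looking for.
@injecteer First, that requires a specific term to be given; second, I want the frequency per document, not the total frequency across all documents.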
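One possible way to get such per-document counts without naming a term up front is to enumerate every term of the `body` field and walk its postings list, reading the frequency for each document. The following is a minimal sketch, not a definitive answer: it assumes a Lucene version with the `LeafReader` / `PostingsEnum` API (5.x or later), an index with no deleted documents (which holds right after indexing with `OpenMode.CREATE`), and the class and method names (`TermFrequencyDump`, `collect`, `format`) are hypothetical. Note that the numbers it reports are Lucene's internal document IDs, which start at 0; mapping them to the stored `id` field would need an extra stored-field lookup.

```java
import java.io.IOException;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.PostingsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

// Hypothetical helper class; the names are illustrative only.
public class TermFrequencyDump {

    /** Returns term -> (internal Lucene doc ID -> term frequency) for one field. */
    public static Map<String, Map<Integer, Integer>> collect(Directory index, String field)
            throws IOException {
        Map<String, Map<Integer, Integer>> result = new TreeMap<>();
        try (IndexReader reader = DirectoryReader.open(index)) {
            // An index consists of one or more segments ("leaves")
            for (LeafReaderContext ctx : reader.leaves()) {
                LeafReader leaf = ctx.reader();
                Terms terms = leaf.terms(field);
                if (terms == null) {
                    continue; // this segment has no terms for the field
                }
                TermsEnum termsEnum = terms.iterator();
                PostingsEnum postings = null;
                BytesRef term;
                while ((term = termsEnum.next()) != null) {
                    Map<Integer, Integer> perDoc =
                            result.computeIfAbsent(term.utf8ToString(), k -> new TreeMap<>());
                    // Only frequencies are needed, so positions/offsets are not requested
                    postings = termsEnum.postings(postings, PostingsEnum.FREQS);
                    int doc;
                    while ((doc = postings.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
                        // docBase turns the segment-local ID into an index-wide ID
                        perDoc.put(ctx.docBase + doc, postings.freq());
                    }
                }
            }
        }
        return result;
    }

    /** Formats one entry as "term, doc(freq), doc(freq), ...". */
    public static String format(String term, Map<Integer, Integer> perDoc) {
        StringBuilder sb = new StringBuilder(term);
        for (Map.Entry<Integer, Integer> e : perDoc.entrySet()) {
            sb.append(", ").append(e.getKey()).append('(').append(e.getValue()).append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the index directory (RESULT_ADDRESS in the question)
        try (Directory index = FSDirectory.open(Paths.get(args[0]))) {
            collect(index, "body").forEach((t, m) -> System.out.println(format(t, m)));
        }
    }
}
```

Because the analyzer is a `StandardAnalyzer`, the terms come back lowercased and tokenized, so they may not match the raw words of the original text. For a large index, holding every term in one map can use a lot of memory; printing each line as soon as a term's postings are exhausted is an alternative when the index has a single segment.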