Java: Finding the last event for each entity in Lucene

java lucene

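I am storing events (documents) in a Lucene index (version 6.2.1). Each document has an EntityId and a Timestamp. There can be many documents with the same EntityId.

I want to retrieve, for each EntityId, the document with the latest Timestamp. Do I have to pull out every event and do this in Java? I have looked at faceting, but as far as I can tell it is only for counting, not for max/min style aggregation.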
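For comparison, the brute-force route the question mentions (pull every event out, then reduce in Java) could look roughly like the sketch below. It is plain Java with no Lucene calls. The Event class and the latestPerEntity method are hypothetical names introduced only for illustration, and the sketch assumes all events fit in memory.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LatestEventPerEntity {

    /** Hypothetical in-memory view of one indexed event (not a Lucene type). */
    public static final class Event {
        public final String entityId;
        public final long timestamp;

        public Event(String entityId, long timestamp) {
            this.entityId = entityId;
            this.timestamp = timestamp;
        }
    }

    /** Keeps, for each entityId, the event with the largest timestamp. */
    public static Map<String, Event> latestPerEntity(List<Event> events) {
        // LinkedHashMap keeps entities in first-seen order
        Map<String, Event> latest = new LinkedHashMap<>();
        for (Event event : events) {
            // On a tie the event seen first is kept
            latest.merge(event.entityId, event,
                (kept, candidate) -> candidate.timestamp > kept.timestamp ? candidate : kept);
        }
        return latest;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
            new Event("a", 1L), new Event("b", 3L), new Event("a", 5L), new Event("b", 2L));
        for (Event event : latestPerEntity(events).values()) {
            System.out.printf("EntityId = %s Timestamp = %d%n", event.entityId, event.timestamp);
        }
    }
}
```

This reads every event once, so the cost grows with the total number of documents rather than with the number of entities, which is the main reason to prefer doing the grouping inside the index.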
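Alternatively, you can also achieve the same effect with ...

What you are trying to do can be done with GroupingSearch, available from Lucene's grouping module. GroupingSearch groups your documents by the given group field (EntityId in this case). That field must be indexed as sorted doc values, otherwise the search fails with an error like this:
java.lang.IllegalStateException: unexpected docvalues type NONE for field '${field name}' (expected=SORTED)
Then, to be able to get the most recent document for a given EntityId, the Timestamp field must also be indexed with doc values so that it can be sorted on.
For example, if I index my documents as follows: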
String id = ...;
long timestamp = ...;
Document doc = new Document();
// The sorted version of my EntityId
doc.add(new SortedDocValuesField("EntityId", new BytesRef(id)));
// The stored version of my EntityId to be able to get its value later if needed
doc.add(new StringField("Id", id, Field.Store.YES));
// The sorted version of my timestamp
doc.add(new NumericDocValuesField("Timestamp", timestamp));
// The stored version of my timestamp to be able to get its value later if needed
doc.add(new StringField("Tsp", Long.toString(timestamp), Field.Store.YES));

Then I can get the most recent document for a given EntityId as follows:
IndexSearcher searcher = ...
// Some random query here I get all docs
Query query = new MatchAllDocsQuery();
// Group the docs by EntityId
GroupingSearch groupingSearch = new GroupingSearch("EntityId");
// Sort the docs of the same group by Timestamp in reversed order to get
// the most recent first
groupingSearch.setSortWithinGroup(
    new Sort(new SortField("Timestamp", SortField.Type.LONG, true))
);
// Set the limit of docs for a given group to 1 as we only want the latest
// NB: This is the default value so it is not required
groupingSearch.setGroupDocsLimit(1);
// Get the 10 first matching groups
TopGroups<BytesRef> result = groupingSearch.search(searcher, query, 0, 10);
// Iterate over the groups found
for (GroupDocs<BytesRef> groupDocs : result.groups) {
    // Iterate over the docs of a given group
    for (ScoreDoc scoreDoc : groupDocs.scoreDocs) {
        // Get the related doc
        Document doc = searcher.doc(scoreDoc.doc);
        // Print the stored value of EntityId and Timestamp
        System.out.printf(
            "EntityId = %s Timestamp = %s%n", doc.get("Id"),  doc.get("Tsp")
        );
    }
}

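More details about ...

I'm using Lucene, not Solr, and unless I'm mistaken these are Solr-specific.

Oh, sorry, I had just been looking at another Solr question and didn't realize this was plain Lucene.

Ah! I think the key piece I missed when reading the documentation was the SortedDocValuesField bit. I'll need to reindex, but I'll try it and come back to mark the answer. Thanks!

Yes, I'm pretty sure this is what I want. I'm currently trying to figure out how to use getAllMatchingGroups, since I want a group for every entity, but I don't know what to do with the returned collection of BytesRef values.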
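On the last point: the ids in the answer were indexed with new BytesRef(id), which stores the UTF-8 bytes of the string, so each returned group value is a slice of UTF-8 bytes that can be decoded back into the original id. In Lucene itself, BytesRef.utf8ToString() does this. The sketch below only illustrates the idea in plain Java: ByteSlice is a hypothetical stand-in for BytesRef (bytes, offset, length), and decodeAll is an assumed helper name, neither is part of Lucene.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class GroupValueDecoding {

    /** Hypothetical stand-in for a BytesRef-style slice: a byte array plus offset and length. */
    public static final class ByteSlice {
        public final byte[] bytes;
        public final int offset;
        public final int length;

        public ByteSlice(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }

        /** Encodes the text as UTF-8, mirroring how an id String ends up as bytes in the index. */
        public static ByteSlice ofUtf8(String text) {
            byte[] encoded = text.getBytes(StandardCharsets.UTF_8);
            return new ByteSlice(encoded, 0, encoded.length);
        }

        /** Decodes only the [offset, offset + length) window as UTF-8. */
        public String utf8ToString() {
            return new String(bytes, offset, length, StandardCharsets.UTF_8);
        }
    }

    /** Turns a collection of group values into readable entity ids. */
    public static List<String> decodeAll(Collection<ByteSlice> groupValues) {
        List<String> ids = new ArrayList<>(groupValues.size());
        for (ByteSlice value : groupValues) {
            ids.add(value.utf8ToString());
        }
        return ids;
    }

    public static void main(String[] args) {
        Collection<ByteSlice> groups = Arrays.asList(
            ByteSlice.ofUtf8("entity-1"), ByteSlice.ofUtf8("entity-2"));
        for (String id : decodeAll(groups)) {
            System.out.println("EntityId = " + id);
        }
    }
}
```

With the real API the loop body would call utf8ToString() on each BytesRef instead of on the stand-in class.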