ElasticSearch constraint when fetching more than 50k documents using the Java API
I am querying an Elasticsearch index with the Java API, using SearchSourceBuilder. My index holds more than 100k documents, and I have already increased the index's max result window. If I try to fetch 120k documents (size set to 120000 in my Java code), it throws a NullPointerException on the following line:
SearchHit[] searchHits = searchResponse.getHits().getHits();
If I reduce the SearchSourceBuilder size to 50k, it works fine, but then I can only fetch 50k documents.
Please find my code below:
RestHighLevelClient restHighLevelClient = null;
Document doc = new Document();
logger.info("Started Indexing the Document.....");
try {
    restHighLevelClient = new RestHighLevelClient(RestClient.builder(new HttpHost("localhost", 9200, "http"),
            new HttpHost("localhost", 9201, "http")));
} catch (Exception e) {
    System.out.println(e.getMessage());
}
// Fetching Id, FilePath & FileName from Document Index.
SearchRequest searchRequest = new SearchRequest(INDEX);
searchRequest.types(TYPE);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
QueryBuilder qb = QueryBuilders.matchAllQuery();
searchSourceBuilder.query(qb);
searchSourceBuilder.size(120000);
searchRequest.source(searchSourceBuilder);
SearchResponse searchResponse = null;
try {
    searchResponse = restHighLevelClient.search(searchRequest);
} catch (IOException e) {
    e.getLocalizedMessage();
}
SearchHit[] searchHits = searchResponse.getHits().getHits(); // Getting null pointer exception after processing some documents. Count is not very constant.
long totalHits = searchResponse.getHits().totalHits;
logger.info("Total Hits --->" + totalHits);
Please find my index settings below:
{
  "document_attachment": {
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "document_attachment",
        "max_result_window": "150000",
        "creation_date": "1531402811016",
        "analysis": {
          "analyzer": {
            "custom_analyzer": {
              "filter": [
                "lowercase",
                "asciifolding"
              ],
              "char_filter": [
                "html_strip"
              ],
              "type": "custom",
              "tokenizer": "whitespace"
            },
            "product_catalog_keywords_analyzer": {
              "filter": [
                "lowercase",
                "asciifolding"
              ],
              "char_filter": [
                "html_strip"
              ],
              "type": "custom",
              "tokenizer": "whitespace"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "UBRQAkg-Su-FfeAtBTGFIw",
        "version": {
          "created": "6020399"
        }
      }
    }
  }
}
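You need to use a scroll search rather than trying to fetch everything at once. It lets you walk through the results one page at a time.
With scrolling you can retrieve as many results as you need; there is no upper limit. You will not get ranked results, but ranking is meaningless on a result set this large.
See the Elasticsearch documentation on scrolling for how to do this.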
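To illustrate the paging loop a scroll search implies, here is a minimal, self-contained sketch that runs without an Elasticsearch cluster. Page, ScrollClient and FakeScrollClient are hypothetical stand-ins written for this example, not Elasticsearch classes. In a real 6.x program the three calls would roughly correspond to RestHighLevelClient.search (with a keep-alive set via SearchRequest.scroll), RestHighLevelClient.searchScroll and RestHighLevelClient.clearScroll; treat that mapping as an assumption to check against the client version you use.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one page of a scroll response (not an Elasticsearch class). */
final class Page {
    final String scrollId;
    final List<String> hits;

    Page(String scrollId, List<String> hits) {
        this.scrollId = scrollId;
        this.hits = hits;
    }
}

/** Hypothetical stand-in for the three client calls a scroll needs. */
interface ScrollClient {
    Page search(int pageSize);           // opens the scroll and returns the first page
    Page searchScroll(String scrollId);  // returns the next page for an open scroll
    void clearScroll(String scrollId);   // releases the server-side scroll context
}

/** In-memory fake that serves a fixed number of documents, page by page. */
final class FakeScrollClient implements ScrollClient {
    private final List<String> docs = new ArrayList<>();
    private int cursor = 0;
    private int pageSize = 0;
    boolean cleared = false;

    FakeScrollClient(int total) {
        for (int i = 0; i < total; i++) {
            docs.add("doc-" + i);
        }
    }

    public Page search(int pageSize) {
        this.pageSize = pageSize;
        this.cursor = 0;
        return next();
    }

    public Page searchScroll(String scrollId) {
        return next();
    }

    public void clearScroll(String scrollId) {
        cleared = true;
    }

    private Page next() {
        int end = Math.min(cursor + pageSize, docs.size());
        List<String> hits = new ArrayList<>(docs.subList(cursor, end));
        cursor = end;
        return new Page("scroll-1", hits);
    }
}

final class ScrollSketch {
    /** Collects every hit by requesting pages until an empty page comes back. */
    static List<String> fetchAll(ScrollClient client, int pageSize) {
        List<String> all = new ArrayList<>();
        Page page = client.search(pageSize);
        String scrollId = page.scrollId;
        try {
            while (!page.hits.isEmpty()) {
                all.addAll(page.hits);
                page = client.searchScroll(scrollId);
                scrollId = page.scrollId; // the id may change between pages
            }
        } finally {
            client.clearScroll(scrollId); // always release the scroll
        }
        return all;
    }

    public static void main(String[] args) {
        FakeScrollClient client = new FakeScrollClient(120_000);
        List<String> docs = fetchAll(client, 5_000);
        System.out.println("Fetched " + docs.size() + " documents");
    }
}
```

With the page size kept small (5000 here, an arbitrary choice), no single response has to carry all 120k documents, which is the point of scrolling. In real code it also helps to log or rethrow the IOException from the search call rather than discard it, since a swallowed exception leaves the response null.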