Python Elasticsearch: can search results be fetched in batches?

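When I query Elasticsearch through the Python Elasticsearch API, I get roughly 5000 results. Setting the "size" parameter in the search query to a number larger than the number of results causes the following error: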

File "MGDFinder.py", line 114, in <module>
  res = es.search(index="_all", body=queryMaker(state))
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 68, in _wrapped
  return func(*args, params=params, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/__init__.py", line 440, in search
  params=params, body=body)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 276, in perform_request
  status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/http_urllib3.py", line 55, in perform_request
  self._raise_error(response.status, raw_data)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/base.py", line 97, in _raise_error
  raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.TransportError: TransportError(500, u'OutOfMemoryError[Java heap space]')

I noticed that this happens even when size is set as low as 700. I do not want to increase the Java heap size. Is there a way to run the search in batches of 500?

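I don't think you can run a plain search request in batches without increasing the Java heap space: the server would still hold all 5000 results and return them.

I think you can use scroll to fetch the results instead. Scroll retrieves a large result set efficiently, one page at a time, much like a cursor in a traditional database.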

Example request:

$ curl -XGET 'localhost:9200/world/test/_search?scroll=1m&pretty' -d '
{
    "size": 50,
    "query": {
        "match_all": {}
    }
}'
Sample response:

{
  "_scroll_id" : "cXVlcnlUaGVuRmV0Y2g7NTszNjpXZW9lRnJXSFItT0U2YUtIM1hOa0FBOzM3Oldlb2VGcldIUi1PRTZhS0gzWE5rQUE7Mzg6V2VvZUZyV0hSLU9FNmFLSDNYTmtBQTs0MDpXZW9lRnJXSFItT0U2YUtIM1hOa0FBOzM5Oldlb2VGcldIUi1PRT
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {....
The response includes a scroll ID, which can be used to fetch the next batch of hits.

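To fetch the next batch, send a scroll request that passes the _scroll_id from the previous response as the request body (-d).

For details, see the official Elasticsearch documentation on scroll.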
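In Python, the scroll loop described in this answer could be sketched roughly as follows. This is a minimal sketch, not the asker's code: the function name iter_scroll_batches and its parameters are made up for illustration, and it assumes a client object exposing search(), scroll() and clear_scroll() with the keyword arguments that the elasticsearch-py client accepts in the versions that still take body=.

```python
def iter_scroll_batches(es, index, query, batch_size=500, keep_alive="1m"):
    """Yield one list of hits per scroll page until the results run out.

    `es` is assumed to be an elasticsearch-py style client; the function
    name and parameters are illustrative, not part of the library.
    """
    # The initial search opens the scroll context and returns the first page.
    page = es.search(index=index, body=query, scroll=keep_alive, size=batch_size)
    scroll_id = page.get("_scroll_id")
    try:
        while True:
            hits = page["hits"]["hits"]
            if not hits:
                break  # an empty page means the scroll is exhausted
            yield hits
            # Ask for the next page, renewing the keep-alive each time.
            page = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            # The scroll ID may change between pages, so keep the latest one.
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        # Release the server-side scroll context once we are done.
        if scroll_id:
            es.clear_scroll(scroll_id=scroll_id)


# Usage against a real cluster (assumes the elasticsearch package is installed):
# from elasticsearch import Elasticsearch
# es = Elasticsearch()
# for batch in iter_scroll_batches(es, "_all", queryMaker(state)):
#     handle(batch)  # hypothetical per-batch handler
```

Each page is processed and discarded before the next one is requested, so neither the client nor the server has to hold all 5000 hits at once. The client library also ships elasticsearch.helpers.scan, which wraps a similar loop and yields hits one at a time.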

It would be better to use search_after.
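For readers unfamiliar with search_after, a rough sketch of the idea follows. It assumes an Elasticsearch version that supports search_after (to my knowledge 5.0 or later, which is newer than the cluster in the question appears to be) and a sort on fields that together identify each document uniquely. The function name iter_search_after and the field names timestamp and doc_id are hypothetical.

```python
def iter_search_after(es, index, query, sort, batch_size=500):
    """Yield one list of hits per page, paging with search_after.

    `sort` must order documents deterministically (include a unique
    tiebreaker field), otherwise hits can be skipped or repeated.
    """
    body = dict(query)          # shallow copy so the caller's dict is untouched
    body["size"] = batch_size
    body["sort"] = sort
    while True:
        page = es.search(index=index, body=body)
        hits = page["hits"]["hits"]
        if not hits:
            break  # no more results
        yield hits
        # Each hit carries its sort values; the last one marks where
        # the next page should start.
        body["search_after"] = hits[-1]["sort"]


# Usage sketch (field names are assumptions, not from the question):
# sort = [{"timestamp": "asc"}, {"doc_id": "asc"}]
# for batch in iter_search_after(es, "_all", queryMaker(state), sort):
#     handle(batch)  # hypothetical per-batch handler
```

Unlike scroll, this approach keeps no search context open on the server between requests; each page is an ordinary search that starts after the last sort values seen.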