
Elasticsearch Python client reindex timed out


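I'm trying to reindex using the Elasticsearch Python client, with helpers.reindex. But I keep getting the following exception:
elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeout

The stack trace of the error is: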

Traceback (most recent call last):
  File "~/es_test.py", line 33, in <module>
    main()
  File "~/es_test.py", line 30, in main
    target_index='users-2')
  File "~/ENV/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 306, in reindex
    chunk_size=chunk_size, **kwargs)
  File "~/ENV/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 182, in bulk
    for ok, item in streaming_bulk(client, actions, **kwargs):
  File "~/ENV/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 124, in streaming_bulk
    raise e
elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeout(HTTPSConnectionPool(host='myhost', port=9243): Read timed out. (read timeout=10))

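This may be caused by an OutOfMemoryError for Java heap space, which means you are not giving Elasticsearch enough memory for what you are trying to do. Check the logs under
/var/log/elasticsearch
for any such exception.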
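
If it helps, here is a minimal sketch of scanning log lines for that error. The function name find_oom_lines and the sample log lines are made up for illustration; the actual file name under /var/log/elasticsearch depends on your setup, so pass in whichever file you have.

```python
import io


def find_oom_lines(lines, needle="OutOfMemoryError"):
    """Return (line_number, text) pairs for every line that mentions needle."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if needle in line
    ]


# Made-up sample standing in for a real log file (hypothetical content).
sample_log = io.StringIO(
    "[INFO ] node started\n"
    "[WARN ] java.lang.OutOfMemoryError: Java heap space\n"
)

for number, text in find_oom_lines(sample_log):
    print(number, text)
```

For a real log, open the file and pass the file object in place of sample_log.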


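I had been struggling with this issue for a couple of days. Changing the request_timeout parameter to 30 (i.e. 30 seconds) did not work. In the end I had to edit the streaming_bulk and reindex APIs in elasticsearch.py.

Change the chunk_size parameter from the default of 500 (which processes 500 documents per batch) to a smaller number of documents per batch. I changed mine to 50, which worked fine for me. No more read timeout errors.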

def streaming_bulk(client, actions, chunk_size=50, raise_on_error=True,
                   expand_action_callback=expand_action, raise_on_exception=True, **kwargs):

def reindex(client, source_index, target_index, query=None, target_client=None,
            chunk_size=50, scroll='5m', scan_kwargs={}, bulk_kwargs={}):
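
To make it concrete what chunk_size controls: the documents are sent in batches of at most chunk_size per bulk request, so a smaller value means each request carries less work and is more likely to finish before the read timeout. Below is a rough pure-Python sketch of that batching. The helper name chunked is hypothetical and this is not the library's actual implementation, just an illustration of the idea.

```python
from itertools import islice


def chunked(actions, chunk_size):
    """Yield lists of at most chunk_size items taken from any iterable."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(actions)
    while True:
        batch = list(islice(iterator, chunk_size))
        if not batch:
            return
        yield batch


# Made-up documents, only used to count how many batches each setting produces.
docs = [{"_id": i} for i in range(1200)]
print(len(list(chunked(docs, 500))))  # batches with the default of 500
print(len(list(chunked(docs, 50))))   # batches with chunk_size=50
```

The trade-off is more round trips to the cluster in exchange for smaller, faster individual requests.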

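Can you show your Python code?

@Val I included my code.

Can you try adding the parameter (maybe with a value of 100) to your reindex call?

Use chunk_size and you will be fine. I have been able to reindex millions of documents with a simple reindex call. Example:
helpers.reindex(es, source_index=old_index, target_index=new_index, chunk_size=1000)

Where can I find elasticsearch.py?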
# Code from the question
from elasticsearch import Elasticsearch, RequestsHttpConnection, helpers
from requests.auth import HTTPBasicAuth  # needed for http_auth below

es = Elasticsearch(connection_class=RequestsHttpConnection,
                   host='myhost',
                   port=9243,
                   http_auth=HTTPBasicAuth(username, password),
                   use_ssl=True,
                   verify_certs=True,
                   timeout=600)
helpers.reindex(es, source_index=old_index, target_index=new_index)