
Python Django Haystack MemoryError


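So, I am using Django Haystack with Elasticsearch, and I am trying to run `rebuild_index` or `update_index` on 55149 documents, but I get a `MemoryError`. I assume it is because there are too many documents, but how can I get around this? Note that I eventually want to index around 200,000 documents.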

python manage.py rebuild_index

WARNING: This will irreparably remove EVERYTHING from your search index in connection 'default'.
Your choices after this are to restore from backups or rebuild via the `rebuild_index` command.
Are you sure you wish to continue? [y/N] y
Removing all documents from your index because you said so.
All documents removed.
Indexing 55149 processs
Traceback (most recent call last):
  File "manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 399, in execute_from_command_line
    utility.execute()
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 392, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/django/core/management/base.py", line 242, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/django/core/management/base.py", line 285, in execute
    output = self.handle(*args, **options)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/haystack/management/commands/rebuild_index.py", line 16, in handle
    call_command('update_index', **options)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 159, in call_command
    return klass.execute(*args, **defaults)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/django/core/management/base.py", line 285, in execute
    output = self.handle(*args, **options)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/haystack/management/commands/update_index.py", line 195, in handle
    return super(Command, self).handle(*items, **options)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/django/core/management/base.py", line 385, in handle
    label_output = self.handle_label(label, **options)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/haystack/management/commands/update_index.py", line 221, in handle_label
    self.update_backend(label, using)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/haystack/management/commands/update_index.py", line 267, in update_backend
    do_update(backend, index, qs, start, end, total, self.verbosity)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/haystack/management/commands/update_index.py", line 89, in do_update
    backend.update(index, current_qs)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/haystack/backends/elasticsearch_backend.py", line 183, in update
    self.conn.bulk_index(self.index_name, 'modelresult', prepped_docs, id_field=ID)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/pyelasticsearch/client.py", line 96, in decorate
    return func(*args, query_params=query_params, **kwargs)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/pyelasticsearch/client.py", line 388, in bulk_index
    query_params=query_params)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/pyelasticsearch/client.py", line 238, in send_request
    **({'data': request_body} if body else {}))
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/requests/sessions.py", line 425, in post
    return self.request('POST', url, data=data, **kwargs)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/requests/sessions.py", line 383, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/requests/sessions.py", line 486, in send
    r = adapter.send(request, **kwargs)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/requests/adapters.py", line 330, in send
    timeout=timeout
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 480, in urlopen
    body=body, headers=headers)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 285, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python2.7/httplib.py", line 958, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 992, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 954, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 812, in _send_output
    msg += message_body
MemoryError
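The last frame shows where it fails: httplib concatenates the request headers with the full request body (`msg += message_body`), so at that moment the entire bulk request is held in memory at least twice. As a rough illustration of why this scales with the number of documents per request, here is a sketch that builds a body in the shape of an Elasticsearch bulk-index request. The `build_bulk_body` helper, the `"haystack"` index name and the 10 KB sample record are hypothetical; this is not the code pyelasticsearch actually runs.

```python
import json


def build_bulk_body(docs, index_name="haystack", doc_type="modelresult"):
    """Build a newline-delimited JSON body in the shape of an Elasticsearch
    bulk-index request: one action line and one source line per document.

    Hypothetical helper for illustration only; it is not the code that
    pyelasticsearch uses internally.
    """
    lines = []
    for doc in docs:
        action = {"index": {"_index": index_name, "_type": doc_type, "_id": doc["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API expects every line, including the last one, to end in "\n".
    return "\n".join(lines) + "\n"


# A made-up record with about 10 KB of text, standing in for one indexed object.
sample = {"id": "app.process.1", "text": "x" * 10000}

for batch_size in (100, 1000):
    body = build_bulk_body([sample] * batch_size)
    # The body size grows linearly with the number of documents per request.
    print(batch_size, "docs ->", len(body.encode("utf-8")), "bytes")
```

Under that made-up 10 KB-per-document assumption, the default batch of 1000 documents is roughly 10 MB of text per request, and every intermediate copy (the prepared document list, the joined body, the header-plus-body string) adds another multiple of that.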

I was able to solve this by increasing the VM's memory (previously it had only 384 MB) and freeing up some space.


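You can also solve it by passing a smaller batch size with `--batch-size=XXX`.

The default is 1000, so try a smaller value first and then increase it step by step.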
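Why a smaller batch helps: the traceback suggests that `update_backend` calls `do_update(..., start, end, ...)`, which sends one slice of the queryset as a single `bulk_index` request, so peak memory per request is proportional to the batch size rather than to the total number of documents. A minimal, framework-free sketch of that loop follows; `batched` and `index_in_batches` are hypothetical names, and `send` stands in for the real backend call. It is not Haystack's actual implementation.

```python
import itertools


def batched(iterable, batch_size):
    """Yield lists of at most batch_size items from iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(iterable)
    while True:
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            return
        yield batch


def index_in_batches(docs, batch_size, send):
    """Send docs one batch at a time and return the number of requests made.

    `send` is a hypothetical stand-in for something like backend.update();
    only one batch needs to be serialized and held in memory at a time.
    """
    requests = 0
    for batch in batched(docs, batch_size):
        send(batch)
        requests += 1
    return requests
```

With the 55149 documents from the question, `batch_size=1000` means 56 requests of at most 1000 documents each, while `batch_size=100` means 552 smaller requests. Smaller batches trade more round trips for a lower memory peak, which is why it makes sense to start low and increase the value until indexing is fast enough without running out of memory.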