
Dump bulk data into Elasticsearch using the Python API


I want to index the Shakespeare data in Elasticsearch using its Python API. I am getting this error:

    PUT http://localhost:9200/shakes/play/3 [status:400 request:0.098s]
{'error': {'root_cause': [{'type': 'mapper_parsing_exception', 'reason': 'failed to parse'}], 'type': 'mapper_parsing_exception', 'reason': 'failed to parse', 'caused_by': {'type': 'not_x_content_exception', 'reason': 'Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes'}}, 'status': 400}
Python script:

from elasticsearch import Elasticsearch
from elasticsearch import TransportError
import json

data = []

for line in open('shakespeare.json', 'r'):
    data.append(json.loads(line))

es = Elasticsearch()

res = 0
cl = []
# filtering the data I need
for d in data:
    if res == 0:
        res = 1 
        continue
    cl.append(data[res])
    res = 0

try:
    res = es.index(index = "shakes", doc_type = "play", id = 3, body = cl)
    print(res)
except TransportError as e:
    print(e.info)

I also tried using json.dumps, but I still get the same error. However, the code works when adding only a single element of the list to Elasticsearch.

You are not sending a bulk request to ES here, just a single index request - have a look at the index API. That method works on a dict representing one new document, not on a list of documents. Also, if you pass an id on the request, you need to make it a dynamic value, otherwise every document is written to the same id and overwrites the previous one. If your JSON file has one record per line, try the snippet below - and read the bulk docs.
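As a minimal sketch of the single-document path first (the sample records and the `line_id` field are assumptions about the shape of `shakespeare.json`, not taken from the question):

```python
import json

# two sample lines standing in for records of shakespeare.json (assumed shape)
sample_lines = [
    '{"line_id": 1, "play_name": "Henry IV", "text_entry": "So shaken as we are, so wan with care,"}',
    '{"line_id": 2, "play_name": "Henry IV", "text_entry": "Find we a time for frighted peace to pant,"}',
]

for line in sample_lines:
    doc = json.loads(line)    # es.index expects ONE dict per call, not a list
    doc_id = doc["line_id"]   # dynamic id, so documents do not overwrite each other
    # es.index(index="shakes", doc_type="play", id=doc_id, body=doc)
```

For the whole file, though, the bulk helper is the right tool: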

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()
op_list = []
with open(r"C:\ElasticSearch\shakespeare.json") as json_file:
    for record in json_file:
        op_list.append({
                       '_op_type': 'index',
                       '_index': 'shakes',
                       '_type': 'play',
                       '_source': record
                     })
helpers.bulk(client=es, actions=op_list)
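If the file is large, `op_list` can itself get big; `helpers.bulk` accepts any iterable of actions, so a generator avoids building the whole list in memory. A sketch (`bulk_actions` is a hypothetical helper, not part of elasticsearch-py):

```python
def bulk_actions(path, index="shakes", doc_type="play"):
    # yield one bulk action per NDJSON line instead of collecting them in a list
    with open(path) as json_file:
        for record in json_file:
            record = record.strip()
            if not record:  # skip blank lines
                continue
            yield {
                "_op_type": "index",
                "_index": index,
                "_type": doc_type,
                "_source": record,
            }

# helpers.bulk(client=es, actions=bulk_actions("shakespeare.json"))
```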