
Using the Elasticsearch bulk API with pandas


I have a dataframe that imports into Elasticsearch without any problem, but every row ends up as a new record. Instead, I want to use the destination number as the document ID and update that document with the remaining fields of each row.

from StringIO import StringIO   # Python 2; on Python 3 this is io.StringIO
import pandas as pd

u_cols = ["destination", "status", "batchid", "dateint", "message", "senddate"]

audit_trail = StringIO('''
918968400000 |  DELIVRD  | abcd_xyz-e89a4ebd3729675c | 20150226103700 | "some company is advertising here" | 2015-04-02 13:12:18  
918968400000 |  DELIVRD  | efgh_xyz-e89a4ebd3729675c | 20160226103700 | "some company is advertising here" | 2016-04-02 13:12:18  
8918968400000 |  FAILED  | abcd_xyz-e89a4ebd3729675c | 20150826103700 | "some company is advertising here" | 2015-08-02 13:12:18  
8918968400000 |  DELIVRD  | xyz_abc-e89a4ebd3729675c | 20140226103700 | "some company is advertising here" | 2014-04-02 13:12:18  
918968400000 |  FAILED  | abcd_pqr-e89a4ebd3729675c | 20150221103700 | "some company is advertising here" | 2015-04-21 13:12:18  
''')

df11 = pd.read_csv(audit_trail, sep="|", names=u_cols)


import json
tmp = df11.to_json(orient="records")
df_json = json.loads(tmp)


mylist = []
for doc in df_json:
    # every row becomes its own "index" action, i.e. its own document
    action = { "_index": "myindex3", "_type": "myindex1type", "_source": doc }
    mylist.append(action)


import elasticsearch
from elasticsearch import helpers
es = elasticsearch.Elasticsearch('http://23.23.186.196:9200')
helpers.bulk(es, mylist)
In the case above I want to end up with only two documents: one with ID 918968400000 holding three records and one with ID 8918968400000 holding two records, with the records nested roughly like this:

doc={"campaigns" : [{"status": "FAILED", "batchid": "abcd_xyz-e89a4ebd3729675c", "dateint": 20150826103700, "message" : "some company is advertising here", "senddate": "2015-08-02 13:12:18"},
{"status": "DELIVRD", "batchid": "xyz_abc-e89a4ebd3729675c", "dateint": 20140226103700, "message" : "some company is advertising here", "senddate": "2014-04-02 13:12:18" }]}

res = es.index(index="test-index", doc_type='tweet', id=8918968400000, body=doc)
I need the pandas dataframe to insert the data through the bulk API in the shape shown above. Is that possible?
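
One way this could perhaps be done is to group the dataframe by destination before building the bulk actions, so that each destination becomes a single document whose rows are collected under "campaigns". This is only a rough sketch, not code from the question; it reuses the index and type names from above, and the host is a placeholder:

import json
from elasticsearch import Elasticsearch, helpers

def nested_actions(frame):
    # One bulk action per destination; all rows for that destination are
    # gathered into a "campaigns" list, matching the desired document shape.
    for destination, group in frame.groupby("destination"):
        campaigns = json.loads(
            group.drop("destination", axis=1).to_json(orient="records")
        )
        yield {
            "_index": "myindex9",
            "_type": "myindex1type",
            "_id": int(destination),
            "_source": {"campaigns": campaigns},
        }

es = Elasticsearch("http://localhost:9200")   # placeholder host
helpers.bulk(es, nested_actions(df11))

Note that indexing this way replaces the whole document for a destination on every run; it does not merge new rows into an existing "campaigns" list.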


Update

I changed the op type from index to update. That stores all the fields I need, but it does not store them as nested objects.

mylist = []
mydoc = {}
for id, doc in enumerate(df_json):
    mydoc[id] = {}
    mydoc[id]['doc'] = doc
    mydoc[id]['doc_as_upsert'] = True
    # "_op_type": "update" tells helpers.bulk to issue update requests
    action = { "_index": "myindex9", "_type": "myindex1type", "_id": doc['destination'], "_op_type": "update", "_source": mydoc[id] }
    mylist.append(action)
Is there a way to store the rows as nested objects?
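
One possible direction (again only a sketch, not taken from the question): have each bulk update run a script that appends the row to a "campaigns" array, with an "upsert" body for the first row seen for a destination. The script syntax ("source" vs. "inline", "painless" vs. "groovy") depends on the Elasticsearch version, and "campaigns" must be mapped as a nested type in the index mapping if nested queries are needed:

mylist = []
for doc in df_json:
    # everything except the id becomes one entry of the "campaigns" array
    campaign = {k: v for k, v in doc.items() if k != "destination"}
    action = {
        "_op_type": "update",
        "_index": "myindex9",
        "_type": "myindex1type",
        "_id": doc["destination"],
        # append the row if the document already exists ...
        "script": {
            "source": "ctx._source.campaigns.add(params.campaign)",
            "lang": "painless",
            "params": {"campaign": campaign},
        },
        # ... or create it with a one-element list if it does not
        "upsert": {"campaigns": [campaign]},
    }
    mylist.append(action)

helpers.bulk(es, mylist)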