Writing directly to CSV with Python


python pandas


I have the following script that queries ElasticSearch and writes the documents to a CSV file:

import pandas as pd
from elasticsearch import helpers

# es (an Elasticsearch client) and query are defined elsewhere in the script
documents = []
scanResp = helpers.scan(client=es, query=query, scroll="10m", index="index-*-a",
                        size=1000, clear_scroll=False, request_timeout=300)

for doc in scanResp:
    print('--- next document ----')
    row = doc['_source']
    print(row)
    document = {
        'artistAppearsAs': row['artistAppearsAs'],
        'isrc': row['isrc'],
        'artistId': row['artistId'],  # was mapped to row['artistAppearsAs']
        'title': row['title'],        # was mapped to row['artistId']; assumes a 'title' field
    }
    documents.append(document)

df = pd.DataFrame(documents)
df.to_csv('../data/documents.csv', header=True, index=False)

I have about 400,000 documents. What is the correct way to write each document to the CSV file using pandas?
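Because the script above keeps all ~400,000 rows in memory before writing anything, one alternative is to write each document straight to the file as it arrives, using the standard-library `csv.DictWriter`. This is a minimal sketch, assuming the hits look like `helpers.scan` output (dicts with a `'_source'` key); the helper name `write_hits_to_csv`, the `FIELDS` list, and the sample hits are hypothetical.

```python
import csv
from typing import Iterable, Mapping

# Column order for the CSV; assumed to match the fields used in the question.
FIELDS = ['artistAppearsAs', 'isrc', 'artistId', 'title']


def write_hits_to_csv(hits: Iterable[Mapping], path: str) -> int:
    """Hypothetical helper: write one CSV row per hit as it arrives.

    Only the current hit is held in memory, so this works with a lazy
    iterator such as the one returned by elasticsearch.helpers.scan.
    Returns the number of rows written.
    """
    count = 0
    with open(path, 'w', newline='', encoding='utf-8') as f:
        # extrasaction='ignore' drops _source keys not listed in FIELDS;
        # missing keys are written as '' (DictWriter's default restval).
        writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction='ignore')
        writer.writeheader()
        for hit in hits:
            writer.writerow(hit['_source'])
            count += 1
    return count


if __name__ == '__main__':
    # Made-up hits standing in for helpers.scan(...) output.
    fake_hits = (
        {'_source': {'artistAppearsAs': 'A', 'isrc': 'X1', 'artistId': 1, 'title': 'T1'}},
        {'_source': {'artistAppearsAs': 'B', 'isrc': 'X2', 'artistId': 2, 'title': 'T2'}},
    )
    print(write_hits_to_csv(fake_hits, 'documents.csv'))  # prints 2
```

With a real cluster you would pass `scanResp` in place of `fake_hits`; pandas is not needed for this path at all.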

You don't need to store the documents in a list first; you can use:

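import pandas as pd
from elasticsearch import helpers

scanResp = helpers.scan(client=es, query=query, scroll="10m", index="index-*-a",
                        size=1000, clear_scroll=False, request_timeout=300)
documents = pd.DataFrame()
for doc in scanResp:
    print('--- next document ----')
    row = doc['_source']
    print(row)
    document = {
        'artistAppearsAs': row['artistAppearsAs'],
        'isrc': row['isrc'],
        'artistId': row['artistId'],
        'title': row['title'],  # assumes a 'title' field in _source
    }
    # DataFrame.append returned a new frame instead of modifying it in place,
    # and it was removed in pandas 2.0; pd.concat is the replacement.
    # This copies the whole frame on every iteration.
    documents = pd.concat([documents, pd.DataFrame([document])], ignore_index=True)
documents.to_csv('../data/documents.csv', header=True, index=False)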
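If the only goal is to avoid building an explicit list in your own code, another option (not the one in the answer) is to hand `pd.DataFrame` a generator. This is a sketch, assuming the same `'_source'` layout; `hits_to_frame` and `FIELDS` are hypothetical names. pandas still turns the generator into a list internally, so peak memory is similar to the list version, but the frame is built once rather than copied each time a row is added.

```python
from typing import Iterable, Mapping

import pandas as pd

# Assumed column order, matching the fields used in the question.
FIELDS = ['artistAppearsAs', 'isrc', 'artistId', 'title']


def hits_to_frame(hits: Iterable[Mapping]) -> pd.DataFrame:
    """Hypothetical helper: build the DataFrame in one step from a generator."""
    rows = (
        {field: hit['_source'].get(field) for field in FIELDS}
        for hit in hits
    )
    # pandas materializes the generator and constructs the frame once,
    # instead of growing it row by row.
    return pd.DataFrame(rows, columns=FIELDS)


if __name__ == '__main__':
    # Made-up hits standing in for helpers.scan(...) output.
    fake_hits = [
        {'_source': {'artistAppearsAs': 'A', 'isrc': 'X1', 'artistId': 1, 'title': 'T1'}},
        {'_source': {'artistAppearsAs': 'B', 'isrc': 'X2', 'artistId': 2, 'title': 'T2'}},
    ]
    df = hits_to_frame(fake_hits)
    df.to_csv('documents.csv', header=True, index=False)
    print(df.shape)  # prints (2, 4)
```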

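Is there a reason this approach doesn't work for you?

No, it works. I just thought I had to create a list, documents, and then create the CSV file from it. What you've shown here looks like a very good solution.

If the concern is how much data is held in memory, you could dump a new CSV roughly every 1,000 documents and then merge them at the end. You could also create the DataFrame at the start instead of appending to a list.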
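The idea of dumping roughly every 1,000 documents can be sketched as follows. Rather than writing separate files and merging them afterwards, this variant appends each chunk to a single CSV with `to_csv(mode='a')` and writes the header only for the first chunk, so no merge step is needed. It assumes the same `'_source'` layout; `write_in_chunks`, `chunk_size`, and `FIELDS` are hypothetical names, and the sample data is made up.

```python
from itertools import islice
from typing import Iterable, Mapping

import pandas as pd

# Assumed column order, matching the fields used in the question.
FIELDS = ['artistAppearsAs', 'isrc', 'artistId', 'title']


def write_in_chunks(hits: Iterable[Mapping], path: str, chunk_size: int = 1000) -> int:
    """Hypothetical helper: hold at most chunk_size rows in memory at a time.

    Returns the total number of rows written. If there are no hits,
    no file is created.
    """
    it = iter(hits)
    total = 0
    first = True
    while True:
        # Pull the next chunk_size hits from the (possibly lazy) iterator.
        chunk = [hit['_source'] for hit in islice(it, chunk_size)]
        if not chunk:
            break
        df = pd.DataFrame(chunk, columns=FIELDS)
        # The first chunk truncates the file and writes the header;
        # later chunks are appended without repeating the header.
        df.to_csv(path, mode='w' if first else 'a', header=first, index=False)
        first = False
        total += len(df)
    return total


if __name__ == '__main__':
    fake_hits = (
        {'_source': {'artistAppearsAs': f'A{i}', 'isrc': f'X{i}', 'artistId': i, 'title': f'T{i}'}}
        for i in range(2500)
    )
    print(write_in_chunks(fake_hits, 'documents.csv'))  # prints 2500
```

Appending to one file keeps the single-output-file result the question asks for; writing numbered part files and concatenating them later would also work, but needs an extra pass.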