Writing more than 1 million records to a CSV file with Python


python django csv


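I am using Python to extract some data into a CSV file, and the data set has more than 1 million records. My script appears to have a memory problem: after 5 hours of work and roughly 190k records written, the process was killed.

Here is my terminal output: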

(.venv)[cv1@mdecv01 maidea]$ python common_scripts/script_tests/ben-test-extract.py BEN
Generating CSV file. Please wait ...
Preparing to write file: BEN-data-20170731.csv
Killed
(.venv)[cv1@mdecv01 maidea]$

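Is there a way to extract this data with proper memory management?

Here is my script:

You are not taking advantage of select_related or prefetch_related. Without them, a separate database query is executed every time a related field (ForeignKey, ManyToManyField) is accessed.

It should look something like this: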

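for beneficiary in Beneficiary.objects.select_related(
    'household'        # single-valued relation: joined into the main query
).prefetch_related(
    'enrolments',      # multi-valued relations: one extra query per relation
    'interventions'
):
    if beneficiary.is_active:
        household = beneficiary.household
        # len() on .all() reuses the prefetched results instead of querying again
        if len(beneficiary.enrolments.all()) > 0 and len(beneficiary.interventions.all()) > 1:
            ...  # the row-writing logic from the original script continues here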
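To make the cost of per-row related lookups concrete, here is a minimal sketch that uses the standard-library sqlite3 module rather than Django's ORM. The table and column names are hypothetical stand-ins for the models in the question. It counts the SQL statements issued when each beneficiary's household is looked up separately, and compares that with a single JOIN, which is roughly what select_related produces.

```python
import sqlite3

# Hypothetical two-table schema standing in for Beneficiary and Household.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE household (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE beneficiary (
        id INTEGER PRIMARY KEY,
        name TEXT,
        household_id INTEGER REFERENCES household(id)
    );
    INSERT INTO household VALUES (1, 'H1'), (2, 'H2');
    INSERT INTO beneficiary VALUES (1, 'A', 1), (2, 'B', 1), (3, 'C', 2);
""")
conn.commit()

# Record every SQL statement executed from this point on.
executed = []
conn.set_trace_callback(executed.append)

# Naive approach: one query for the beneficiaries, then one more per row.
naive_rows = []
for b_name, household_id in conn.execute(
    "SELECT name, household_id FROM beneficiary ORDER BY id"
).fetchall():
    (h_name,) = conn.execute(
        "SELECT name FROM household WHERE id = ?", (household_id,)
    ).fetchone()
    naive_rows.append((b_name, h_name))
naive_count = len(executed)

# Joined approach: the related row arrives in the same query.
executed.clear()
joined_rows = conn.execute(
    "SELECT b.name, h.name FROM beneficiary b "
    "JOIN household h ON h.id = b.household_id ORDER BY b.id"
).fetchall()
joined_count = len(executed)
conn.set_trace_callback(None)

print(naive_count, joined_count)
```

Both approaches return the same rows, but the number of statements in the naive version grows with the number of beneficiaries, which adds up quickly at a million records.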

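Filter in the queryset instead of pulling all the data, for example .filter(is_active=True), and filter by count, for example .annotate(interventions_count=Count('interventions')).filter(interventions_count__gte=1).
Use offset and limit to pull the data in iterations rather than all at once (lower memory consumption), for example queryset[0:100].
Use select_related and prefetch_related for the related tables you need.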
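The offset/limit suggestion above can be sketched as follows. This is only an illustration under stated assumptions: iter_in_chunks and write_rows are hypothetical helper names, and a plain list stands in for the queryset so the snippet runs without a database. Slicing a Django queryset is translated into LIMIT/OFFSET, so the same helper should work on one, provided the queryset has a stable ordering such as order_by('pk').

```python
import csv
import io


def iter_in_chunks(sliceable, chunk_size=1000):
    """Yield items from `sliceable`, fetching one slice at a time.

    Hypothetical helper: works on anything that supports slicing,
    so only `chunk_size` items are held in memory at once.
    """
    start = 0
    while True:
        chunk = list(sliceable[start:start + chunk_size])
        if not chunk:
            break
        yield from chunk
        start += chunk_size


def write_rows(rows, out_file):
    """Write each row to `out_file` as it arrives and return the row count."""
    writer = csv.writer(out_file)
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


# Stand-in data; in the real script this would be an ordered queryset.
records = [(i, f"name-{i}") for i in range(2500)]
buffer = io.StringIO()
written = write_rows(iter_in_chunks(records, chunk_size=1000), buffer)
print(written)
```

Because the rows are written as they are produced, the full result set never has to sit in memory. One caveat: with very large tables, high offsets get slower, so filtering on pk greater than the last seen value may scale better than plain slicing.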

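Can you do anything at all with Beneficiary.objects.all()? Try printing it or something similar. Otherwise, if the memory problem occurs inside the for loop, try using a generator. It would also help to post your code (or a shortened version) in the question, along with your database settings.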
