Python Django memory leak

python django python-3.x

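I am developing a Python application that uses the Django ORM as a standalone database management tool, but I am running into a serious memory problem. I found that the part causing it is: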

ports_list_save = []
for host in results['hosts']:
    for protocol in results['hosts'][host]['protocols']:
        for port in results['hosts'][host]['protocols'][protocol]:
            current_port = history.Port(number=int(port),
                                        protocol=protocol,
                                        state=results['hosts'][host]['protocols'][protocol][port]['state'],
                                        service='',
                                        version='',
                                        address=history.Ip.objects.get(scan=self.scan, address=host))
            ports_list_save.append(current_port)
history.Port.objects.bulk_create(ports_list_save)

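This part worked fine with 154 hosts and 150 ports per host (about 23,000 ports in total), but now I am trying 1,000 ports per host, and every time my computer runs out of memory.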

One more thing: I am not running Django in debug mode, so the memory is not coming from django.db.backends.postgresql_psycopg2.base.DatabaseWrapper.

If you have a lot of data, you may still need to load and process it in chunks. Try the following:

CHUNK_SIZE = 23000  # roughly the batch size the question reports as working
ports_list_save = []
for host in results['hosts']:
    for protocol in results['hosts'][host]['protocols']:
        for port in results['hosts'][host]['protocols'][protocol]:
            current_port = history.Port(number=int(port),
                                        protocol=protocol,
                                        state=results['hosts'][host]['protocols'][protocol][port]['state'],
                                        service='',
                                        version='',
                                        address=history.Ip.objects.get(scan=self.scan, address=host))
            ports_list_save.append(current_port)
            # Write a full chunk, then drop the references so the objects can be freed.
            if len(ports_list_save) >= CHUNK_SIZE:
                history.Port.objects.bulk_create(ports_list_save)
                ports_list_save = []
# Write whatever is left over after the last full chunk.
if ports_list_save:
    history.Port.objects.bulk_create(ports_list_save)
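
The chunking logic above can also be pulled out of the nested loops. The following is a minimal sketch, not part of the original answer: `chunked` is a hypothetical helper name, and it assumes the items come from any iterable (for example a generator that yields the `history.Port` instances one at a time).

```python
from itertools import islice


def chunked(iterable, size):
    """Yield lists of at most `size` items from any iterable (hypothetical helper)."""
    iterator = iter(iterable)
    while True:
        # islice pulls at most `size` items without materialising the whole input.
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Plain integers stand in for model instances here.
for chunk in chunked(range(7), 3):
    print(chunk)
```

With Django, each yielded chunk could be passed to `history.Port.objects.bulk_create(chunk)`, so only one chunk of instances is alive at a time, provided the source iterable is itself lazy.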

I ran into the same problem and ended up with this solution:

class BulkCreateManager:
    """Collects model instances and writes them with bulk_create in chunks."""

    def __init__(self, model, chunk_size=None):
        self.model = model
        self.chunk_size = chunk_size
        self.instances = []

    def append(self, instance):
        self.instances.append(instance)
        # Flush as soon as a full chunk has accumulated.
        if self.chunk_size and len(self.instances) >= self.chunk_size:
            self.create()

    def create(self):
        # Write whatever is buffered, then drop the references so they can be freed.
        if self.instances:
            self.model.objects.bulk_create(self.instances)
            self.instances = []



ports_list_save = BulkCreateManager(history.Port, 23000)
for host in results['hosts']:
    for protocol in results['hosts'][host]['protocols']:
        for port in results['hosts'][host]['protocols'][protocol]:
            current_port = history.Port(number=int(port), 
                                        protocol=protocol, 
                                        state=results['hosts'][host]['protocols'][protocol][port]['state'], 
                                        service='', 
                                        version='', 
                                        address=history.Ip.objects.get(scan=self.scan, address=host))
            ports_list_save.append(current_port)

ports_list_save.create()
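
One risk with this pattern is forgetting the final `create()` call, which would silently drop the last partial chunk. A possible variation, sketched here under assumptions and not taken from the answer, is to make the buffer a context manager that flushes on exit. `ChunkedWriter` and its `write` callback are hypothetical names; a lambda that records batch sizes stands in for `bulk_create` so the sketch runs without Django.

```python
class ChunkedWriter:
    """Buffers items and hands them to `write` in chunks; flushes on exit (hypothetical)."""

    def __init__(self, write, chunk_size):
        self.write = write          # callable that accepts a list of items
        self.chunk_size = chunk_size
        self.buffer = []

    def append(self, item):
        self.buffer.append(item)
        if len(self.buffer) >= self.chunk_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.write(self.buffer)
            # Rebind to a new list so the written items can be garbage collected.
            self.buffer = []

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Only write the remainder if the block finished without an error.
        if exc_type is None:
            self.flush()
        return False


batch_sizes = []
with ChunkedWriter(lambda batch: batch_sizes.append(len(batch)), chunk_size=3) as writer:
    for n in range(7):
        writer.append(n)
print(batch_sizes)
```

In the Django case, `write` could be the bound method `history.Port.objects.bulk_create`, with `chunk_size` set as in the answer above.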

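I am not sure I understand the problem. You have increased the amount of data roughly tenfold, so memory usage is expected to grow roughly tenfold as well. Is anything unusual happening here? In other words, what makes you think memory is leaking?

The process uses more than 2 GB of RAM plus 1 GB of swap to handle 100k * 3 strings of at most 10 characters each, which sounds like too much to me.

But you are not handling strings. These are Python objects. Every small piece of data takes up memory, and each object can use kilobytes. This does not look like a memory leak.

Works great, thanks :). By the way, I found something else that also works: calling gc.collect() at the end of each "for host in results['hosts']" loop. Is that a bad thing? Which way is best?

I am not sure. Calling gc.collect() will not do any harm, but I am not sure you gain much from it. Unless you see a clear improvement, I would not bother with it.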
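
To illustrate the point about objects versus strings, here is a rough, CPython-specific sketch. `Port` below is a hypothetical plain-Python stand-in, not the Django model, and `deep_size` is only an approximation; real model instances carry additional state (for example a `_state` attribute), so they are likely heavier still.

```python
import sys


class Port:
    """Hypothetical stand-in for a model instance; not the Django model."""

    def __init__(self, number, protocol, state):
        self.number = number
        self.protocol = protocol
        self.state = state


def deep_size(obj):
    """Approximate size of an instance, its attribute dict, and its attribute values."""
    size = sys.getsizeof(obj) + sys.getsizeof(obj.__dict__)
    size += sum(sys.getsizeof(value) for value in obj.__dict__.values())
    return size


# Size of the raw values alone versus the same values wrapped in an object.
raw = sys.getsizeof(80) + sys.getsizeof("tcp") + sys.getsizeof("open")
wrapped = deep_size(Port(80, "tcp", "open"))
print(raw, wrapped)
```

The exact numbers depend on the Python version and platform, but the wrapped figure is always larger than the raw one, and that per-object overhead is multiplied by every instance kept in the list before `bulk_create` runs.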