Memory leak with strings < 128 KB in Python? (Original title: Memory leak when opening files in Python)

While running a Python script, I found what I believe is a memory leak. Here is my script:
import sys
import time

class MyObj(object):
    def __init__(self, filename):
        with open(filename) as f:
            self.att = f.read()

def myfunc(filename):
    mylist = [MyObj(filename) for x in xrange(100)]
    len(mylist)
    return []

def main():
    filename = sys.argv[1]
    myfunc(filename)
    time.sleep(3600)

if __name__ == '__main__':
    main()
main() calls myfunc(), which reads the file 100 times. After myfunc() returns, I expect the list of 100 MyObj instances and the file contents they hold to be freed, since they are no longer referenced. However, when I check memory usage with the ps command, the Python process uses roughly 10000 KB more memory than the same script run with lines 12 and 13 commented out.
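As a cross-check on the ps numbers, memory can also be read from inside the process itself; a minimal sketch using the POSIX-only resource module (not part of the original script):

```python
import resource

# Peak resident set size of the current process. A platform quirk to keep
# in mind when comparing runs: Linux reports ru_maxrss in kilobytes,
# macOS reports it in bytes.
peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(peak > 0)
```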
Strangely, the memory leak (if that's what it is) only seems to occur for smaller files.

I would look into garbage collection. It may be that the larger files trigger collection more often, while the smaller ones are freed but keep the process above some overall threshold. Specifically, calling gc.collect() and then gc.get_referrers() on an object should reveal what is keeping the instances alive. See the Python documentation here:
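For example, the kind of check suggested above might look like this (a sketch with made-up names, not code from the question):

```python
import gc

class Holder(object):
    pass

obj = Holder()
container = [obj]   # one deliberate reference, held in a list
gc.collect()        # clear out any collectable garbage first

# get_referrers lists the objects that still point at obj;
# the containing list should be among them
refs = gc.get_referrers(obj)
print(any(r is container for r in refs))
```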
Update:

This question comes down to garbage collection, namespaces, and reference counting. The bash script you posted gives a fairly narrow view of the garbage collector's behavior. Try a wider range and you will see patterns in which sizes hold on to memory. For example, change the bash for loop to a larger range, such as: seq 0 16 2056

You noted that memory usage drops when you del mystr, because that deletes every reference to it. You might get similar results by confining the mystr variable to its own function, like this:
def loopy():
    mylist = []
    for x in xrange(100):
        mystr = ' ' * int(size_kb) * 1024
        mydict = {x: mystr}
        mylist.append(mydict)
    return mylist
Rather than a bash script, I think you will get more useful information from a memory profiler. Here are a couple of examples using Pympler. This first version is similar to the code from your Update 3:
import gc
import sys
import time
from pympler import tracker

tr = tracker.SummaryTracker()
print 'begin:'
tr.print_diff()

size_kb = sys.argv[1]
mylist = []
mydict = {}
print 'empty list & dict:'
tr.print_diff()

for x in xrange(100):
    mystr = ' ' * int(size_kb) * 1024
    mydict = {x: mystr}
    mylist.append(mydict)
print 'after for loop:'
tr.print_diff()

del mystr
del mydict
del mylist
print 'after deleting stuff:'
tr.print_diff()

collected = gc.collect()
print 'after garbage collection (collected: %d):' % collected
tr.print_diff()

time.sleep(2)
print 'took a short nap after all that work:'
tr.print_diff()

mylist = []
print 'create an empty list for some reason:'
tr.print_diff()
And the output:
$ python mem_test.py 256
begin:
types | # objects | total size
======================= | =========== | =============
list | 957 | 97.44 KB
str | 951 | 53.65 KB
int | 118 | 2.77 KB
wrapper_descriptor | 8 | 640 B
weakref | 3 | 264 B
member_descriptor | 2 | 144 B
getset_descriptor | 2 | 144 B
function (store_info) | 1 | 120 B
cell | 2 | 112 B
instancemethod | -1 | -80 B
_sre.SRE_Pattern | -2 | -176 B
tuple | -1 | -216 B
dict | 2 | -1744 B
empty list & dict:
types | # objects | total size
======= | =========== | ============
list | 2 | 168 B
str | 2 | 97 B
int | 1 | 24 B
after for loop:
types | # objects | total size
======= | =========== | ============
str | 1 | 256.04 KB
list | 0 | 848 B
after deleting stuff:
types | # objects | total size
======= | =========== | ===============
list | -1 | -920 B
str | -1 | -262181 B
after garbage collection (collected: 0):
types | # objects | total size
======= | =========== | ============
took a short nap after all that work:
types | # objects | total size
======= | =========== | ============
create an empty list for some reason:
types | # objects | total size
======= | =========== | ============
list | 1 | 72 B
Note that after the for loop, the total size for str is 256 KB, essentially matching the argument I passed in. The memory is released after explicitly deleting the reference to mystr with del mystr. After that the garbage has already been picked up, so there is no further reduction after gc.collect().
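The effect of del here is plain reference counting; a small illustration with sys.getrefcount (illustrative only, not part of the test scripts above):

```python
import sys

big = ' ' * 256 * 1024        # a 256 KB string, like the one in the script
alias = big                   # a second reference to the same object
print(sys.getrefcount(big))   # 3: big, alias, and getrefcount's own argument
del alias
print(sys.getrefcount(big))   # 2: only big and the argument remain
```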
The next version uses a function to create a separate namespace for the string:
import gc
import sys
import time
from pympler import tracker

def loopy():
    mylist = []
    for x in xrange(100):
        mystr = ' ' * int(size_kb) * 1024
        mydict = {x: mystr}
        mylist.append(mydict)
    return mylist

tr = tracker.SummaryTracker()
print 'begin:'
tr.print_diff()

size_kb = sys.argv[1]
mylist = loopy()
print 'after for loop:'
tr.print_diff()

del mylist
print 'after deleting stuff:'
tr.print_diff()

collected = gc.collect()
print 'after garbage collection (collected: %d):' % collected
tr.print_diff()

time.sleep(2)
print 'took a short nap after all that work:'
tr.print_diff()

mylist = []
print 'create an empty list for some reason:'
tr.print_diff()
And finally, the output of this version:
$ python mem_test_2.py 256
begin:
types | # objects | total size
======================= | =========== | =============
list | 958 | 97.53 KB
str | 952 | 53.70 KB
int | 118 | 2.77 KB
wrapper_descriptor | 8 | 640 B
weakref | 3 | 264 B
member_descriptor | 2 | 144 B
getset_descriptor | 2 | 144 B
function (store_info) | 1 | 120 B
cell | 2 | 112 B
instancemethod | -1 | -80 B
_sre.SRE_Pattern | -2 | -176 B
tuple | -1 | -216 B
dict | 2 | -1744 B
after for loop:
types | # objects | total size
======= | =========== | ============
list | 2 | 1016 B
str | 2 | 97 B
int | 1 | 24 B
after deleting stuff:
types | # objects | total size
======= | =========== | ============
list | -1 | -920 B
after garbage collection (collected: 0):
types | # objects | total size
======= | =========== | ============
took a short nap after all that work:
types | # objects | total size
======= | =========== | ============
create an empty list for some reason:
types | # objects | total size
======= | =========== | ============
list | 1 | 72 B
Now we didn't have to clean up the str, and I think this example shows why using functions is a good idea. Writing code in one big namespace effectively prevents the garbage collector from doing its job. It's not going to come into your house and just start assuming things are trash :) It has to know it's safe to collect something.
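That point about function namespaces can be demonstrated directly with a weak reference (a sketch with a hypothetical Payload class):

```python
import weakref

class Payload(object):
    pass

def build():
    local = Payload()           # lives only in this function's namespace
    return weakref.ref(local)   # return a weak reference, not the object

ref = build()
# the function's namespace is gone, so the payload was freed on return
print(ref() is None)
```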
By the way, that Evan Jones write-up is very interesting. You may simply be hitting the default behavior of the Linux memory allocator.

Essentially, Linux has two allocation strategies: sbrk() for small memory blocks and mmap() for large ones. Blocks allocated with sbrk() cannot easily be returned to the system, while mmap()-based blocks can (just unmap the pages).

So you see this effect when you allocate a block larger than the threshold at which the malloc() allocator in libc switches between sbrk() and mmap(). See the mallopt() call, in particular M_MMAP_THRESHOLD.
Update

To answer your additional question: yes, you can leak memory this way if the memory allocator works the way libc on Linux does. With the Windows Low Fragmentation Heap it probably would not leak; on AIX it depends on which malloc is configured. Perhaps one of the alternative allocators (tcmalloc, etc.) also avoids this kind of problem. sbrk() is blazingly fast but has fragmentation problems, and CPython can't do much about that because it has no compacting garbage collector, just simple reference counting.

Python does offer some ways to reduce buffer allocations; for example, see the blog post here:

Comments:

I'm not sure where to put get_referrers(), but I tried disabling gc with gc.disable() and got the same result.

Hi salty, I don't think gc.disable() turns off all of Python's reference-counting-based garbage collection, just the cyclic-reference collector. What about calling gc.collect() after myfunc(size_kb) in main()?

I tried adding gc.collect(), but I still get similar results.

See my answer; I've updated it based on your updates and experiments.

I tried running my latest script all the way up to 2056, but I don't see much change after 128 KB (see the link). I don't have any wisdom to offer, but let me just say this is one of the most detailed and clearly written questions I've seen on StackExchange.

This is fascinating, even though I have no idea what the problem is.

Of course, mmap() is much slower. sbrk() is more or less just incrementing a pointer, while mmap() is a system call.

Sorry to bother you; I've run into a memory leak that I think matches your answer. When I look at memory with top it is large, but when I attach to the process and use guppy it shows very little memory. So I wonder whether it's because sbrk() doesn't return memory to the OS. How can I determine or verify whether the memory is being held by sbrk()?
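The claim that gc.disable() only affects the cyclic collector can be checked directly (an illustrative sketch, not from the thread):

```python
import gc
import weakref

class Obj(object):
    pass

gc.disable()         # turns off only the cyclic-garbage collector
o = Obj()
ref = weakref.ref(o)
del o                # refcount hits zero: freed immediately, cyclic GC or not
print(ref() is None)
gc.enable()
```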
For reference, here is the script from Update 3 of the question:

import gc
import sys
import time

def main():
    size_kb = sys.argv[1]
    mylist = []
    for x in xrange(100):
        mystr = ' ' * int(size_kb) * 1024
        mydict = {'mykey': mystr}
        mylist.append(mydict)
    del mystr
    del mydict
    del mylist
    gc.collect()
    time.sleep(3600)

if __name__ == '__main__':
    main()