Python script with a gevent pool uses lots of memory, then locks up
I have a very simple Python script that uses gevent.pool to download URLs (see below). The script runs fine for a few days and then locks up. I noticed that memory usage is very high at that point. Am I using gevent incorrectly?
    import sys
    from gevent import monkey
    monkey.patch_all()
    import urllib2
    from gevent.pool import Pool

    inputFile = open(sys.argv[1], 'r')
    urls = []
    counter = 0
    for line in inputFile:
        counter += 1
        urls.append(line.strip())
    inputFile.close()

    outputDirectory = sys.argv[2]

    def fetch(url):
        try:
            body = urllib2.urlopen("http://" + url, None, 5).read()
            if len(body) > 0:
                outputFile = open(outputDirectory + "/" + url, 'w')
                outputFile.write(body)
                outputFile.close()
                print "Success", url
        except:
            pass

    pool = Pool(int(sys.argv[3]))
    pool.map(fetch, urls)
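As a point of comparison (not the author's code), the same bounded fan-out pattern can be sketched on Python 3 with the standard library's concurrent.futures; here fetch is a stand-in that just strips and echoes its argument, so the sketch runs without gevent or a network:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Stand-in for the real download: strip and echo the URL.
    return url.strip()

urls = ["example.com\n", "example.org\n"]

# A bounded pool, analogous to gevent.pool.Pool(N): at most
# max_workers tasks run concurrently, the rest wait their turn.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(fetch, urls))

print(results)
```

Like `Pool.map` in gevent, `ThreadPoolExecutor.map` preserves input order in its results regardless of which task finishes first.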
The `.read()` call above loads the entire response body into memory as one string. To prevent that, change fetch() to stream the response in chunks:
    def fetch(url):
        try:
            u = urllib2.urlopen("http://" + url, None, 5)
            try:
                with open(outputDirectory + "/" + url, 'w') as outputFile:
                    while True:
                        chunk = u.read(65536)
                        if not chunk:
                            break
                        outputFile.write(chunk)
            finally:
                u.close()
            print "Success", url
        except:
            print "Fail", url
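The key change is the 64 KiB read loop: peak memory per download stays near the chunk size instead of the full response size. The loop itself can be exercised without a network; a Python 3 sketch with in-memory streams standing in for the urllib2 response and the output file:

```python
import io

CHUNK = 65536  # same 64 KiB chunk size as the answer's fetch()

def copy_stream(src, dst, chunk_size=CHUNK):
    # Same loop as fetch(): read a bounded chunk, stop on EOF,
    # which file-like objects signal with an empty read().
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)

src = io.BytesIO(b"x" * 200000)  # pretend 200 kB response body
dst = io.BytesIO()
copy_stream(src, dst)
assert dst.getvalue() == b"x" * 200000
```

At no point does more than one 64 KiB chunk of the body sit in memory, which is exactly what the streaming version of fetch() achieves.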
It sounds like a memory leak in gevent. A quick Google search for "python gevent memory leak" returns a surprising number of hits, though you are in a better position to judge whether any of them apply to your particular case. Also, use `with open(...) as outputFile:` rather than try/finally for the output file:
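On the with-versus-try point: a urllib2 response on Python 2 is not itself a context manager, but contextlib.closing wraps any object that has a .close() method so it can be used in a with block, replacing the explicit try/finally around u.close(). A small Python 3 sketch, with an in-memory stream standing in for the response:

```python
import io
from contextlib import closing

resp = io.BytesIO(b"payload")  # stand-in for urllib2.urlopen(...)

# closing() guarantees resp.close() runs on exit, even if the
# body of the with block raises -- same effect as try/finally.
with closing(resp) as r:
    data = r.read()

assert data == b"payload"
assert resp.closed
```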