I think my Python script has a memory leak
Here is my code:
from xgoogle.search import GoogleSearch, SearchError
import urllib, urllib2, sys, argparse

global stringArr
stringArr = ["string 1",
             "string 2",
             "string 3",
             "string etc"]

def searchIt(url):
    try:
        if(args.verbose>='1'): print "[INFO] Opening URL: "+url
        response = urllib.urlopen(url)
    except urllib2.URLError, e:
        print "[ERROR] "+e.reason
        return False
    except KeyboardInterrupt:
        print "Suspended by user..."
        sys.exit()
    if(checkForStr(response.read())):
        if(args.verbose=='0'): print "[INFO] String found in URL: "+url
    else:
        if(args.verbose>='1'): print "[INFO] No string found in URL: "+url

def checkForStr(html):
    global stringArr
    try:
        if any(checkStr in html for checkStr in stringArr):
            return True
        else:
            return False
    except KeyboardInterrupt:
        print "Suspended by user..."
        sys.exit()

def main():
    try:
        i=0
        gs = GoogleSearch(args.keyword)
        gs.results_per_page = 100
        results = []
        while True:
            tmp = gs.get_results()
            i = i+1 # page number
            if not tmp: # no more results (pages) were found
                break
            results.extend(tmp)
            for r in results: # process results for page
                searchIt(r.url) # check for string
            del results[:] # clean results
        # finished
    except SearchError, e:
        print "[ERROR] Search failed: %s" % e
    except KeyboardInterrupt:
        print "Suspended by user..."
        sys.exit()

if __name__ == '__main__':
    try:
        parser = argparse.ArgumentParser()
        parser.add_argument('-v', dest='verbose', default='0', help='Verbosity level', choices='012')
        group = parser.add_argument_group('required arguments')
        group.add_argument('-k', dest='keyword', help='Keyword to use on google query', required=True)
        args = parser.parse_args()
        main()
    except KeyboardInterrupt:
        print "Suspended by user..."
        sys.exit()
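I shortened it a bit to make it easier to read, but it should still be functional. This code will be part of a bigger script.
I am using the xgoogle library to scrape results from Google, then visiting each result to check whether the site contains any of the strings in stringArr.
My first test ran without problems (I pressed Ctrl+C after fewer than 10 results), but the first time I let it run, after testing about 100 URLs, I got the following error: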
  File "./StringScan.py", line 99, in <module>
    main()
  File "./StringScan.py", line 83, in main
    checkForStr(r.url)
  File "./StringScan.py", line 39, in checkForStr
    response = urllib.urlopen(url)
  File "/usr/lib/python2.6/urllib.py", line 86, in urlopen
    return opener.open(url)
  File "/usr/lib/python2.6/urllib.py", line 205, in open
    return getattr(self, name)(url)
  File "/usr/lib/python2.6/urllib.py", line 344, in open_http
    h.endheaders()
  File "/usr/lib/python2.6/httplib.py", line 904, in endheaders
    self._send_output()
  File "/usr/lib/python2.6/httplib.py", line 776, in _send_output
    self.send(msg)
  File "/usr/lib/python2.6/httplib.py", line 735, in send
    self.connect()
  File "/usr/lib/python2.6/httplib.py", line 716, in connect
    self.timeout)
  File "/usr/lib/python2.6/socket.py", line 500, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
IOError: [Errno socket error] [Errno -2] Name or service not known
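(The line numbers are not the same because I modified the code to post it here.)
After that I was back at my Linux terminal, as if the script had finished. But I noticed my computer was not running well. I checked the system monitor and saw a Python process using 1.3 GB of memory, and I had to kill the process to get my computer back to normal.
Is something in my code causing this, or why else would it happen?
I know my code probably has some mistakes, but for now I am mainly interested in anything that could cause the memory problem. Any help would be appreciated.

I refactored your code a bit to make it easier to read, but I don't see anything here that would leak memory.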
from itertools import count
import urllib, urllib2, sys, argparse
from xgoogle.search import GoogleSearch, SearchError

stringArr = ["string 1",
             "string 2",
             "string 3",
             "string etc"]

def searchIt(url):
    try:
        if(args.verbose>='1'):
            print "[INFO] Opening URL: "+url
        response = urllib.urlopen(url)
    except urllib2.URLError, e:
        print "[ERROR] "+e.reason
        return False
    if checkForStr(response.read()):
        if(args.verbose=='0'):
            print "[INFO] String found in URL: "+url
    else:
        if(args.verbose>='1'):
            print "[INFO] No string found in URL: "+url

def checkForStr(html):
    return any(checkStr in html for checkStr in stringArr)

def main():
    try:
        gs = GoogleSearch(args.keyword)
        gs.results_per_page = 100
        for i in count():
            results = gs.get_results()
            if not results: # no more results (pages) were found
                break
            for r in results: # process results for page
                searchIt(r.url) # check for string
        # finished
    except SearchError, e:
        print "[ERROR] Search failed: %s" % e

if __name__ == '__main__':
    try:
        parser = argparse.ArgumentParser()
        parser.add_argument('-v', dest='verbose', default='0', help='Verbosity level', choices='012')
        group = parser.add_argument_group('required arguments')
        group.add_argument('-k', dest='keyword', help='Keyword to use on google query', required=True)
        args = parser.parse_args()
        main()
    except KeyboardInterrupt:
        print "Suspended by user..."
        sys.exit()
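
To show in isolation why the refactored checkForStr above needs neither an if/else nor a global statement, here is a minimal sketch. It is written for Python 3 (the code in this thread is Python 2), and STRINGS and check_for_str are placeholder names, not part of the original script.

```python
# Placeholder search strings; the original script keeps its own list in stringArr.
STRINGS = ["string 1", "string 2", "string 3"]


def check_for_str(html):
    """Return True if any of the search strings occurs in html."""
    # Reading a module-level name needs no `global` statement; `global`
    # only matters when a function rebinds the name.
    # any() already returns a bool, so wrapping it in if/else adds nothing.
    return any(needle in html for needle in STRINGS)
```

Calling check_for_str on a page body returns a plain True or False, which searchIt can test directly.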
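It could be urllib.urlopen().
if x: return True \ else: return False - good thing we have those booleans, eh?
global stringArr does not do what you think it does; you don't need those lines at all.
You don't need to handle KeyboardInterrupt all over the place. The exception percolates back up to the top level, so just handle it there.
I added so many KeyboardInterrupt handlers because when I only caught it in main() and the script was inside .urlopen, it would not exit right away, but with all of the handlers it did. Relet, I didn't understand your comment.
I believe Relet is referring to your use of any in checkForStr.
I'm not sure yet, but I think the leak came from another Python application I had running, because my script shows up as scriptname.py while the high memory usage was on a process named python. I still get the IOError in my script, but I guess that is a separate problem. Do you know how I can get rid of it?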
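
On the remaining IOError: the traceback shows urllib.urlopen raising IOError when a host name cannot be resolved, while searchIt only catches urllib2.URLError. In Python 2, URLError is a subclass of IOError, so that except clause does not catch a plain IOError and the error escapes. Below is a hedged sketch of one way to handle it, written for Python 3, where urlopen lives in urllib.request and network failures surface as OSError subclasses. The function name fetch, its opener parameter, and the 10-second timeout are assumptions made for illustration. The opener parameter exists only so the function can be exercised without a network.

```python
from contextlib import closing
from urllib.request import urlopen


def fetch(url, opener=urlopen, timeout=10):
    """Return the body of url as bytes, or None if the request fails."""
    try:
        # closing() releases the connection even if read() raises.
        with closing(opener(url, timeout=timeout)) as response:
            return response.read()
    except OSError as exc:
        # URLError, name-resolution failures and socket timeouts all
        # derive from OSError in Python 3.
        print("[ERROR] %s: %s" % (url, exc))
        return None
```

A caller would skip any URL for which fetch returns None and move on to the next result. Because KeyboardInterrupt is not an OSError, Ctrl+C still propagates to a single handler at the top level, and the timeout keeps one unresponsive host from blocking the loop indefinitely.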