PyQt4 - QWebView.load(url) leaks memory (not from Python)


python qt memory-leaks


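Basically, I pull a series of links from my database and want to scrape them for the specific links I am looking for. I then feed those links back into my link queue, which my multiple QWebViews reference, and they keep pulling links from it for processing and storage.

My problem is that as this runs for, say, 200 or 500 links, it starts to use more and more RAM.

I have looked into this exhaustively, using heapy, memory_profiler and objgraph to find out what is causing the memory leak. The objects on the Python heap stay about the same over time, in both count and size. This made me think that C++ objects were not being removed. Sure enough, according to memory_profiler, RAM only goes up when the self.load(self.url) lines are called. I have tried to fix this, but to no avail.
Code (listed in full near the end of this page):
Here is what my memory tools say (all of these stay about constant throughout the whole program):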

RAM: resource.getrusage(resource.RUSAGE_SELF).ru_maxrss/1024
2491 (MB)
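
A minimal sketch of how a figure like this can be sampled before and after a piece of work, assuming a Unix-like system. The helper name peak_rss_mb is hypothetical. The unit handling assumes that ru_maxrss is reported in kilobytes on Linux and in bytes on macOS. Because ru_maxrss is a high-water mark, it can show growth but not memory being given back.

```python
import resource
import sys


def peak_rss_mb():
    """Return this process's peak resident set size in megabytes.

    ru_maxrss is a high-water mark: it never decreases, so it shows
    growth but cannot show memory being returned to the OS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        # macOS reports bytes; Linux reports kilobytes.
        return peak / (1024.0 * 1024.0)
    return peak / 1024.0


if __name__ == "__main__":
    before = peak_rss_mb()
    # Stand-in workload: allocate roughly 50 MB; the peak will likely move.
    block = bytearray(50 * 1024 * 1024)
    after = peak_rss_mb()
    print("peak RSS before: %.1f MB, after: %.1f MB" % (before, after))
```

Sampling this value around each self.load call is one way to confirm which call the growth follows.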

Objgraph most common types:
methoddescriptor            9959
function                    8342
weakref                     6440
tuple                       6418
dict                        4982
wrapper_descriptor          4380
getset_descriptor           2314
list                        1890
method_descriptor           1445
builtin_function_or_method  1298
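
For readers without objgraph installed, a rough stdlib-only approximation of this kind of listing is sketched below. It counts the objects tracked by the garbage collector by type name, so it is a stand-in under that assumption rather than objgraph's own implementation, and the function name most_common_type_names is hypothetical. Like the listing above, it only sees Python-level objects; memory held by C++ objects inside Qt does not appear in it.

```python
import gc
from collections import Counter


def most_common_type_names(limit=10):
    """Return (type name, count) pairs for objects the GC tracks.

    Objects the collector does not track (for example plain strings
    and integers) are not counted.
    """
    counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    return counts.most_common(limit)


if __name__ == "__main__":
    for name, count in most_common_type_names():
        print("%-30s %d" % (name, count))
```

If these counts stay flat between loads while process memory keeps rising, that supports the suspicion that the growth is outside the Python heap.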

Heapy:
Partition of a set of 9879 objects. Total size = 1510000 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)

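Your program is indeed growing because of C++ code, but it is not a true leak in the sense of creating objects that are no longer referenced. What is happening, at least in part, is that your QWebView holds a QWebPage, which holds a QWebHistory. Each time you call self.load, the history gets a little longer.
Note that QWebHistory has a clear() function.

Documentation for it is available in the Qt and PyQt4 class references.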
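Qt4 is obsolete: official support ended two years ago. Problems like this are not going to be fixed. You need to switch to PyQt5 and use web-engine.

I am afraid this is something fundamental to Qt, and that Qt5 will have the same problem. But you are right. I will switch soon; I am tired of trying to fix this.

Well, you have a test case. All you need to do is port it to pyqt5 and run it. Of course, that might still have problems, but at least there would be a much greater chance of getting them fixed at some point. You should also note that the next release of pyqt4, which may be 4.12.2 or possibly 4.13, will be the last. So if you can switch to pyqt5, you should do so as soon as possible, whatever the issues with the web view.

I see that your example seems to be partly based on an existing example. However, you have made some changes to it, which could be the cause of the problem. In particular, you are using the QWebView class rather than QWebPage, and keeping references to multiple instances of it in a list. I assume this is because you want to process the URLs in parallel. So an obvious question is: do you get the same memory usage if you use just one instance to process all the URLs?

Hey, I tried adding this to my code (clearing the history after each page is processed). I think it helped a bit? However, RAM continues to build up. As I add more URLs, it can reach 6 GB :( I can't help thinking that something else must be building up in the background.

Can you show the exact lines you added? Also, I encourage you to follow ekhumoro's advice, because chasing a problem in PyQt4 code that is no longer supported is not a good use of time. If you are interested in trying, I can help you work out where the 6 GB comes from using a live core from your process, but in the near future you would probably be better off following ekhumoro's advice.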
Code from the question:

    from PyQt4.QtCore import QUrl
    from PyQt4.QtWebKit import QWebView, QWebSettings
    from PyQt4.QtGui import QApplication
    from lxml.etree import HTMLParser

    # My functions
    from util import dump_list2queue, parse_doc


    class ThreadFlag:
        def __init__(self, threads, jid, db):
            self.threads = threads
            self.job_id = jid
            self.db_direct = db
            self.xml_parser = HTMLParser()


    class WebView(QWebView):
        def __init__(self, thread_flag, id_no):
            super(QWebView, self).__init__()
            self.loadFinished.connect(self.handleLoadFinished)
            self.settings().globalSettings().setAttribute(QWebSettings.AutoLoadImages, False)
            # This is actually a dict with a few additional details about the url we want to pull
            self.url = None
            # doing one instance of this to avoid memory leaks
            self.qurl = QUrl()
            # id of the webview instance
            self.id = id_no
            # Status webview instance, green mean it isn't working and yellow means it is.
            self.status = 'GREEN'
            # Reference to a single universal object all the webview instances can see.
            self.thread_flag = thread_flag

        def handleLoadFinished(self):
            try:
                self.processCurrentPage()
            except Exception as e:
                print e
            self.status = 'GREEN'
            if not self.fetchNext():
                # We're finished!
                self.loadFinished.disconnect()
                self.stop()
            else:
                # We're not finished! Do next url.
                self.qurl.setUrl(self.url['url'])
                self.load(self.qurl)

        def processCurrentPage(self):
            self.frame = str(self.page().mainFrame().toHtml().toUtf8())
            # This is the case for the initial web pages I want to gather links from.
            if 'name' in self.url:
                # Parse html string for links I'm looking for.
                new_links = parse_doc(self.thread_flag.xml_parser, self.url, self.frame)
                if len(new_links) == 0:
                    return 0
                fkid = self.url['pkid']
                new_links = map(lambda x: (fkid, x['title'],x['url'], self.thread_flag.job_id), new_links)
                # Post links to database, db de-dupes and then repull ones that made it.
                self.thread_flag.db_direct.post_links(new_links)
                added_links = self.thread_flag.db_direct.get_links(self.thread_flag.job_id,fkid)
                # Add the pulled links to central queue all the qwebviews pull from
                dump_list2queue(added_links, self._urls)
                del added_links
            else:
                # Process one of the links I pulled from the initial set of data that was originally in the queue.
                print "Processing target link!"

        # Get next url from the universal queue!
        def fetchNext(self):
            if self._urls and self._urls.empty():
                self.status = 'GREEN'
                return False
            else:
                self.status = 'YELLOW'
                self.url = self._urls.get()
                return True

        def start(self, urls):
            # This is where the reference to the universal queue gets made.
            self._urls = urls
            if self.fetchNext():
                self.qurl.setUrl(self.url['url'])
                self.load(self.qurl)


    # uq = central url queue shared between webview instances
    # ta = array of webview objects
    # tf - thread flag (basically just a custom universal object that all the webviews can access).
    # This main "program" is started by another script elsewhere.
    def main_program(uq, ta, tf):
        app = QApplication([])
        webviews = ta
        threadflag = tf
        tf.app = app
        print "Beginning the multiple async web calls..."
        # Create n "threads" (really just webviews) that each will make asynchronous calls.
        for n in range(0,threadflag.threads):
            webviews.append(WebView(threadflag, n+1))
            webviews[n].start(uq)
        app.exec_()

Heapy partition rows (for the Heapy summary in the question):

     Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
         0   2646  27   445216  29     445216  29 str
         1    563   6   262088  17     707304  47 dict (no owner)
         2   2267  23   199496  13     906800  60 __builtin__.weakref
         3   2381  24   179128  12    1085928  72 tuple
         4    212   2   107744   7    1193672  79 dict of guppy.etc.Glue.Interface
         5     50   1    52400   3    1246072  83 dict of guppy.etc.Glue.Share
         6    121   1    40200   3    1286272  85 list
         7    116   1    32480   2    1318752  87 dict of guppy.etc.Glue.Owner
         8    240   2    30720   2    1349472  89 types.CodeType
         9     42   0    24816   2    1374288  91 dict of class