xpath() causes a memory leak

Tags: xpath, memory-leaks, scrapy

I found that the response.xpath() method leaks memory when writing a spider with Scrapy. The code is as follows:

def extract_data(self, response):
    aomen_host_water = None
    aomen_pankou = None
    aomen_guest_water = None
    sb_host_water = None
    sb_pankou = None
    sb_guest_water = None


    # all_trs = response.xpath('//div[@id="webmain"]/table[@id="odds"]/tr')
    # for tr in all_trs:
    #     # cname(company name)
    #     cname = tr.xpath('td[1]/text()').extract()
    #     if len(cname) == 0:
    #         continue
    #     # remove extra space and other stuff
    #     cname = cname[0].split(' ')[0]
    #     if cname == u'澳彩':
    #         aomen_host_water = tr.xpath('td[9]/text()').extract()
    #         if len(aomen_host_water) != 0:
    #             aomen_pankou = tr.xpath('td[10]/text()').extract()
    #             aomen_guest_water = tr.xpath('td[11]/text()').extract()
    #         else:
    #             aomen_host_water = tr.xpath('td[6]/text()').extract()
    #             aomen_pankou = tr.xpath('td[7]/text()').extract()
    #             aomen_guest_water = tr.xpath('td[8]/text()').extract()
    #     elif cname == u'SB':
    #         sb_host_water = tr.xpath('td[9]/text()').extract()
    #         if len(sb_host_water) != 0:
    #             sb_pankou = tr.xpath('td[10]/text()').extract()
    #             sb_guest_water = tr.xpath('td[11]/text()').extract()
    #         else:
    #             sb_host_water = tr.xpath('td[6]/text()').extract()
    #             sb_pankou = tr.xpath('td[7]/text()').extract()
    #             sb_guest_water = tr.xpath('td[8]/text()').extract()
    # if (aomen_host_water is None) or (aomen_pankou is None) or (aomen_guest_water is None) or \
    #         (sb_host_water is None) or (sb_pankou is None) or (sb_guest_water is None):
    #     return None
    # if (len(aomen_host_water) == 0) or (len(aomen_pankou) == 0) or (len(aomen_guest_water) == 0) or \
    #         (len(sb_host_water) == 0) or (len(sb_pankou) == 0) or (len(sb_guest_water) == 0):
    #     return None
    # item = YPItem()
    # item['aomen_host_water'] = float(aomen_host_water[0])
    # item['aomen_pankou'] = aomen_pankou[0].encode('utf-8')  # float(pankou.pankou2num(aomen_pankou[0]))
    # item['aomen_guest_water'] = float(aomen_guest_water[0])
    # item['sb_host_water'] = float(sb_host_water[0])
    # item['sb_pankou'] = sb_pankou[0].encode('utf-8') # float(pankou.pankou2num(sb_pankou[0]))
    # item['sb_guest_water'] = float(sb_guest_water[0])

    item = YPItem()
    item['aomen_host_water'] = 1.0
    item['aomen_pankou'] = '111'  # float(pankou.pankou2num(aomen_pankou[0]))
    item['aomen_guest_water'] = 1.0
    item['sb_host_water'] = 1.0
    item['sb_pankou'] = '111' # float(pankou.pankou2num(sb_pankou[0]))
    item['sb_guest_water'] = 1.0
    return item

Here I have commented out the useful statements and filled the item with dummy data instead. With the dummy data the spider uses about 45 MB of memory; when I uncomment the commented lines, it uses 100+ MB and the memory usage keeps rising. Has anyone run into this kind of problem before?
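
One way to keep an eye on this kind of growth is Scrapy's built-in memusage extension. A minimal sketch follows; the MEMUSAGE_* names are standard Scrapy settings, but the thresholds below are arbitrary:

# settings.py -- the memusage extension relies on the stdlib resource
# module, so it is only available on POSIX systems
MEMUSAGE_ENABLED = True      # track the spider process's memory while it runs
MEMUSAGE_WARNING_MB = 128    # log a warning above this threshold
MEMUSAGE_LIMIT_MB = 256      # shut the spider down above this threshold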

You can reduce memory usage by switching to extract_first() instead of extract(), since extract() creates unnecessary lists.
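
For example, the per-cell lookups from the question could be rewritten like this (a sketch only, not tested against the original page):

for tr in response.xpath('//div[@id="webmain"]/table[@id="odds"]/tr'):
    # extract_first() returns the first matched string (or None)
    # instead of building a list for every cell the way extract() does
    cname = tr.xpath('td[1]/text()').extract_first()
    if cname is None:
        continue
    # remove extra space and other stuff
    cname = cname.split(' ')[0]
    if cname == u'澳彩':
        aomen_host_water = tr.xpath('td[9]/text()').extract_first()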

I would also upgrade scrapy and lxml to their latest versions:

pip install --upgrade scrapy
pip install --upgrade lxml

I tried uncommenting just the first line, and the spider's memory usage was still high and kept rising continuously.

@bob Sure. Did you look at the page?

Yes, I looked at the page and used prefs() and guppy's hpy.heap(), but I didn't see a memory leak. I'm new to Python, but most of the variables are local and there are no cross references, so I don't know why there is still a memory leak.

@bob Hey man, I have the same problem. Did you manage to fix it?
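
For reference, a minimal sketch of the two inspection tools mentioned in the comments above; prefs() is the telnet-console alias for Scrapy's live-reference tracker, and both imports are existing APIs:

# Scrapy tracks live Request/Response/Item/Selector objects via
# scrapy.utils.trackref; in the telnet console (telnet localhost 6023)
# the same report is bound to the name prefs(). Programmatically:
from scrapy.utils.trackref import print_live_refs

print_live_refs()  # prints live object counts and the age of the oldest

# guppy's heap snapshot, as used in the comments above:
from guppy import hpy

print(hpy().heap())  # summary of objects currently on the Python heap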