
Python BloomFilter reaches capacity after 10 minutes

Tags: python, web-scraping, scrapy, bloom-filter

I'm using Scrapy together with a BloomFilter, and after about 10 minutes I repeatedly get the following error:

2016-10-03 18:03:34 [twisted] CRITICAL: 
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/task.py", line 517, in _oneWorkUnit
    result = next(self._iterator)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 63, in <genexpr>
    work = (callable(elem, *args, **named) for elem in iterable)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/scraper.py", line 183, in _process_spidermw_output
    self.crawler.engine.crawl(request=output, spider=spider)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", line 209, in crawl
    self.schedule(request, spider)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", line 215, in schedule
    if not self.slot.scheduler.enqueue_request(request):
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/scheduler.py", line 54, in enqueue_request
    if not request.dont_filter and self.df.request_seen(request):
  File "dirbot/custom_filters.py", line 20, in request_seen
    self.fingerprints.add(fp)
  File "/usr/local/lib/python2.7/dist-packages/pybloom/pybloom.py", line 182, in add
    raise IndexError("BloomFilter is at capacity")
IndexError: BloomFilter is at capacity
I've searched Google for various possibilities, but nothing has worked.
Any help is appreciated.

Use ScalableBloomFilter instead of BloomFilter:

from pybloom import ScalableBloomFilter
from scrapy.utils.job import job_dir
from scrapy.dupefilters import BaseDupeFilter
from scrapy.utils.request import request_fingerprint


class BLOOMDupeFilter(BaseDupeFilter):
    """Request fingerprint duplicates filter backed by a scalable Bloom filter."""

    def __init__(self,
                 path=None,
                 initial_capacity=2000000,
                 error_rate=0.00001,
                 mode=ScalableBloomFilter.SMALL_SET_GROWTH):
        self.file = None
        # ScalableBloomFilter chains a new, larger filter whenever the
        # current one fills up, so add() never raises IndexError.
        self.fingerprints = ScalableBloomFilter(
            initial_capacity, error_rate, mode)

    def request_seen(self, request):
        fp = request_fingerprint(request)
        if fp in self.fingerprints:
            return True
        self.fingerprints.add(fp)
        return False
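To see why this fixes the error: a plain Bloom filter has a fixed bit array sized for a fixed capacity, so pybloom's BloomFilter must raise once that capacity is reached, while a scalable variant chains additional filters as needed. Here is a minimal, self-contained sketch of that idea using only the standard library (this is illustrative, not pybloom's actual implementation; all class and parameter names are made up):

```python
import hashlib


class TinyBloom:
    """A fixed-capacity Bloom filter: m bits, k hash functions (illustrative)."""

    def __init__(self, capacity, bits_per_item=20, num_hashes=5):
        self.capacity = capacity
        self.count = 0
        self.m = capacity * bits_per_item
        self.k = num_hashes
        self.bits = 0  # a big int used as a bit array

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        if self.count >= self.capacity:
            # The same failure mode pybloom's fixed BloomFilter reports.
            raise IndexError("BloomFilter is at capacity")
        for p in self._positions(item):
            self.bits |= 1 << p
        self.count += 1

    def __contains__(self, item):
        return all(self.bits >> p & 1 for p in self._positions(item))


class TinyScalableBloom:
    """When the newest filter fills up, chain a larger one instead of raising."""

    def __init__(self, initial_capacity=100, growth=2):
        self.growth = growth
        self.filters = [TinyBloom(initial_capacity)]

    def add(self, item):
        if item in self:
            return True  # probably seen before
        f = self.filters[-1]
        if f.count >= f.capacity:
            # Grow: append a larger filter rather than raising IndexError.
            f = TinyBloom(f.capacity * self.growth)
            self.filters.append(f)
        f.add(item)
        return False

    def __contains__(self, item):
        return any(item in f for f in self.filters)


sbf = TinyScalableBloom(initial_capacity=100)
for i in range(1000):  # 10x the initial capacity: no IndexError
    sbf.add(f"url-{i}")
print(all(f"url-{i}" in sbf for i in range(1000)))  # True (no false negatives)
print(len(sbf.filters))  # more than one chained filter
```

The trade-off is the usual Bloom filter one: membership tests can give false positives (a new URL is occasionally skipped as "seen"), but never false negatives, which is exactly what a crawl dupefilter needs.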

Do I need to add @classmethod like in my previous code? @Pixel, add whatever you need!
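On the @classmethod question: Scrapy instantiates the class named in DUPEFILTER_CLASS through its from_settings classmethod when one is defined, so if the previous dupefilter had one, it is reasonable to keep it. A dependency-free sketch of that wiring, with a plain dict standing in for Scrapy's Settings object and all names illustrative:

```python
class BLOOMDupeFilter:
    """Trimmed-down dupefilter showing only the from_settings hook."""

    def __init__(self, path=None, initial_capacity=2000000):
        self.path = path
        self.initial_capacity = initial_capacity

    @classmethod
    def from_settings(cls, settings):
        # Mirrors how Scrapy's stock RFPDupeFilter reads JOBDIR from settings
        # to decide where (or whether) to persist fingerprints.
        return cls(path=settings.get("JOBDIR"))


settings = {"JOBDIR": "/tmp/crawl-1"}  # stand-in for scrapy's Settings
df = BLOOMDupeFilter.from_settings(settings)
print(df.path)  # /tmp/crawl-1
```

Without a JOBDIR in the settings, path simply stays None and nothing is persisted, matching the self.file = None behavior in the answer above.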