Scrapy: raising IgnoreRequest in a custom downloader middleware is not working


I have written my own Scrapy downloader middleware that simply checks whether request.url already exists in the database and, if so, raises IgnoreRequest.

    def process_request(self, request, spider):
        # Called for each request that goes through the downloader
        # middleware.

        # Must either:
        # - return None: continue processing this request
        # - or return a Response object
        # - or return a Request object
        # - or raise IgnoreRequest: process_exception() methods of
        #   installed downloader middleware will be called

        sql = """SELECT url FROM domain_sold WHERE url = %s;"""

        try:

            cursor = spider.db_connection.cursor()
            cursor.execute(sql, (request.url,)) 

            is_seen = cursor.fetchone()
            cursor.close()
            if is_seen:
                raise IgnoreRequest('duplicate url {}'.format(request.url))

        except (Exception, psycopg2.DatabaseError) as error:
            self.logger.error(error)

        return None
When IgnoreRequest is raised I expect the spider to move on to the next request, but in my case the spider still goes ahead and scrapes that request, and the item is passed through my custom pipeline.

My current downloader middleware settings look like this:

'DOWNLOADER_MIDDLEWARES': {
    'realestate.middleware.RealestateDownloaderMiddleware': 99,
}


Can anyone explain why this is happening? Thanks.

IgnoreRequest inherits from the base Exception class, and you then immediately catch that exception in your own except clause and log it, so it never propagates far enough to actually ignore the request.
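
A quick interpreter check illustrates this (nothing here is specific to the question's project):

>>> from scrapy.exceptions import IgnoreRequest
>>> issubclass(IgnoreRequest, Exception)
True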

Change:

except (Exception, psycopg2.DatabaseError) as error:
to:

except psycopg2.DatabaseError as error:
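
For context, here is a minimal sketch of the full process_request with only that change applied. It assumes the same spider.db_connection attribute and domain_sold table from the question, and uses spider.logger for logging (the question's self.logger works the same if the middleware defines one):

import psycopg2
from scrapy.exceptions import IgnoreRequest


class RealestateDownloaderMiddleware:

    def process_request(self, request, spider):
        sql = """SELECT url FROM domain_sold WHERE url = %s;"""
        try:
            cursor = spider.db_connection.cursor()
            cursor.execute(sql, (request.url,))
            is_seen = cursor.fetchone()
            cursor.close()
            if is_seen:
                # Only psycopg2.DatabaseError is caught below, so this
                # exception now propagates and Scrapy drops the request.
                raise IgnoreRequest('duplicate url {}'.format(request.url))
        except psycopg2.DatabaseError as error:
            # Database errors are still logged instead of propagating.
            spider.logger.error(error)
        return None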
This is correct, but the more concise answer is to remove the try/except altogether, because process_request should either return None, return a Response object, return a Request object, or raise IgnoreRequest (i.e. there is no need to catch errors).

@wishmaster that would mean any database exception is lost and never explicitly logged... it doesn't look like the code above will always just return None or raise IgnoreRequest... (any other exception that might occur would simply fail...). It looks like the OP wants DB exceptions logged rather than letting them propagate, but having such a broad Exception in their except clause is a bit overzealous.

@JonClements Thanks. Your solution fixed my problem.
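
As a follow-up to the comment above, a sketch of the leaner variant that drops the try/except entirely and relies on process_request's contract (return None or raise IgnoreRequest); as noted, any database error would then propagate instead of being logged by the middleware itself:

    def process_request(self, request, spider):
        # No try/except: a database error propagates to Scrapy's own
        # error handling instead of being logged here.
        sql = """SELECT url FROM domain_sold WHERE url = %s;"""
        cursor = spider.db_connection.cursor()
        cursor.execute(sql, (request.url,))
        is_seen = cursor.fetchone()
        cursor.close()
        if is_seen:
            raise IgnoreRequest('duplicate url {}'.format(request.url))
        return None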