Python Scrapy: How do I use middleware to switch to a new proxy when I get a 403/302?

Here is the middleware I have written:

def process_request(self, request, spider):
    request = self.change_proxy(request)   # set request proxy

def process_response(self, request, response, spider):
    if response.status != 200:
        self.delete_proxy()           # remove unusable proxy
        return request.copy()         # send request to process_request to change proxy
    return response

def process_exception(self, request, exception, spider):
    self.delete_proxy()              # remove unusable proxy
    # If I comment out the line below, I no longer get 302s, but won't I then miss some pages?
    return request.copy()            # send request to process_request to change proxy
To start with, I have some proxies stored in redis, and what I want to do is:

  • give every request a random proxy
  • when the connection through a proxy fails (process_exception is called), switch to a new proxy and re-crawl the page
  • when a proxy gets banned by the site (response.status != 200), switch to a new proxy and re-crawl the page
  • however, after the spider has been running for a few minutes, I keep getting 302s or process_exception
    keeps being called. Why? If I restart the spider, it again works fine for the first few minutes...
    (so the proxy IPs themselves are fine; what is wrong with my code?)
    How do I do this correctly?

    Please show the content of the exception you are getting. Also, you must use
    request.replace(dont_filter=True)
    to bypass the duplicates filter.
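
For what it's worth, here is a minimal sketch of the middleware with that fix applied. The class name, the hard-coded proxy list (standing in for the asker's Redis pool), and the remove_proxy helper are illustrative assumptions rather than the asker's actual code:

import random

class RandomProxyMiddleware(object):
    """Rotating-proxy downloader middleware (illustrative sketch)."""

    def __init__(self):
        # A plain list stands in here for the Redis-backed proxy pool.
        self.proxies = [
            "http://10.0.0.1:8080",   # placeholder addresses
            "http://10.0.0.2:3128",
        ]

    def process_request(self, request, spider):
        # Give every outgoing request a random proxy from the pool.
        request.meta["proxy"] = random.choice(self.proxies)

    def process_response(self, request, response, spider):
        if response.status != 200:
            self.remove_proxy(request.meta.get("proxy"))
            # dont_filter=True is the crucial part: a plain request.copy()
            # gets silently dropped by the scheduler's duplicates filter,
            # so the retry never actually happens.
            return request.replace(dont_filter=True)
        return response

    def process_exception(self, request, exception, spider):
        self.remove_proxy(request.meta.get("proxy"))
        return request.replace(dont_filter=True)

    def remove_proxy(self, proxy):
        # Discard the dead proxy (a Redis SREM in the asker's setup).
        if proxy in self.proxies:
            self.proxies.remove(proxy)

The replacement request returned here is rescheduled and passes through process_request again, which is what assigns it a fresh proxy. The middleware also has to be enabled under DOWNLOADER_MIDDLEWARES in settings.py; the exact module path depends on the project layout.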