Python Scrapy: how can I use a middleware to switch to a new proxy when I get a 403/302?
I have written the following middleware:
def process_request(self, request, spider):
    request = self.change_proxy(request)  # assign a proxy to this request

def process_response(self, request, response, spider):
    if response.status != 200:
        self.delete_proxy()  # remove the unusable proxy
        return request.copy()  # resend the request so process_request assigns a new proxy
    return response

def process_exception(self, request, exception, spider):
    self.delete_proxy()  # remove the unusable proxy
    # if I comment out the line below I no longer get 302s, but maybe some pages will be lost?
    return request.copy()  # resend the request so process_request assigns a new proxy
First of all, I have some proxies stored in Redis. What I want to do is: whenever a request comes back with a 403/302, drop the current proxy and resend the request through process_request so it is retried with a new one. But I still keep getting 302s (see the comment in process_exception above), why? If I restart my spider, it only works correctly for the first few minutes... (so the proxy IPs themselves are fine; what is wrong with my code?)
How do I do this correctly?

Please show the contents of the exception you are getting. Also, you have to use request.replace(dont_filter=True) to bypass the duplicate request filter.
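Building on that suggestion, here is a minimal sketch of what such a rotating-proxy downloader middleware could look like. The class name, the Redis key "proxies", the connection details, and the helper methods get_proxy/delete_proxy are illustrative assumptions, not taken from the original post:

import random
import redis

class RandomProxyMiddleware(object):
    """Assign a proxy from a Redis pool to each request and rotate it on failure."""

    def __init__(self):
        # Hypothetical Redis-backed pool: a set called "proxies" holding "host:port" strings.
        self.r = redis.StrictRedis(host='localhost', port=6379, db=0)

    def get_proxy(self):
        proxies = self.r.smembers('proxies')
        return random.choice(list(proxies)).decode() if proxies else None

    def delete_proxy(self, proxy):
        if proxy:
            self.r.srem('proxies', proxy.replace('http://', ''))

    def process_request(self, request, spider):
        proxy = self.get_proxy()
        if proxy:
            request.meta['proxy'] = 'http://' + proxy  # picked up by Scrapy's HttpProxyMiddleware

    def process_response(self, request, response, spider):
        if response.status in (403, 302):
            self.delete_proxy(request.meta.get('proxy'))
            # dont_filter=True keeps the retried request from being dropped by the dupefilter
            return request.replace(dont_filter=True)
        return response

    def process_exception(self, request, exception, spider):
        self.delete_proxy(request.meta.get('proxy'))
        return request.replace(dont_filter=True)

The middleware would then be enabled through the DOWNLOADER_MIDDLEWARES setting so it runs alongside Scrapy's built-in HttpProxyMiddleware. Whether a 302 should count as a proxy failure at all depends on the target site, since redirects can also be legitimate.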