Python: How to check for "HTTP status code is not handled or not allowed" with Scrapy?
I am using Scrapy, and I want to check the status code before entering the parse method. My code looks like this:
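# Imports assumed for the legacy (Python 2 era) Scrapy API used below
from datetime import datetime
from scrapy import signals
from scrapy.contrib.spiders import CrawlSpider
from scrapy.http import Request
from scrapy.selector import Selector
from scrapy.spider import BaseSpider
from scrapy.xlib.pydispatch import dispatcher

class mywebsite(BaseSpider):
    # Crawling start time
    CrawlSpider.started_on = datetime.now()
    # Spider settings
    name = 'mywebsite'
    DOWNLOAD_DELAY = 10
    allowed_domains = ['mywebsite.com']
    pathUrl = "URL/mywebsite.txt"

    # Init
    def __init__(self, local=None, *args, **kwargs):
        # Call the parent constructor
        super(mywebsite, self).__init__(*args, **kwargs)
        # Run self.spider_closed (not shown here) when the spider closes
        dispatcher.connect(self.spider_closed, signals.spider_closed)

    def start_requests(self):
        # One request per line of the URL list file
        return [Request(url=line.strip()) for line in open(self.pathUrl)]

    def parse(self, response):
        print "==============="
        print response.headers
        print "==============="
        # Selector
        sel = Selector(response)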
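When my proxy is not blocked, I see the response headers, but when my IP is blocked, I only see this in the output console:
DEBUG: Ignoring response: HTTP status code is not handled or not allowed
How can I check the response headers before entering the parse method?
Edit: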
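Answer: This error occurs when the spider is blocked or banned by an anti-crawling system. You have to use a proxy system that is not blocked.
Thanks for your "automatic reply", but my code already uses such a system.
Please check this very similar question.
If you still need help, say "please" and I will check the link.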