Python: How to automatically restart a Scrapy spider after it finishes scraping
I am trying to automatically restart my spider when a scrape finishes, especially when the response status is bad. For example, I have the following code:
#!/usr/bin/python -tt
# -*- coding: utf-8 -*-
from scrapy.selector import Selector
from scrapy.contrib.spiders import CrawlSpider
from scrapy.http import Request
from urlparse import urljoin
from bs4 import BeautifulSoup
from scrapy.spider import BaseSpider
from scrapy import signals
from scrapy.xlib.pydispatch import dispatcher
from datetime import datetime
import re

class level1(BaseSpider):
    # Crawling Start
    CrawlSpider.started_on = datetime.now()
    name = "level1"
    base_domain = 'http://www.google.com'
    DOWNLOAD_DELAY = 3
    restart=False
    handle_httpstatus_list = [404, 302, 503, 999, 200] #add any other code you need
    # Call sendEmail class
    email = sendEmail()
    # Call log settings
    saveLog = runlog()

    # Init
    def __init__(self, url='', child='', parent=''):
        self.start_urls = [url]
        self.child = child
        self.parent = parent
        #run baby, run :)
        super(level1, self).__init__(self.start_urls)
        # On Spider Closed
        dispatcher.connect(self.spider_closed, signals.spider_closed)

    def spider_closed(self, reason):
        if self.restart:
            print "we need to retry"
            super(level1, self).__init__(self.start_urls)
        else:
            print "ok"
        # parsing time
        work_time = datetime.now() - CrawlSpider.started_on
        # Correct Finished
        if reason == "finished":
            print "finished"

    def parse(self, response):
        if response.status == 503:
            self.restart = True
        if response.status == 999:
            self.restart = True
        if str(response.status) == "200":
            # Selector
            sel = Selector(response)
            todo
In the spider_closed method I try to restart the spider when the response status is bad, but it does not work.
How can I solve this problem?

I am not sure that calling __init__ will restart your spider. Take a look at this link:
In the worst case, you could write a separate program that uses this core API (from the link) to spawn the crawler and restart it as needed. That said, I agree that restarting from within the spider script would be much simpler.
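The "separate program" idea can be sketched without touching Scrapy's internals at all: a small supervisor script that launches the crawl as a child process and launches it again when the run reports failure. This is only a rough sketch under assumptions, not the core-API approach from the link. The function name run_with_restarts and its parameters are hypothetical, and it assumes the crawl process exits with a non-zero status when it wants a retry. The spider above would have to be arranged to do that, since a bad HTTP status does not necessarily change the exit code by itself.

```python
import subprocess
import time


def run_with_restarts(command, max_restarts=3, delay_seconds=0.0, run=subprocess.call):
    """Run `command` repeatedly until it exits with status 0.

    `run` is any callable that takes the command and returns an exit code;
    it defaults to subprocess.call and can be swapped out for testing.
    Returns the number of attempts made, or raises RuntimeError once
    `max_restarts` restarts have been used without a successful run.
    """
    attempts = 0
    while True:
        attempts += 1
        exit_code = run(command)
        if exit_code == 0:
            # Assumption: exit status 0 means the crawl does not need a retry.
            return attempts
        if attempts > max_restarts:
            raise RuntimeError(
                "gave up after %d attempts (last exit code %r)" % (attempts, exit_code)
            )
        # Pause before starting the child process again.
        time.sleep(delay_seconds)
```

With this in place, a call such as run_with_restarts(["scrapy", "crawl", "level1", "-a", "url=http://www.google.com"], max_restarts=5, delay_seconds=30) would start the crawl up to six times in total. Keeping the restart logic outside the spider means every attempt begins from a fresh process, at the cost of paying the startup time again on each retry.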