Scrapy: calling the parse method from the spider_closed handler
I want to call the parse method that Scrapy invokes when the scraper starts. Is it possible to call it manually after scraping has finished?
from scrapy import signals
from scrapy.spiders import CrawlSpider
from scrapy.xlib.pydispatch import dispatcher

class MySpider(CrawlSpider):
    def __init__(self):
        dispatcher.connect(self.spider_closed, signals.spider_closed)

    def parse(self, response):
        # something here
        pass

    def spider_closed(self, spider):
        # CALL PARSE METHOD AGAIN
        pass
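As an aside, `scrapy.xlib.pydispatch` has been removed from newer Scrapy releases, but the underlying pattern is simply "register a callback under a signal key, fire it later". A minimal stdlib-only sketch of that pattern (the `Dispatcher` class and names here are stand-ins for illustration, not Scrapy's actual API):

```python
from collections import defaultdict

class Dispatcher:
    """Stand-in signal dispatcher: maps signal names to lists of callbacks."""
    def __init__(self):
        self._receivers = defaultdict(list)

    def connect(self, receiver, signal):
        # Register a callback for a signal, like dispatcher.connect(...) above
        self._receivers[signal].append(receiver)

    def send(self, signal, **kwargs):
        # Fire every callback registered for this signal
        return [receiver(**kwargs) for receiver in self._receivers[signal]]

dispatcher = Dispatcher()

class MySpiderDemo:
    def __init__(self):
        # Same shape as dispatcher.connect(self.spider_closed, signals.spider_closed)
        dispatcher.connect(self.spider_closed, signal="spider_closed")
        self.closed_calls = 0

    def spider_closed(self, spider):
        self.closed_calls += 1
        return "closed:" + spider

demo = MySpiderDemo()
results = dispatcher.send("spider_closed", spider="myspider")
```

Scrapy's engine plays the role of `send` here: when the crawl finishes it fires `spider_closed`, which is why any handler you connect runs only after the engine has stopped scheduling requests.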
As suggested in the comments, the spider_idle signal may be what you need. Here is an example of a spider that restarts itself twice:
import scrapy

class IdleRestartSpider(scrapy.Spider):
    name = "idlerestart"
    restarts = 0
    max_restarts = 2
    start_urls = ['http://httpbin.org/html']

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super(IdleRestartSpider, cls).from_crawler(crawler, *args, **kwargs)
        crawler.signals.connect(spider.idle, signal=scrapy.signals.spider_idle)
        return spider

    def parse(self, response):
        self.logger.info("Got response %r" % response)
        yield scrapy.Request('http://httpbin.org/get?restarts=%d' % self.restarts,
                             callback=self.parse_response)

    def parse_response(self, response):
        self.logger.info("Got response %r" % response)

    def idle(self):
        self.logger.info("Spider is idle: %d restarts left" % (
            self.max_restarts - self.restarts))
        if self.restarts < self.max_restarts:
            self.logger.info("Spider is restarting")
            self.restarts += 1
            self.crawler.engine.crawl(
                scrapy.Request(self.start_urls[0],
                               dont_filter=True),
                self)
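The restart bookkeeping in idle() is plain Python and can be checked in isolation. A minimal sketch (the RestartCounter class is hypothetical, introduced only to mirror the restarts/max_restarts logic above):

```python
class RestartCounter:
    """Mirrors the restarts/max_restarts bookkeeping used in idle()."""
    def __init__(self, max_restarts):
        self.max_restarts = max_restarts
        self.restarts = 0

    def should_restart(self):
        # Same check as `if self.restarts < self.max_restarts` in idle()
        if self.restarts < self.max_restarts:
            self.restarts += 1
            return True
        return False

counter = RestartCounter(max_restarts=2)
decisions = [counter.should_restart() for _ in range(4)]
```

With max_restarts = 2, the first two idle events trigger a restart and every later one is ignored, so the crawl eventually terminates.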
Make sure to add the dont_filter=True argument where necessary, otherwise the duplicate filter will drop the re-queued request. You can hand control back to the parse method with:

    return super(YourSpider, self).parse(response)
That seems unlikely, since there is no response object to pass to parse(). What are you trying to achieve?
I want to restart the scrape after scraping has finished.
Take a look at this answer describing how to run a spider from a script. If you want to restart the crawl, take a look at the spider_idle signal used there.
Can you give an example?