Python Scrapy: re-crawling the same URL with the same spider and a user-supplied depth
If I call this code more than once from the same method, it fails, but no error appears in the terminal. It only runs once. Is it impossible to re-crawl twice with the same spider? It fails at the reactor.run line: the spider never runs on the second call, yet nothing shows up in the log.
def crawlSite(self):
    self.mySpider = MySpider()
    self.mySpider.setCrawlFolder(self.website)

    settings = get_project_settings()
    settings.set('DEPTH_LIMIT', self.depth)

    crawler = Crawler(settings)
    crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
    crawler.configure()
    crawler.crawl(self.mySpider)
    crawler.start()

    log.start(logfile="results.log", loglevel=log.ERROR, crawler=crawler, logstdout=False)  # or log.DEBUG
    reactor.run()  # the script will block here until the spider_closed signal is sent
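The underlying problem is that Twisted's reactor can only be started once per process: once reactor.stop() has run, reactor.run() will not start it again, which matches the symptom described (the second call does nothing and logs nothing). One common workaround is to give every crawl its own process, so each run gets a fresh reactor. A minimal sketch of that idea, assuming a reasonably recent Scrapy where CrawlerProcess manages the reactor for you (the run_spider and crawl_site helper names are invented here, not part of the question's code):

import multiprocessing

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

def run_spider(depth):
    # Runs a single crawl; CrawlerProcess starts and stops the
    # reactor itself, exactly once in this child process.
    settings = get_project_settings()
    settings.set('DEPTH_LIMIT', depth)
    process = CrawlerProcess(settings)
    process.crawl(MySpider)  # MySpider as defined below
    process.start()          # blocks until the spider finishes

def crawl_site(depth):
    # Spawning a fresh process per crawl sidesteps the
    # "reactor is not restartable" limitation.
    p = multiprocessing.Process(target=run_spider, args=(depth,))
    p.start()
    p.join()

crawl_site(2)  # first crawl, depth 2
crawl_site(5)  # re-crawl the same site with a different depth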
Here is the MySpider class:
class MySpider(CrawlSpider):
    name = "mysite"
    crawlFolder = ""
    crawlFolder1 = ""
    crawlFolder2 = ""

    allowed_domains = ["mysite.ca"]
    start_urls = ["http://www.mysite.ca"]

    rules = [
        Rule(SgmlLinkExtractor(allow=(r'^http://www.mysite.ca/',), unique=True),
             callback='parse_item', follow=True),
    ]

    def parse_item(self, response):
        # store data in a website item object
        item = WebsiteClass()
        item['title'] = response.selector.xpath('//title/text()').extract()
        item['body'] = response.selector.xpath('//body').extract()
        item['url'] = response.url
        ...
Then I have a SetupClass that calls crawlSite in the CrawlerClass:
self.crawlerClass.crawlSite()
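As an aside, instead of mutating a spider instance with setCrawlFolder before the crawl, Scrapy's spider arguments are the idiomatic way to configure each run, since a fresh spider instance is built per crawl() call. A sketch under that assumption (the crawl_folder keyword is invented here for illustration):

class MySpider(CrawlSpider):
    name = "mysite"

    def __init__(self, crawl_folder=None, *args, **kwargs):
        # Keyword arguments passed to crawl() arrive here
        super(MySpider, self).__init__(*args, **kwargs)
        self.crawlFolder = crawl_folder

# With the CrawlerProcess approach above, each run passes its own folder:
process.crawl(MySpider, crawl_folder="folder-for-this-run")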
We would need the complete code to say more. Does the close signal actually work? Maybe you should implement it via the scheduler? Check
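On the scheduler suggestion: if everything has to stay in one process, Scrapy's documentation describes running several crawls sequentially under a single reactor run using CrawlerRunner and chained deferreds. A sketch along those lines (not the asker's code; per-run settings such as DEPTH_LIMIT would still need separate handling, e.g. distinct Crawler objects):

from twisted.internet import defer, reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.project import get_project_settings

runner = CrawlerRunner(get_project_settings())

@defer.inlineCallbacks
def crawl_sequentially():
    # Each yield waits for the previous crawl to finish;
    # runner.crawl() creates a new spider instance every time.
    yield runner.crawl(MySpider)
    yield runner.crawl(MySpider)
    reactor.stop()

crawl_sequentially()
reactor.run()  # started once, stopped once when both crawls are done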