How to automatically restart a spider after scraping finishes in Python (Scrapy)


I'm trying to automatically restart the spider when the crawl finishes, especially when the response status was bad. For example, I have the following code:

#!/usr/bin/python -tt
# -*- coding: utf-8 -*-

from scrapy.selector import Selector
from scrapy.contrib.spiders import CrawlSpider
from scrapy.http import Request
from urlparse import urljoin
from bs4 import BeautifulSoup
from scrapy.spider import BaseSpider
from scrapy import signals
from scrapy.xlib.pydispatch import dispatcher
from datetime import datetime
import re

class level1(BaseSpider):
    # Crawling Start
    CrawlSpider.started_on = datetime.now()

    name = "level1"
    base_domain = 'http://www.google.com'

    DOWNLOAD_DELAY = 3

    restart=False

    handle_httpstatus_list = [404, 302, 503, 999, 200] #add any other code you need

    # sendEmail and runlog are custom helper classes defined elsewhere
    email = sendEmail()
    saveLog = runlog()


    # Init
    def __init__(self, url='', child='', parent=''):
        self.start_urls = [url]
        self.child = child
        self.parent = parent

        #run baby, run :)
        super(level1, self).__init__(self.start_urls)


        # On Spider Closed
        dispatcher.connect(self.spider_closed, signals.spider_closed)

    def spider_closed(self, reason):
        if self.restart:
            print "we need to retry"
            super(level1, self).__init__(self.start_urls)
        else:
            print "ok"
            # parsing time
            work_time = datetime.now() - CrawlSpider.started_on

            # Correct Finished
            if reason == "finished":
                print "finished"

    def parse(self, response):

        if response.status in (503, 999):
            self.restart = True


        if response.status == 200:
            # Selector
            sel = Selector(response)
            # TODO: actual parsing goes here
In the spider_closed method, I try to restart the spider when the response status was bad, but it does not work.

How can I fix this?
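As a side note on the underlying goal: Scrapy's built-in RetryMiddleware can re-issue individual requests that come back with configurable status codes, which may remove the need to restart the whole spider. A minimal settings sketch, where the choice of retry codes (503 and 999, matching the statuses checked in the spider above) is an assumption:

```python
# settings.py sketch -- retry codes are an assumption based on the spider above
RETRY_ENABLED = True
RETRY_TIMES = 3                  # re-request each failing URL up to 3 times
RETRY_HTTP_CODES = [503, 999]    # statuses that should trigger a retry
```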

I'm not sure that calling __init__ again will restart your spider.

Please take a look at this link:

In the worst case, you could write a separate program that uses the core API (from the link) to spawn the crawler and restart it as needed, although I agree that restarting from within the spider script would be much simpler.
