Python: infinite loop with Scrapy's CrawlerProcess


I am currently running Scrapy v2.5 and I want to run my spiders in an infinite loop. My code:

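from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

class main():

    def bucle(self, array_spider, process):
        # mongodb is a helper class defined elsewhere in the project (not shown)
        mongo = mongodb(setting)
        for spider_name in array_spider:
            process.crawl(spider_name, params={"mongo": mongo, "spider_name": spider_name})
        process.start()  # blocks until every spider has finished
        process.stop()
        mongo.close_mongo()

if __name__ == "__main__":
    setting = get_project_settings()
    while True:
        process = CrawlerProcess(setting)
        array_spider = process.spider_loader.list()
        class_main = main()
        class_main.bucle(array_spider, process)

But this results in the following error message: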

Traceback (most recent call last):
  File "run_scrapy.py", line 92, in <module>
    process.start()
  File "/usr/local/lib/python3.8/dist-packages/scrapy/crawler.py", line 327, in start
    reactor.run(installSignalHandlers=False)  # blocking call
  File "/usr/local/lib/python3.8/dist-packages/twisted/internet/base.py", line 1422, in run
    self.startRunning(installSignalHandlers=installSignalHandlers)
  File "/usr/local/lib/python3.8/dist-packages/twisted/internet/base.py", line 1404, in startRunning
    ReactorBase.startRunning(cast(ReactorBase, self))
  File "/usr/local/lib/python3.8/dist-packages/twisted/internet/base.py", line 843, in startRunning
    raise error.ReactorNotRestartable()
twisted.internet.error.ReactorNotRestartable

Can anyone help me?

OK, there is no easy way to restart a spider, but there is an alternative: a spider that never closes. For that you can use the spider_idle signal.

According to the documentation:

Sent when a spider has gone idle, which means the spider has no further:  
* requests waiting to be downloaded
* requests scheduled
* items being processed in the item pipeline

You can also find an example of using signals in the official documentation.

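If you are on Linux, then maybe you should use cron to start the script every few minutes. I'm not sure, but that might start many spiders within a short time and cause problems of its own.

You can use print() to see which values are causing the problem. You should check whether it fails on the first run or on the second, that is, when process.start() is executed again after the previous process.stop(). Possibly the whole problem comes from process.stop(), which may shut everything down so that it cannot be started again.

I ran into the same problem and solved it using the question below.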
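Coming back to the cron suggestion: a similar effect can be had from plain Python by launching the crawl script in a fresh process for each round, so every round gets its own Twisted reactor. This is only a sketch using the standard library. The function name run_rounds and its parameters are made up for illustration, and it assumes the crawl script runs one round and exits (no "while True" inside it):

```python
import subprocess
import sys
import time


def run_rounds(command, pause_seconds=60.0, max_rounds=None):
    """Run `command` repeatedly, each time in a new process.

    max_rounds=None loops forever; a number is mainly useful for testing.
    Returns the number of completed rounds.
    """
    completed = 0
    while max_rounds is None or completed < max_rounds:
        # subprocess.run blocks until the child exits, so rounds never overlap.
        result = subprocess.run(command)
        completed += 1
        print(f"round {completed} exited with code {result.returncode}")
        if max_rounds is None or completed < max_rounds:
            time.sleep(pause_seconds)
    return completed
```

For the script from the traceback the call would look like run_rounds([sys.executable, "run_scrapy.py"]). Because each round waits for the previous one to exit, this avoids the overlapping runs that a tight cron schedule might cause.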