Python: Scrapy infinite loop with CrawlerProcess

python recursion scrapy

I am currently running Scrapy v2.5 and I want to run my spiders in an infinite loop. My code:

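from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

class main():

    def bucle(self, array_spider, process):
        # mongodb is the asker's own helper class (its definition is not shown)
        mongo = mongodb(setting)
        for spider_name in array_spider:
            # Schedule every spider of the project on the same process
            process.crawl(spider_name, params={"mongo": mongo, "spider_name": spider_name})
        process.start()  # blocking call: returns once all spiders have finished
        process.stop()
        mongo.close_mongo()

if __name__ == "__main__":
    setting = get_project_settings()
    while True:
        # A new CrawlerProcess is created on every iteration
        process = CrawlerProcess(setting)
        array_spider = process.spider_loader.list()
        class_main = main()
        class_main.bucle(array_spider, process)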

But this results in the following error message:

Traceback (most recent call last):
  File "run_scrapy.py", line 92, in <module>
    process.start()
  File "/usr/local/lib/python3.8/dist-packages/scrapy/crawler.py", line 327, in start
    reactor.run(installSignalHandlers=False)  # blocking call
  File "/usr/local/lib/python3.8/dist-packages/twisted/internet/base.py", line 1422, in run
    self.startRunning(installSignalHandlers=installSignalHandlers)
  File "/usr/local/lib/python3.8/dist-packages/twisted/internet/base.py", line 1404, in startRunning
    ReactorBase.startRunning(cast(ReactorBase, self))
  File "/usr/local/lib/python3.8/dist-packages/twisted/internet/base.py", line 843, in startRunning
    raise error.ReactorNotRestartable()
twisted.internet.error.ReactorNotRestartable

Can anyone help me?

OK, there is no easy way to restart a spider, but there is an alternative: a spider that never closes. For that you can use the spider_idle signal.

According to the documentation:

Sent when a spider has gone idle, which means the spider has no further:  
* requests waiting to be downloaded
* requests scheduled
* items being processed in the item pipeline

You can also find an example of using signals in the official documentation.
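If you use Linux, then maybe you should use cron to start it every few minutes.

I'm not sure, but this may start many spiders in a short time, and that can cause problems. You can use print() to see which values are causing the problem. You should check whether it fails on the first run or on the second run, when process.start() is executed again after the previous process.stop(). Maybe the whole problem is caused by process.stop(), which may kill all the processes so that they cannot be started again.

I had the same problem, and I solved it using the following question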
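On the cron suggestion in the comments above: cron works because every run is a brand-new Python process with its own Twisted reactor, so ReactorNotRestartable cannot occur. The same effect can be sketched from Python with only the standard library. The function name crawl_cycles and its parameters are hypothetical, and the sketch assumes run_scrapy.py has had its while True loop removed so that it performs one crawl and exits.

```python
import subprocess
import sys
import time


def crawl_cycles(command, pause_seconds=60.0):
    """Run `command` over and over, each time in a fresh child process.

    Yields the exit code of every run, then sleeps before the next one.
    """
    while True:
        completed = subprocess.run(command)  # blocks until the child exits
        yield completed.returncode
        time.sleep(pause_seconds)


# Example usage (assumption: run_scrapy.py does a single crawl and exits):
#
#     for exit_code in crawl_cycles([sys.executable, "run_scrapy.py"], 300):
#         print("crawl finished with exit code", exit_code)
```

Since each cycle waits for the previous child to finish, runs cannot overlap, which addresses the concern about starting many spiders within a short time.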