Python: infinite loop with Scrapy's CrawlerProcess
I am currently running Scrapy v2.5 and I want to run it in an infinite loop. My code:
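from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# mongodb is the asker's own MongoDB helper class (its import is not shown)


class main():
    def bucle(self, array_spider, process):
        mongo = mongodb(setting)
        # schedule every spider of the project on the same process
        for spider_name in array_spider:
            process.crawl(spider_name, params={"mongo": mongo, "spider_name": spider_name})
        process.start()  # blocks until all spiders have finished
        process.stop()
        mongo.close_mongo()


if __name__ == "__main__":
    setting = get_project_settings()
    while True:
        # the second iteration fails: the Twisted reactor cannot be started twice
        process = CrawlerProcess(setting)
        array_spider = process.spider_loader.list()
        class_main = main()
        class_main.bucle(array_spider, process)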
But this results in the following error message:
Traceback (most recent call last):
  File "run_scrapy.py", line 92, in <module>
    process.start()
  File "/usr/local/lib/python3.8/dist-packages/scrapy/crawler.py", line 327, in start
    reactor.run(installSignalHandlers=False)  # blocking call
  File "/usr/local/lib/python3.8/dist-packages/twisted/internet/base.py", line 1422, in run
    self.startRunning(installSignalHandlers=installSignalHandlers)
  File "/usr/local/lib/python3.8/dist-packages/twisted/internet/base.py", line 1404, in startRunning
    ReactorBase.startRunning(cast(ReactorBase, self))
  File "/usr/local/lib/python3.8/dist-packages/twisted/internet/base.py", line 843, in startRunning
    raise error.ReactorNotRestartable()
twisted.internet.error.ReactorNotRestartable
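Can anyone help me?

As far as I know, there is no easy way to restart a spider, but there is an alternative: a spider that never closes. For this you can use the spider_idle signal. According to the documentation: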
Sent when a spider has gone idle, which means the spider has no further:
* requests waiting to be downloaded
* requests scheduled
* items being processed in the item pipeline
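You can also find an example of using signals in the official documentation.

If you use Linux, then maybe you should use cron to start the crawl every few minutes.

I'm not sure, but the loop may start many spiders in a short time, and that can cause problems. You can use print() to see which values are involved when it fails. You should check whether the problem appears on the first run or on the second, when process.start() is executed again after the previous process.stop(). Maybe the whole problem is caused by process.stop(), which may shut everything down so that it cannot be started again.

I had the same problem, and I solved it with the help of the following question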
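The cron suggestion works because every run gets a fresh Python process, and with it a fresh Twisted reactor, so nothing has to be restarted. The same effect can be sketched from Python by launching each crawl round as a separate process. This is a minimal sketch under assumptions: run_forever, pause_seconds and max_rounds are hypothetical names introduced for illustration, and "my_spider" is a placeholder spider name.

```python
import subprocess
import sys
import time


def run_forever(command, pause_seconds=60, max_rounds=None):
    """Run `command` in a new process again and again; return the number of rounds."""
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        # Each round is its own process, so the reactor is never reused.
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"round {rounds + 1} exited with code {result.returncode}",
                  file=sys.stderr)
        rounds += 1
        time.sleep(pause_seconds)
    return rounds


# Example call, run from the Scrapy project directory:
#     run_forever(["scrapy", "crawl", "my_spider"])
```

Spider arguments can be passed on the command line with scrapy crawl's -a name=value option. An object such as an open MongoDB connection cannot be passed this way, so each run would have to open its own connection.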