Python Scrapy 'twisted.internet.error.ReactorNotRestartable' error after the first run
I am running Scrapy (version 1.4.0) from a script using CrawlerProcess. The URLs come from user input. The first run works fine, but on the second run I get a twisted.internet.error.ReactorNotRestartable error, and the program gets stuck there.
The CrawlerProcess part:
Here is the output of the first run (see the log below):
How can I restart or stop the reactor after each run finishes?
There are some similar questions on Stack Overflow, but the solutions there are for older versions of Scrapy and did not work for me.

You can add this line:

process.start(stop_after_crawl=False)

Hope your problem gets solved.
Thanks!

Try starting your function in a separate process:
from multiprocessing import Process

def crawl():
    crawler = CrawlerProcess(settings)
    crawler.crawl(MySpider)
    crawler.start()

process = Process(target=crawl)
process.start()
process.join()
Tried it, but it gets stuck there: the process does not stop and keeps running.

Possible duplicate of
~~~~~~~~~~~~ Processing is going to be started ~~~~~~~~~~
2017-07-17 05:58:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.some-url.com/content.php> (referer: None)
2017-07-17 05:58:46 [scrapy.core.scraper] ERROR: Spider must return Request, BaseItem, dict or None, got 'HtmlResponse' in <GET http://www.some-url.com/content.php>
2017-07-17 05:58:46 [scrapy.core.engine] INFO: Closing spider (finished)
2017-07-17 05:58:46 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 261,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 14223,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2017, 7, 17, 5, 58, 46, 760661),
'log_count/DEBUG': 2,
'log_count/ERROR': 1,
'log_count/INFO': 7,
'memusage/max': 49983488,
'memusage/startup': 49983488,
'response_received_count': 1,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'start_time': datetime.datetime(2017, 7, 17, 5, 58, 45, 162155)}
2017-07-17 05:58:46 [scrapy.core.engine] INFO: Spider closed (finished)
~~~~~~~~~~~~ Processing ended ~~~~~~~~~~
~~~~~~~~~~~~ Processing is going to be started ~~~~~~~~~~
[2017-07-17 06:03:18,075] ERROR in app: Exception on /scripts/1/process [GET]
Traceback (most recent call last):
File "/var/www/python/crawlerapp/appenv/lib/python3.5/site-packages/flask/app.py", line 1982, in wsgi_app
response = self.full_dispatch_request()
File "/var/www/python/crawlerapp/appenv/lib/python3.5/site-packages/flask/app.py", line 1614, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/var/www/python/crawlerapp/appenv/lib/python3.5/site-packages/flask/app.py", line 1517, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/var/www/python/crawlerapp/appenv/lib/python3.5/site-packages/flask/_compat.py", line 33, in reraise
raise value
File "/var/www/python/crawlerapp/appenv/lib/python3.5/site-packages/flask/app.py", line 1612, in full_dispatch_request
rv = self.dispatch_request()
File "/var/www/python/crawlerapp/appenv/lib/python3.5/site-packages/flask/app.py", line 1598, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "api.py", line 13, in process_crawler
processor.process()
File "/var/www/python/crawlerapp/application/scripts/general_spider.py", line 124, in process
process.start()
File "/var/www/python/crawlerapp/appenv/lib/python3.5/site-packages/scrapy/crawler.py", line 285, in start
reactor.run(installSignalHandlers=False) # blocking call
File "/var/www/python/crawlerapp/appenv/lib/python3.5/site-packages/twisted/internet/base.py", line 1242, in run
self.startRunning(installSignalHandlers=installSignalHandlers)
File "/var/www/python/crawlerapp/appenv/lib/python3.5/site-packages/twisted/internet/base.py", line 1222, in startRunning
ReactorBase.startRunning(self)
File "/var/www/python/crawlerapp/appenv/lib/python3.5/site-packages/twisted/internet/base.py", line 730, in startRunning
raise error.ReactorNotRestartable()
twisted.internet.error.ReactorNotRestartable
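The traceback above comes down to Twisted's rule that a reactor can be started at most once per process: once `reactor.run()` has returned, calling it again hits the guard in `startRunning` and raises `ReactorNotRestartable`. A toy, stdlib-only model of that guard (a hypothetical class for illustration, not Twisted's actual implementation):

```python
class MiniReactor:
    """Toy model of a run-once event loop, mimicking Twisted's guard."""

    def __init__(self):
        self._started_once = False

    def run(self):
        if self._started_once:
            # Corresponds to twisted.internet.error.ReactorNotRestartable
            raise RuntimeError("ReactorNotRestartable")
        self._started_once = True
        # ... the event loop would run here until stop() is called ...

reactor = MiniReactor()
reactor.run()        # first run: fine
try:
    reactor.run()    # second run: raises, like the traceback above
except RuntimeError as exc:
    print(exc)       # -> ReactorNotRestartable
```

This is why the second Flask request fails: the worker process already ran the reactor once, and `CrawlerProcess.start()` tries to run it again in the same process.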