Scrapy + Splash = Connection refused

I installed Splash with this, following all the installation steps, but Splash doesn't work.

My settings.py file:

BOT_NAME = 'Teste'
SPIDER_MODULES = ['Test.spiders']
NEWSPIDER_MODULE = 'Test.spiders'
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
SPLASH_URL = 'http://127.0.0.1:8050/'
When I run
scrapy crawl TestSpider
I get:

[scrapy.core.engine] INFO: Spider opened
[scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
[scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://www.google.com.br via http://127.0.0.1:8050/render.html> (failed 1 times): Connection was refused by other side: 111: Connection refused.
[scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://www.google.com.br via http://127.0.0.1:8050/render.html> (failed 2 times): Connection was refused by other side: 111: Connection refused.
[scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET http://www.google.com.br via http://127.0.0.1:8050/render.html> (failed 3 times): Connection was refused by other side: 111: Connection refused.
[scrapy.core.scraper] ERROR: Error downloading <GET http://www.google.com.br via http://127.0.0.1:8050/render.html>
Traceback (most recent call last):
  File "/home/ricardo/scrapy/lib/python3.5/site-packages/twisted/internet/defer.py", line 1126, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/ricardo/scrapy/lib/python3.5/site-packages/twisted/python/failure.py", line 389, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/ricardo/scrapy/lib/python3.5/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request, spider=spider)))
twisted.internet.error.ConnectionRefusedError: Connection was refused by other side: 111: Connection refused.
[scrapy.core.engine] INFO: Closing spider (finished)
[scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 3,
'downloader/exception_type_count/twisted.internet.error.ConnectionRefusedError': 3,
'downloader/request_bytes': 1476,
'downloader/request_count': 3,
'downloader/request_method_count/POST': 3,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2017, 6, 29, 21, 36, 16, 72916),
'log_count/DEBUG': 3,
'log_count/ERROR': 1,
'log_count/INFO': 7,
'memusage/max': 47468544,
'memusage/startup': 47468544,
'retry/count': 2,
'retry/max_reached': 1,
'retry/reason_count/twisted.internet.error.ConnectionRefusedError': 2,
'scheduler/dequeued': 4,
'scheduler/dequeued/memory': 4,
'scheduler/enqueued': 4,
'scheduler/enqueued/memory': 4,
'splash/render.html/request_count': 1,
'start_time': datetime.datetime(2017, 6, 29, 21, 36, 15, 851593)}
[scrapy.core.engine] INFO: Spider closed (finished)
I tried running this in the terminal:

curl "http://localhost:8050/render.html?url=http://www.google.com/"

Output:

curl: (7) Failed to connect to localhost port 8050: Connection refused
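For reference, the render.html URL that curl (and Splash's GET endpoint) is hitting here can be composed with the standard library; a sketch using the SPLASH_URL value from settings.py (the helper name is made up):

```python
from urllib.parse import urljoin, urlencode

SPLASH_URL = "http://127.0.0.1:8050/"

def render_url(target, wait=0.5):
    """Build the Splash render.html URL for a target page."""
    query = urlencode({"url": target, "wait": wait})
    return urljoin(SPLASH_URL, "render.html") + "?" + query

print(render_url("http://www.google.com/"))
# → http://127.0.0.1:8050/render.html?url=http%3A%2F%2Fwww.google.com%2F&wait=0.5
```

If nothing is listening on port 8050, both curl and Scrapy get the same "Connection refused" on this URL.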


Make sure your Splash server is up and running before you start the spider:


sudo docker run -p 5023:5023 -p 8050:8050 -p 8051:8051 scrapinghub/splash

You need to run this from the command line:

sudo docker run -p 8050:8050 scrapinghub/splash
and set this in settings.py:

SPLASH_URL = 'http://localhost:8050'
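A quick way to confirm Splash is reachable before starting the crawl is a plain TCP connect to the SPLASH_URL host and port (a minimal sketch using only the standard library; the helper name is made up):

```python
import socket
from urllib.parse import urlparse

def splash_is_up(splash_url="http://localhost:8050", timeout=2.0):
    """Return True if something is accepting TCP connections at splash_url."""
    parsed = urlparse(splash_url)
    host = parsed.hostname or "localhost"
    port = parsed.port or 8050
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (errno 111) as well as timeouts
        return False
```

If this returns False, the docker run command above hasn't been started, or port 8050 isn't published to the host.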

Are you using Docker? What command do you use to run Splash? What OS are you on, and what Docker version? If you can't reach Splash at that address, Docker may be using a different host, or you forgot to expose port 8050.

I'm not using Docker, but a venv on Ubuntu 16.04. Is it necessary to use Docker?

It's not necessary to use Docker, but I know it's the easiest way to install Splash. You can install it into a virtualenv, but that's harder. How do you start Splash? Can you paste the exact command? Are you sure Splash is running?

Thanks @MikhailKorobov!!! Docker is much more convenient to use.