Python - Connection was refused by other side: 111: Connection refused


I have written a spider for LinkedIn. It runs fine on my local machine, but when deployed on Scrapinghub it fails with this error:

Error downloading <GET https://www.linkedin.com/>: Connection was refused by other side: 111: Connection refused.
The full log from Scrapinghub is:

0:  2018-08-30 12:58:34 INFO    Log opened.
1:  2018-08-30 12:58:34 INFO    [scrapy.log] Scrapy 1.0.5 started
2:  2018-08-30 12:58:34 INFO    [scrapy.utils.log] Scrapy 1.0.5 started (bot: facebook_stats)
3:  2018-08-30 12:58:34 INFO    [scrapy.utils.log] Optional features available: ssl, http11, boto
4:  2018-08-30 12:58:34 INFO    [scrapy.utils.log] Overridden settings: {'NEWSPIDER_MODULE': 'facebook_stats.spiders', 'STATS_CLASS': 'sh_scrapy.stats.HubStorageStatsCollector', 'LOG_LEVEL': 'INFO', 'SPIDER_MODULES': ['facebook_stats.spiders'], 'RETRY_TIMES': 10, 'RETRY_HTTP_CODES': [500, 503, 504, 400, 403, 404, 408], 'BOT_NAME': 'facebook_stats', 'MEMUSAGE_LIMIT_MB': 950, 'DOWNLOAD_DELAY': 1, 'TELNETCONSOLE_HOST': '0.0.0.0', 'LOG_FILE': 'scrapy.log', 'MEMUSAGE_ENABLED': True, 'USER_AGENT': 'Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.7'}
5:  2018-08-30 12:58:34 INFO    [scrapy.log] HubStorage: writing items to https://storage.scrapinghub.com/items/341545/3/9
6:  2018-08-30 12:58:34 INFO    [scrapy.middleware] Enabled extensions: CoreStats, TelnetConsole, MemoryUsage, LogStats, StackTraceDump, CloseSpider, SpiderState, AutoThrottle, HubstorageExtension
7:  2018-08-30 12:58:35 INFO    [scrapy.middleware] Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
8:  2018-08-30 12:58:35 INFO    [scrapy.middleware] Enabled spider middlewares: HubstorageMiddleware, HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
9:  2018-08-30 12:58:35 INFO    [scrapy.middleware] Enabled item pipelines: CreditCardsPipeline
10: 2018-08-30 12:58:35 INFO    [scrapy.core.engine] Spider opened
11: 2018-08-30 12:58:36 INFO    [scrapy.extensions.logstats] Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
12: 2018-08-30 12:58:36 INFO    TelnetConsole starting on 6023
13: 2018-08-30 12:59:32 ERROR   [scrapy.core.scraper] Error downloading <GET https://www.linkedin.com/>: Connection was refused by other side: 111: Connection refused.
14: 2018-08-30 12:59:32 INFO    [scrapy.core.engine] Closing spider (finished)
15: 2018-08-30 12:59:33 INFO    [scrapy.statscollectors] Dumping Scrapy stats: More
16: 2018-08-30 12:59:34 INFO    [scrapy.core.engine] Spider closed (finished)
17: 2018-08-30 12:59:34 INFO    Main loop terminated.
How can I fix this?

From LinkedIn's User Agreement:

Prohibited software and extensions. LinkedIn is committed to keeping its members' data safe and its site free of fraud and abuse. To protect our members' data and our website, we don't permit the use of any third-party software, including "crawlers", bots, browser plug-ins, or browser extensions (also known as "add-ons"), that scrapes, modifies the appearance of, or automates activity on LinkedIn's website. These tools violate the User Agreement, including but not limited to many of the "Don'ts" listed in Section 8.2.


It stands to reason that they may actively block connections coming from Scrapinghub and similar services.
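For context, "111: Connection refused" is errno 111 (ECONNREFUSED on Linux): the remote side rejected the TCP connection outright, so the request never reached the HTTP layer where settings like RETRY_HTTP_CODES apply. A minimal sketch reproducing the same failure locally, by connecting to a port with no listener (the `probe` helper is illustrative, not part of Scrapy):

```python
import socket

def probe(host, port, timeout=2.0):
    """Try a TCP connection and report how it fails, if it does."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError as exc:
        # ECONNREFUSED (errno 111 on Linux): the remote side actively
        # rejected the TCP connection -- the same failure Scrapy logs above.
        return f"refused (errno {exc.errno})"
    except OSError:
        # A firewall that silently drops packets shows up as a timeout instead.
        return "unreachable or timed out"

# Reproduce the error locally: bind to port 0 to get a free port (so nothing
# is listening on it), release it, then try to connect to it.
tmp = socket.socket()
tmp.bind(("127.0.0.1", 0))
free_port = tmp.getsockname()[1]
tmp.close()

result = probe("127.0.0.1", free_port)
print(result)  # "refused (errno 111)" on Linux
```

Because the refusal happens before any HTTP exchange, a block at this level cannot be worked around with retry settings or headers.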

So there is no way to scrape the pages on Scrapinghub???

@Omariaz Given that this violates LinkedIn's User Agreement, I strongly recommend against doing it on Scrapinghub or anywhere else. If you decide to try anyway, you will run into technical obstacles like this one. Whatever you do, you will most likely not be able to scrape LinkedIn from Scrapinghub.