Python Scrapy: Twisted connection lost in a non-clean fashion. No proxy. Headers already tried - Fatal编程技术网

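I'm trying to crawl this website using Scrapy and keep getting Twisted request/connection-lost errors. I'm not using a proxy, and I have tried setting the User-Agent and other headers based on …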

Here is the code that generates the request:

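import scrapy
from scrapy import Request


class JobsSpider(scrapy.Spider):
    # Hypothetical spider name; the original post shows only start_requests.
    name = 'jobs'

    def start_requests(self):
        url = 'https://www5.apply2jobs.com/jupitermed/ProfExt/index.cfm?fuseaction=mExternal.searchJobs'

        # Browser-like headers copied from Chrome; they apply to this request only.
        headers = {
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
            'Accept-Encoding': 'gzip, deflate, br',
            'Accept-Language': 'en-US,en;q=0.8',
            'Connection': 'keep-alive',
            'DNT': '1',
            'Host': 'www5.apply2jobs.com',
            'Referer': 'https://www5.apply2jobs.com/jupitermed/ProfExt/index.cfm?fuseaction=mExternal.showJob&RID=2524&CurrentPage=2',
            'Upgrade-Insecure-Requests': '1',
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36'
        }

        yield Request(url=url, headers=headers, callback=self.parse)

    def parse(self, response):
        # Placeholder callback (not shown in the original post).
        self.logger.info('Got %s (%d)', response.url, response.status)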
Here is my traceback:

2017-08-28 13:34:13 [scrapy.core.engine] INFO: Spider opened
2017-08-28 13:34:13 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-08-28 13:34:13 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-08-28 13:34:13 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www5.apply2jobs.com/robots.txt> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2017-08-28 13:34:13 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www5.apply2jobs.com/robots.txt> (failed 2 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2017-08-28 13:34:13 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET https://www5.apply2jobs.com/robots.txt> (failed 3 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2017-08-28 13:34:13 [scrapy.downloadermiddlewares.robotstxt] ERROR: Error downloading <GET https://www5.apply2jobs.com/robots.txt>: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2017-08-28 13:34:13 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www5.apply2jobs.com/jupitermed/ProfExt/index.cfm?fuseaction=mExternal.searchJobs> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2017-08-28 13:34:13 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www5.apply2jobs.com/jupitermed/ProfExt/index.cfm?fuseaction=mExternal.searchJobs> (failed 2 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2017-08-28 13:34:13 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET https://www5.apply2jobs.com/jupitermed/ProfExt/index.cfm?fuseaction=mExternal.searchJobs> (failed 3 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2017-08-28 13:34:13 [scrapy.core.scraper] ERROR: Error downloading <GET https://www5.apply2jobs.com/jupitermed/ProfExt/index.cfm?fuseaction=mExternal.searchJobs>: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2017-08-28 13:34:13 [scrapy.core.engine] INFO: Closing spider (finished)
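One detail in the log above: Scrapy fetched robots.txt before the search page, and that request failed the same way. Headers passed to a single Request (as in start_requests) are not applied to the robots.txt request made by RobotsTxtMiddleware; that request gets its User-Agent from the project-wide USER_AGENT setting. A minimal settings.py sketch that keeps the two consistent, assuming you want the same browser User-Agent everywhere. Since the connection is dropped before any response arrives, headers may not be the cause at all, so treat this as ruling out one variable rather than as a fix. ROBOTSTXT_OBEY = False only skips the robots.txt fetch.

```python
# settings.py (sketch): project-wide settings for the spider.
# The User-Agent string is the one already used in start_requests.
USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36"
)

# Default headers added to every request, including the robots.txt fetch.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.8",
}

# Optional while debugging: do not fetch or obey robots.txt.
ROBOTSTXT_OBEY = False

# Scrapy's default: 2 retries, i.e. 3 attempts in total, matching "failed 3 times".
RETRY_TIMES = 2
```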

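Thanks to the discussion and comments on my question, it looks like the best practice is to use a virtualenv with a specific version of cryptography.

Did you also change the user agent in the settings file? Most likely the server is rejecting the request, so it may be scraping protection.

@TarunLalwani I did that as well. Any other ideas?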
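Because the fix that worked was a separate virtualenv with a different cryptography version, it helps to record which TLS-related packages (and which OpenSSL) the interpreter actually loads, in both the broken and the working environment. A minimal standard-library sketch, assuming Python 3.8+ for importlib.metadata; the package list is an assumption about what matters here. From the command line, `scrapy version -v` prints a similar report.

```python
import ssl
import sys
from importlib import metadata

# Distributions whose versions affect how Scrapy/Twisted negotiate TLS.
TLS_PACKAGES = ("Scrapy", "Twisted", "pyOpenSSL", "cryptography")


def tls_stack_versions(packages=TLS_PACKAGES):
    """Return {name: version or None}, plus Python and the stdlib's OpenSSL."""
    # Note: ssl.OPENSSL_VERSION is what the stdlib ssl module links against;
    # pyOpenSSL/cryptography may load a different OpenSSL build.
    report = {"python": sys.version.split()[0], "stdlib-openssl": ssl.OPENSSL_VERSION}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # not installed in this environment
    return report


if __name__ == "__main__":
    for key, value in tls_stack_versions().items():
        print(f"{key:16} {value}")
```

Running it in each virtualenv and diffing the output shows which component changed between the failing and the working setup.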
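Does curl -v 'https://www5.apply2jobs.com/jupitermed/ProfExt/index.cfm?fuseaction=mExternal.searchJobs' behave well? Can you connect? To dig deeper you may need a network sniffer such as Wireshark (although this is an HTTPS connection, so you would probably only see the initial TLS handshake).

So I checked, and the site seems to have some problems. If you run curl -v against it, you eventually get this error:

* GnuTLS recv error (-110): The TLS connection was non-properly terminated.
* Closing connection 0
curl: (56) GnuTLS recv error (-110): The TLS connection was non-properly terminated.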
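To see what the server actually negotiates, rather than only that the connection drops, one can open a TLS connection with Python's standard ssl module and print the protocol version and cipher. A minimal sketch, assuming the host from the question is still reachable; it only checks the handshake, whereas the curl error above happened while reading the response. With a modern default context, a server that only speaks TLS 1.0 will typically fail the handshake, which is itself a useful signal.

```python
import socket
import ssl

# Protocol names as reported by ssl.SSLSocket.version().
LEGACY_PROTOCOLS = {"SSLv2", "SSLv3", "TLSv1", "TLSv1.1"}


def is_legacy_protocol(version):
    """True if the negotiated protocol is older than TLS 1.2."""
    return version in LEGACY_PROTOCOLS


def probe_tls(host, port=443, timeout=10.0):
    """Connect once and return (protocol, cipher_name) negotiated with the server."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            cipher_name, _protocol, _bits = tls.cipher()
            return tls.version(), cipher_name


if __name__ == "__main__":
    host = "www5.apply2jobs.com"
    try:
        version, cipher = probe_tls(host)
        print(version, cipher, "(legacy)" if is_legacy_protocol(version) else "")
    except OSError as exc:
        # ssl.SSLError and socket timeouts are both OSError subclasses.
        print(f"handshake with {host} failed: {exc!r}")
```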
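The same URL loads in a browser, but I see that the site uses old ciphers and the old TLS 1.0. I suggest opening an issue with Scrapy about this specific URL and seeing whether they have any ideas. But the problem is specific to this site, and it has to do with the ciphers or something related at a deeper level.

@paultrmbrth, it works in the browser but not in Scrapy or curl. The updates I made did not help, including downgrading to 1.9.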
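If the root cause is the old TLS 1.0 and cipher setup described above, Scrapy has downloader settings that control the TLS method and, in newer releases, the OpenSSL cipher list. A sketch of settings to experiment with, assuming a Scrapy version that supports them (check the settings documentation for your version). The cipher string is an assumption: "@SECLEVEL=1" relaxes OpenSSL 1.1+'s security level in general and does not target a cipher this server is known to need. Both changes weaken the connection, so consider scoping them to this one spider (for example via custom_settings). In the thread, what ultimately worked was a clean virtualenv with a different cryptography version, so these are alternatives, not the confirmed fix.

```python
# settings.py (sketch): adjust the downloader's TLS behaviour for a legacy server.
# Values are assumptions to experiment with, not a confirmed fix for this site.

# Force TLS 1.0 instead of letting OpenSSL negotiate ("TLS" is the default).
DOWNLOADER_CLIENT_TLS_METHOD = "TLSv1.0"

# Newer Scrapy releases only: an OpenSSL cipher-list string.
# "@SECLEVEL=1" lowers OpenSSL's security level so older ciphers/keys are accepted.
DOWNLOADER_CLIENT_TLS_CIPHERS = "DEFAULT:@SECLEVEL=1"
```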