Python 2.7 尝试使用urllib2获取SSLv3站点时引发httplib.BadStatusLine异常

Python 2.7 尝试使用urllib2获取SSLv3站点时引发httplib.BadStatusLine异常,python-2.7,urllib2,sslv3,Python 2.7,Urllib2,Sslv3,我试图读入BeautifulSoup,到目前为止,每次尝试都失败了,因为我试图打开一个安全连接(我最初试图用Python 3实现这一点,但正如您所看到的,这也充满了危险)。下面是我最近一次涉及urllib2的尝试(我还没有找到urllib3示例,也没有成功地将此代码更新为urllib3): 运行此代码时,我得到以下堆栈跟踪: Traceback (most recent call last): File "test.py", line 27, in <module> r

我试图读入BeautifulSoup,到目前为止,每次尝试都失败了,因为我试图打开一个安全连接(我最初试图用Python 3实现这一点,但正如您所看到的,这也充满了危险)。下面是我最近一次涉及urllib2的尝试(我还没有找到urllib3示例,也没有成功地将此代码更新为urllib3):

运行此代码时,我得到以下堆栈跟踪:

Traceback (most recent call last):
  File "test.py", line 27, in <module>
    r = urllib2.urlopen('https://bw6.clpccd.cc.ca.us/clpccd/2014/02/sched_l.htm')
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "test.py", line 22, in https_open
    return self.do_open(HTTPSConnectionV3, req)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1187, in do_open
    r = h.getresponse(buffering=True)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1045, in getresponse
    response.begin()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 409, in begin
    version, status, reason = self._read_status()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 373, in _read_status
    raise BadStatusLine(line)
httplib.BadStatusLine: ''
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 331, in _make_request
    httplib_response = conn.getresponse(buffering=True)
TypeError: getresponse() got an unexpected keyword argument 'buffering'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 516, in urlopen
    body=body, headers=headers)
  File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 333, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 1172, in getresponse
    response.begin()
  File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 351, in begin
    version, status, reason = self._read_status()
  File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 321, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: ''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/site-packages/requests/adapters.py", line 362, in send
    timeout=timeout
  File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 559, in urlopen
    _pool=self, _stacktrace=stacktrace)
  File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/util/retry.py", line 245, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/packages/six.py", line 309, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 516, in urlopen
    body=body, headers=headers)
  File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 333, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 1172, in getresponse
    response.begin()
  File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 351, in begin
    version, status, reason = self._read_status()
  File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 321, in _read_status
    raise BadStatusLine(line)
requests.packages.urllib3.exceptions.ProtocolError: ('Connection aborted.', BadStatusLine("''",))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./test.py", line 23, in <module>
    r = s.get('https://bw6.clpccd.cc.ca.us/clpccd/2014/02/sched_l.htm', headers=headers)
  File "/usr/local/lib/python3.4/site-packages/requests/sessions.py", line 469, in get
    return self.request('GET', url, **kwargs)
  File "/usr/local/lib/python3.4/site-packages/requests/sessions.py", line 457, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.4/site-packages/requests/sessions.py", line 569, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.4/site-packages/requests/adapters.py", line 407, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine("''",))
如果我强制SSLv3,我会得到预期的输出:

$ curl -v -3 https://bw6.clpccd.cc.ca.us/clpccd/2014/02/sched_l.htm
* Adding handle: conn: 0x7ffa50804000
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x7ffa50804000) send_pipe: 1, recv_pipe: 0
* About to connect() to bw6.clpccd.cc.ca.us port 443 (#0)
*   Trying 205.155.225.145...
* Connected to bw6.clpccd.cc.ca.us (205.155.225.145) port 443 (#0)
* SSL 3.0 connection using SSL_RSA_WITH_RC4_128_SHA
* Server certificate: bw6.clpccd.cc.ca.us
* Server certificate: VeriSign Class 3 Secure Server CA - G3
* Server certificate: VeriSign Class 3 Public Primary Certification Authority - G5
* Server certificate: Class 3 Public Primary Certification Authority
> GET /clpccd/2014/02/sched_l.htm HTTP/1.1
> User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)
> Host: bw6.clpccd.cc.ca.us
> Accept: */*
> Referer: 
> 
< HTTP/1.1 200 OK
< Date: Thu, 23 Oct 2014 00:00:11 GMT
* Server Oracle-Application-Server-10g/10.1.3.4.0 Oracle-HTTP-Server is not blacklisted
< Server: Oracle-Application-Server-10g/10.1.3.4.0 Oracle-HTTP-Server
< Last-Modified: Wed, 22 Oct 2014 20:05:42 GMT
< ETag: "422e-1e72-54480e16"
< Accept-Ranges: bytes
< Content-Length: 7794
< Connection: close
< Content-Type: text/html
< 
<html....>

* Closing connection 0
这产生了一个类似(但更神秘)的堆栈跟踪:

Traceback (most recent call last):
  File "test.py", line 27, in <module>
    r = urllib2.urlopen('https://bw6.clpccd.cc.ca.us/clpccd/2014/02/sched_l.htm')
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "test.py", line 22, in https_open
    return self.do_open(HTTPSConnectionV3, req)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1187, in do_open
    r = h.getresponse(buffering=True)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1045, in getresponse
    response.begin()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 409, in begin
    version, status, reason = self._read_status()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 373, in _read_status
    raise BadStatusLine(line)
httplib.BadStatusLine: ''
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 331, in _make_request
    httplib_response = conn.getresponse(buffering=True)
TypeError: getresponse() got an unexpected keyword argument 'buffering'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 516, in urlopen
    body=body, headers=headers)
  File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 333, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 1172, in getresponse
    response.begin()
  File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 351, in begin
    version, status, reason = self._read_status()
  File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 321, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: ''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/site-packages/requests/adapters.py", line 362, in send
    timeout=timeout
  File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 559, in urlopen
    _pool=self, _stacktrace=stacktrace)
  File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/util/retry.py", line 245, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/packages/six.py", line 309, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 516, in urlopen
    body=body, headers=headers)
  File "/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 333, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 1172, in getresponse
    response.begin()
  File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 351, in begin
    version, status, reason = self._read_status()
  File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 321, in _read_status
    raise BadStatusLine(line)
requests.packages.urllib3.exceptions.ProtocolError: ('Connection aborted.', BadStatusLine("''",))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./test.py", line 23, in <module>
    r = s.get('https://bw6.clpccd.cc.ca.us/clpccd/2014/02/sched_l.htm', headers=headers)
  File "/usr/local/lib/python3.4/site-packages/requests/sessions.py", line 469, in get
    return self.request('GET', url, **kwargs)
  File "/usr/local/lib/python3.4/site-packages/requests/sessions.py", line 457, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.4/site-packages/requests/sessions.py", line 569, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.4/site-packages/requests/adapters.py", line 407, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine("''",))
回溯(最近一次呼叫最后一次):
文件“/usr/local/lib/python3.4/site packages/requests/packages/urllib3/connectionpool.py”,第331行,在请求中
httplib_response=conn.getresponse(缓冲=True)
TypeError:getresponse()获得意外的关键字参数“buffering”
在处理上述异常期间,发生了另一个异常:
回溯(最近一次呼叫最后一次):
urlopen中的文件“/usr/local/lib/python3.4/site packages/requests/packages/urllib3/connectionpool.py”,第516行
正文=正文,标题=标题)
文件“/usr/local/lib/python3.4/site packages/requests/packages/urllib3/connectionpool.py”,第333行,在请求中
httplib_response=conn.getresponse()
getresponse中的第1172行文件“/usr/local/ceral/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py”
response.begin()
文件“/usr/local/ceral/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py”,第351行,开始
版本、状态、原因=self.\u读取\u状态()
文件“/usr/local/ceral/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py”,第321行,处于“读取”状态
升起状态行(行)
http.client.BadStatusLine:'
在处理上述异常期间,发生了另一个异常:
回溯(最近一次呼叫最后一次):
文件“/usr/local/lib/python3.4/site packages/requests/adapters.py”,第362行,在send中
超时=超时
文件“/usr/local/lib/python3.4/site packages/requests/packages/urllib3/connectionpool.py”,第559行,在urlopen中
_池=self,_stacktrace=stacktrace)
文件“/usr/local/lib/python3.4/site packages/requests/packages/urllib3/util/retry.py”,第245行,增量
升起六个。重新升起(类型(错误),错误,_stacktrace)
文件“/usr/local/lib/python3.4/site-packages/requests/packages/urllib3/packages/six.py”,第309行,重新登录
通过_回溯(tb)提升值
urlopen中的文件“/usr/local/lib/python3.4/site packages/requests/packages/urllib3/connectionpool.py”,第516行
正文=正文,标题=标题)
文件“/usr/local/lib/python3.4/site packages/requests/packages/urllib3/connectionpool.py”,第333行,在请求中
httplib_response=conn.getresponse()
getresponse中的第1172行文件“/usr/local/ceral/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py”
response.begin()
文件“/usr/local/ceral/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py”,第351行,开始
版本、状态、原因=self.\u读取\u状态()
文件“/usr/local/ceral/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py”,第321行,处于“读取”状态
升起状态行(行)
requests.packages.urllib3.exceptions.ProtocolError:(“连接已中止”,BadStatusLine(“,”))
在处理上述异常期间,发生了另一个异常:
回溯(最近一次呼叫最后一次):
文件“/test.py”,第23行,在
r=s.get('https://bw6.clpccd.cc.ca.us/clpccd/2014/02/sched_l.htm,headers=headers)
文件“/usr/local/lib/python3.4/site-packages/requests/sessions.py”,get中第469行
返回self.request('GET',url,**kwargs)
文件“/usr/local/lib/python3.4/site packages/requests/sessions.py”,请求中的第457行
resp=自我发送(准备,**发送)
文件“/usr/local/lib/python3.4/site packages/requests/sessions.py”,第569行,在send中
r=适配器.send(请求,**kwargs)
文件“/usr/local/lib/python3.4/site packages/requests/adapters.py”,第407行,在send中
raise CONNECTIONERR(错误,请求=请求)
requests.exceptions.ConnectionError:(“连接已中止”,BadStatusLine(“,”))

为了取得进展,我写了这段代码,尽管我不满意这是一个python式的答案:

import os
import subprocess

from bs4 import BeautifulSoup
FNULL = open(os.devnull, 'w')

html = subprocess.Popen(["curl", "-3", "https://bw6.clpccd.cc.ca.us/clpccd/2014/02/sched_l.htm", stdout=subprocess.PIPE, stderr=FNULL).communicate()[0]

soup = BeautifulSoup(html)

for t in soup.findAll('h2'):
    print(t.text)
基本上,我只是调用
curl-3
并捕获输出

import os
import subprocess

from bs4 import BeautifulSoup
FNULL = open(os.devnull, 'w')

html = subprocess.Popen(["curl", "-3", "https://bw6.clpccd.cc.ca.us/clpccd/2014/02/sched_l.htm", stdout=subprocess.PIPE, stderr=FNULL).communicate()[0]

soup = BeautifulSoup(html)

for t in soup.findAll('h2'):
    print(t.text)