Python Requests/urllib3: retry on read timeout after the 200 headers have been received


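I use requests to download some large files (100-5000 MB). I'm using a Session together with urllib3.Retry to get automatic retries. Such retries seem to apply only before the HTTP headers have been received and the content has started streaming. Once the 200 has been sent, a network dip is raised as a ReadTimeoutError.

See the following example: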

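import logging
import sys

import requests
from requests.adapters import HTTPAdapter
from urllib3 import Retry

def create_session():
    # Retry up to 5 times with exponential backoff on both schemes.
    retries = Retry(total=5, backoff_factor=1)
    s = requests.Session()
    s.mount("http://", HTTPAdapter(max_retries=retries))
    s.mount("https://", HTTPAdapter(max_retries=retries))
    return s

logging.basicConfig(level=logging.DEBUG, stream=sys.stderr)
session = create_session()
# url is the file to download (not shown in the question)
response = session.get(url, timeout=(120, 10))  # deliberately short read timeout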

This gives the following log output:

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): example:443
DEBUG:urllib3.connectionpool:https://example:443 "GET /example.zip HTTP/1.1" 200 1568141974

< UNPLUG NETWORK CABLE FOR 10-15 sec HERE > 

Traceback (most recent call last):
  File "urllib3/response.py", line 438, in _error_catcher
    yield
  File "urllib3/response.py", line 519, in read
    data = self._fp.read(amt) if not fp_closed else b""
  File "/usr/lib/python3.8/http/client.py", line 458, in read
    n = self.readinto(b)
  File "/usr/lib/python3.8/http/client.py", line 502, in readinto
    n = self.fp.readinto(b)
  File "/usr/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.8/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.8/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "requests/models.py", line 753, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "urllib3/response.py", line 576, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "urllib3/response.py", line 541, in read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
  File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "urllib3/response.py", line 443, in _error_catcher
    raise ReadTimeoutError(self._pool, None, "Read timed out.")
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='example', port=443): Read timed out.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "example.py", line 14, in _download
    response = session.get(url, headers=headers, timeout=300)
  File "requests/sessions.py", line 555, in get
    return self.request('GET', url, **kwargs)
  File "requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "requests/sessions.py", line 697, in send
    r.content
  File "requests/models.py", line 831, in content
    self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
  File "requests/models.py", line 760, in generate
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='example', port=443): Read timed out.

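I can see why this doesn't work; it becomes more obvious when you add the stream=True parameter together with response.iter_content(). I assume the rationale is that read_timeout and the TCP layer should handle this (in my example I deliberately set read_timeout low to provoke it). However, we do have cases where the server restarts or crashes, or a firewall drops the connection mid-stream, and the client's only option is to retry the whole thing.

Is there a simple solution to this, ideally built into requests? One can always wrap the whole thing with tenacity or a manual retry, but ideally I'd like to avoid that, since it means adding another layer and having to tell network errors apart from other, real errors, and so on.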
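For reference, the manual wrapper the question would prefer to avoid might look roughly like the sketch below. It restarts the whole download on errors that look like network trouble and lets everything else (for example an HTTP 404 from raise_for_status()) propagate. The names stream_to_file, download_with_restart and TRANSIENT_ERRORS are hypothetical, and which exceptions count as transient is an assumption based on the traceback above, not something requests defines.

```python
import time

import requests

# Assumption: these requests exceptions are treated as "network trouble".
TRANSIENT_ERRORS = (
    requests.exceptions.ConnectionError,       # includes mid-stream read timeouts
    requests.exceptions.ChunkedEncodingError,  # connection broken while streaming
    requests.exceptions.Timeout,
)


def stream_to_file(session, url, path, timeout=(120, 10)):
    """Stream the body to path; errors surface from iter_content()."""
    with session.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        with open(path, "wb") as out:
            for chunk in response.iter_content(chunk_size=1024 * 1024):
                out.write(chunk)


def download_with_restart(session, url, path, attempts=3, backoff=1.0):
    """Restart the whole download on transient errors; return the attempt used."""
    for attempt in range(1, attempts + 1):
        try:
            stream_to_file(session, url, path)
            return attempt
        except TRANSIENT_ERRORS:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the last error
            time.sleep(backoff * attempt)
```

With the session from the example above, this would be called as download_with_restart(session, url, "example.zip"). Every retry starts again from byte 0, which is the cost the question is trying to avoid for multi-gigabyte files.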

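If the server you are downloading from supports ETags and range requests, you can keep track of the Content-Length and of how much you have downloaded. If the connection drops, you can try to complete the download once the upstream server is available again, using the ETag to make sure the file has not changed and a Range header to fetch the missing bytes, without starting from scratch.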
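A minimal sketch of that approach, assuming the server honours Range and If-Range and returns a strong ETag. The function names, the backoff scheme and the choice of retryable exceptions are illustrative, not part of requests. Accept-Encoding: identity is requested so that the number of bytes written locally lines up with the server's byte offsets.

```python
import time

import requests


def resume_headers(bytes_done, etag=None):
    """Headers that ask only for the bytes still missing (hypothetical helper)."""
    headers = {"Accept-Encoding": "identity"}
    if bytes_done > 0:
        headers["Range"] = f"bytes={bytes_done}-"
        if etag:
            # If-Range: the server sends the partial body only if the file is
            # unchanged; otherwise it answers 200 with the whole file.
            headers["If-Range"] = etag
    return headers


def download_with_resume(session, url, path, attempts=5, backoff=1.0,
                         timeout=(120, 10)):
    """Download url to path, resuming after mid-stream network errors."""
    retryable = (
        requests.exceptions.ConnectionError,
        requests.exceptions.ChunkedEncodingError,
        requests.exceptions.Timeout,
    )
    bytes_done, etag = 0, None
    for attempt in range(attempts):
        try:
            with session.get(url, headers=resume_headers(bytes_done, etag),
                             stream=True, timeout=timeout) as response:
                response.raise_for_status()
                if response.status_code != 206:
                    bytes_done = 0  # full body returned: start the file over
                etag = response.headers.get("ETag")
                with open(path, "ab" if bytes_done else "wb") as out:
                    for chunk in response.iter_content(chunk_size=1024 * 1024):
                        out.write(chunk)
                        bytes_done += len(chunk)
            return bytes_done
        except retryable:
            time.sleep(backoff * 2 ** attempt)
    raise RuntimeError(f"giving up on {url} after {attempts} attempts")
```

If the server sends no ETag, this sketch still sends Range without If-Range, so a file that changed between attempts would go unnoticed; comparing the final size against Content-Length or a checksum would be a sensible extra check in that case.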