Wrapping urllib3.HTTPResponse in io.TextIOWrapper


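I use the AWS `boto3` library, which returns an instance of `urllib3.response.HTTPResponse`. That response is a subclass of `io.IOBase` and therefore behaves like a binary file: its `read()` method returns `bytes` instances.

Now I need to decode CSV data from files received this way. I want my code to work on both py2 and py3 with minimal code overhead, so I use `backports.csv`, which relies on `io.IOBase` objects as input rather than on py2's `file()` objects.

The first problem is that `HTTPResponse` yields `bytes` data for the CSV file, while `csv.reader` expects `str` data.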

>>> import io
>>> from backports import csv  # actually a try..except import fallback here
>>> from mymodule import get_file

>>> f = get_file()  # returns instance of urllib3.HTTPResponse
>>> r = csv.reader(f)
>>> list(r)
Error: iterator should return strings, not bytes (did you open the file in text mode?)
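The mismatch can be reproduced without a network call. The sketch below is Python 3 only and makes two substitutions that are assumptions, not part of the original setup: `io.BytesIO` stands in for the response body, and the standard-library `csv` module stands in for `backports.csv`.

```python
import csv
import io

# io.BytesIO plays the role of a binary response body (an assumption for
# illustration; the real object in the question is an HTTPResponse).
raw = io.BytesIO(b"a,b\n1,2\n")

# Iterating a binary stream yields bytes, which csv.reader rejects.
try:
    list(csv.reader(raw))
    failed = False
except csv.Error:
    failed = True

# Decoding through TextIOWrapper gives csv.reader the str lines it expects.
raw.seek(0)
text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
rows = list(csv.reader(text))
```

`BytesIO` already provides `read1()`, so `TextIOWrapper` accepts it directly; the response object in the question does not, which is where the next problem comes from.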

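I tried to wrap `HTTPResponse` in `io.TextIOWrapper`, but got the error `'HTTPResponse' object has no attribute 'read1'`. This is expected, because `TextIOWrapper` is meant to be used with `BufferedIOBase` objects, not `IOBase` objects. It also only happens with python2's implementation of `TextIOWrapper`, which always expects the underlying object to have `read1()`, whereas python3's implementation checks whether `read1` exists and gracefully falls back to `read`.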
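The role of `read1()` can be seen with a small raw stream. This is a sketch for Python 3 only; `RawBytes` is a hypothetical class invented for the illustration and is not part of urllib3 or boto3.

```python
import io


class RawBytes(io.RawIOBase):
    """Hypothetical raw stream: readable, but without a read1() method."""

    def __init__(self, data):
        super().__init__()
        self._data = io.BytesIO(data)

    def readable(self):
        return True

    def readinto(self, b):
        # Copy up to len(b) bytes into the caller's buffer; 0 means EOF.
        chunk = self._data.read(len(b))
        b[:len(chunk)] = chunk
        return len(chunk)


raw_has_read1 = hasattr(RawBytes(b""), "read1")

# Python 3's TextIOWrapper falls back to read() when read1() is missing.
line_from_raw = io.TextIOWrapper(
    RawBytes(b"x,y\n"), encoding="utf-8").readline()

# BufferedReader supplies read1(), the method Python 2's TextIOWrapper needs.
buffered = io.BufferedReader(RawBytes(b"x,y\n"))
buffered_has_read1 = hasattr(buffered, "read1")
line_from_buffered = io.TextIOWrapper(buffered, encoding="utf-8").readline()
```

Putting a `BufferedReader` in between is therefore the portable choice, since it satisfies both implementations.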

Then I tried to wrap `HTTPResponse` in `io.BufferedReader` and then in `io.TextIOWrapper`. I got the following error:

>>> f = get_file()
>>> br = io.BufferedReader(f)
>>> tw = io.TextIOWrapper(br)
>>> list(csv.reader(tw))
ValueError: I/O operation on closed file.

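After some investigation it turned out that the error only occurs when the file does not end with `\n`. If it does end with `\n`, the problem does not occur and everything works fine.

`HTTPResponse` contains some extra logic for closing the underlying object, and that seems to be what causes the problem.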
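Why an early "closed" status matters can be shown with the standard `io` classes alone. The sketch below is for CPython 3 and rests on an assumption: `AutoClosingRaw` is a hypothetical stream that reports itself closed as soon as its data is exhausted, which only approximates what the response object does.

```python
import io


class AutoClosingRaw(io.RawIOBase):
    """Hypothetical raw stream that reports closed once fully read."""

    def __init__(self, data):
        super().__init__()
        self._data = io.BytesIO(data)
        self._size = len(data)

    def readable(self):
        return True

    def readinto(self, b):
        chunk = self._data.read(len(b))
        b[:len(chunk)] = chunk
        return len(chunk)

    @property
    def closed(self):
        # Assumption for illustration: "exhausted" is treated as "closed".
        return self._data.tell() >= self._size


buffered = io.BufferedReader(AutoClosingRaw(b"a,b\n1,2"))
closed_before = buffered.closed   # the wrapper mirrors the raw stream's status
data = buffered.read()            # consumes everything
closed_after = buffered.closed

# A normal file would return b"" at EOF; here the wrapper sees "closed".
try:
    buffered.read()
    raised = False
except ValueError:
    raised = True
```

`BufferedReader` and `TextIOWrapper` take their `closed` status from the object they wrap, so a stream that closes itself at EOF turns an ordinary end-of-file read into a `ValueError`.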

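The question is: how can I write code that

- works on both python2 and python3, preferably without try..except or version-dependent branching;
- properly handles CSV files represented as `HTTPResponse`, regardless of whether they end with `\n`?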

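One possible solution would be to make a custom wrapper around `TextIOWrapper` that makes `read()` return `b''` when the object is closed, instead of raising `ValueError`. But is there a better solution, without such hacks?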

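It looks like this is an interface mismatch between `urllib3.HTTPResponse` and `file` objects. It is described in urllib3 issue #1305 (shazow/urllib3#1305).

There is no fix yet, so I used the following wrapper code, which seems to work fine: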

import io


class ResponseWrapper(io.IOBase):
    """
    This is the wrapper around urllib3.HTTPResponse
    to work-around an issue shazow/urllib3#1305.

    Here we decouple HTTPResponse's "closed" status from ours.
    """
    # FIXME drop this wrapper after shazow/urllib3#1305 is fixed

    def __init__(self, resp):
        self._resp = resp

    def close(self):
        # Close the wrapped response first, then mark this wrapper as closed.
        self._resp.close()
        super(ResponseWrapper, self).close()

    def readable(self):
        return True

    def read(self, amt=None):
        # Report EOF instead of reading from a response that closed itself.
        if self._resp.closed:
            return b''
        return self._resp.read(amt)

    def readinto(self, b):
        # BufferedReader fills its buffer through readinto(); 0 signals EOF.
        val = self.read(len(b))
        if not val:
            return 0
        b[:len(val)] = val
        return len(val)

and used it as follows:

>>> f = get_file()
>>> r = csv.reader(io.TextIOWrapper(io.BufferedReader(ResponseWrapper(f))))
>>> list(r)
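The wrapper can be exercised without AWS. In this Python 3 sketch, `FakeResponse` and `parse` are hypothetical names introduced for the test, the standard-library `csv` module replaces `backports.csv`, and `ResponseWrapper` is repeated only to keep the block self-contained. `FakeResponse` assumes the response marks itself closed once its body has been read.

```python
import csv
import io


class FakeResponse(object):
    """Hypothetical stand-in: marks itself closed once the body is read."""

    def __init__(self, body):
        self._body = io.BytesIO(body)
        self._size = len(body)
        self.closed = False

    def read(self, amt=None):
        data = self._body.read(amt)
        if self._body.tell() >= self._size:
            self.closed = True  # assumed auto-close on exhaustion
        return data

    def close(self):
        self.closed = True


class ResponseWrapper(io.IOBase):
    """Keeps its own "closed" status, independent of the response's."""

    def __init__(self, resp):
        self._resp = resp

    def close(self):
        self._resp.close()
        super(ResponseWrapper, self).close()

    def readable(self):
        return True

    def read(self, amt=None):
        if self._resp.closed:
            return b''
        return self._resp.read(amt)

    def readinto(self, b):
        val = self.read(len(b))
        if not val:
            return 0
        b[:len(val)] = val
        return len(val)


def parse(body):
    # The wrapper goes innermost: response -> wrapper -> buffer -> text.
    text = io.TextIOWrapper(
        io.BufferedReader(ResponseWrapper(FakeResponse(body))),
        encoding="utf-8", newline="")
    return list(csv.reader(text))


rows_without_newline = parse(b"a,b\n1,2")
rows_with_newline = parse(b"a,b\n1,2\n")
```

Both inputs should parse to the same rows, which is the property the question asks for.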

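A similar fix was proposed by the `urllib3` maintainers in the bug report comments, but it would be a breaking change, so for now things will probably stay as they are, and I have to use the wrapper (or do some monkey patching, which is probably worse).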
