Python: how to upload chunks of a string longer than 2147483647 bytes?


I'm trying to upload a file of roughly 5 GB in size as shown below, but it throws the error
string longer than 2147483647 bytes. That sounds like there is a 2 GB upload limit. Is there a way to upload the data in chunks? Can anyone provide guidance?

logger.debug(attachment_path)
currdir = os.path.abspath(os.getcwd())
os.chdir(os.path.dirname(attachment_path))
headers = self._headers
headers['Content-Type'] = content_type
headers['X-Override-File'] = 'true'
if not os.path.exists(attachment_path):
    raise Exception, "File path was invalid, no file found at the path %s" % attachment_path
filesize = os.path.getsize(attachment_path) 
fileToUpload = open(attachment_path, 'rb').read()
logger.info(filesize)
logger.debug(headers)
r = requests.put(self._baseurl + 'problems/' + problemID + "/" + attachment_type + "/" + urllib.quote(os.path.basename(attachment_path)), 
                 headers=headers,data=fileToUpload,timeout=300)
Error:

string longer than 2147483647 bytes
Update:

def read_in_chunks(file_object, chunk_size=30720*30720):
    """Lazy function (generator) to read a file piece by piece."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data

f = open(attachment_path)

for piece in read_in_chunks(f):
      r = requests.put(self._baseurl + 'problems/' + problemID + "/" + attachment_type + "/" + urllib.quote(os.path.basename(attachment_path)), 
                        headers=headers,data=piece,timeout=300)

Your question has been asked on the requests bug tracker; their suggestion there is to use a streaming upload. If that doesn't work, you might see whether a chunk-encoded request works.
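
To see the core of the streaming suggestion in isolation, here is a minimal sketch (the URL and filename are placeholders; the adaptation to your original code follows in the edit below):

import requests

# Passing the open file object itself, rather than its contents, lets
# requests stream the request body from disk instead of building one
# multi-gigabyte string in memory.
with open('/path/to/big_file.bin', 'rb') as f:
    r = requests.put('http://example.com/upload', data=f, timeout=300)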

[Edit]

An example based on your original code:

# Using `with` here will handle closing the file implicitly
with open(attachment_path, 'rb') as file_to_upload:
    r = requests.put(
        "{base}problems/{pid}/{atype}/{path}".format(
            base=self._baseurl,
            # It's better to use consistent naming; search PEP-8 for standard Python conventions.
            pid=problem_id,
            atype=attachment_type,
            path=urllib.quote(os.path.basename(attachment_path)),
        ),
        headers=headers,
        # Note that you're passing the file object, NOT the contents of the file:
        data=file_to_upload,
        # Hard to say whether this is a good idea with a large file upload
        timeout=300,
    )
I can't promise this will run as-is, since I can't actually test it, but it should be close. The bug tracker comment I linked to mentions this as well, so if the headers you're specifying are actually required, this may not work.
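
For instance (a hypothetical illustration, since the actual contents of self._headers aren't shown in the question): if a Content-Length header is being set by hand, it can conflict with the length requests derives from the file object, so it may be worth dropping it before the streaming upload.

# Work on a copy so the shared self._headers dict isn't mutated, then
# drop any manually supplied Content-Length; requests determines the
# body length (or falls back to chunked encoding) from the file object.
headers = dict(self._headers)
headers.pop('Content-Length', None)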

Regarding chunked encoding: that should be your second choice. Your code wasn't specifying 'rb' as the mode for open(...), so changing that may well make the code above work. If not, you can try this:

def read_in_chunks():
    # If you're going to chunk anyway, doesn't it seem like smaller ones than this would be a good idea?
    chunk_size = 30720 * 30720

    # I don't know how correct this is; if it doesn't work as expected, you'll need to debug
    with open(attachment_path, 'rb') as file_object:
        while True:
            data = file_object.read(chunk_size)
            if not data:
                break
            yield data


# Same request as above, just using the function to chunk explicitly; see the `data` param
r = requests.put(
    "{base}problems/{pid}/{atype}/{path}".format(
        base=self._baseurl,
        pid=problem_id,
        atype=attachment_type,
        path=urllib.quote(os.path.basename(attachment_path)),
    ),
    headers=headers,
    # Call the chunk function here and the request will be chunked as you specify
    data=read_in_chunks(),
    timeout=300,
)
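
One note on behavior here (general to requests, not specific to this API): when data is a generator, requests cannot know the total body length up front, so it sends the request with Transfer-Encoding: chunked; the server must accept chunked uploads for this to succeed.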

I took your advice and tried this, but somehow it hangs; I've updated my code. What could be going wrong?

It's not clear that you've understood the suggested solution. You shouldn't iterate over the chunks and create a separate request for each of them; you should pass the generator as the data parameter of a single request. I'll update with an example based on your original code and the links I provided.

I tried the second option. It does upload, but it throws the error `('Connection aborted.', error(32, 'Broken pipe'))`.

No, this question was about the "string too large" error, and that appears to be solved. What you're describing sounds like you're getting back a response that isn't JSON; that's a separate problem that I can't get into in comments. If you accept this answer and open a new question, I'll gladly take a look, and I'm sure others will too. But please leave this one as a reference for anyone else who hits the same problem; that's a large part of StackOverflow's value.

@user2125827 Your new error basically indicates that something went wrong server-side; this answer properly resolves the original client-side error.