Using Python Requests to 'bridge' a file without loading it into memory?


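I would like to use the requests library to fetch a file from a URL and use it as a multipart-encoded file in a POST request. The catch is that the file could be very large (50MB-2GB), and I don't want to load it into memory.

Following examples in the docs, I put together this: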

    with requests.get(big_file_url, stream=True) as f:
        requests.post(upload_url, files={'file': ('filename', f.content)})

But I'm not sure I'm doing it right. In fact, it throws this error (excerpt from the traceback):

    with requests.get(big_file_url, stream=True) as f:
    AttributeError: __exit__

Any suggestions?

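You cannot turn just anything into a context manager in Python; an object needs very specific attributes (`__enter__` and `__exit__`) to be one. With your current code, you can do the following: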

    response = requests.get(big_file_url, stream=True)

    post_response = requests.post(upload_url, files={'file': ('filename', response.iter_content())})
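If the `with` form is still wanted, one option is `contextlib.closing`, which wraps any object that has a `close()` method and supplies the `__enter__`/`__exit__` pair that the `with` statement looks for. The sketch below uses a hypothetical `FakeResponse` stand-in (not part of requests) so that it runs without a network; it only assumes that the real response object exposes `close()`.

```python
from contextlib import closing


class FakeResponse(object):
    """Hypothetical stand-in for a streamed response: it has close() but
    no __enter__/__exit__, like the object in the question's traceback."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def plain_with_fails():
    """Return True if using the bare object in a `with` statement raises."""
    try:
        with FakeResponse():
            pass
    except (AttributeError, TypeError):
        # Older Pythons raise AttributeError (__exit__ or __enter__),
        # newer ones raise TypeError for the same problem.
        return True
    return False


def closing_with_works():
    """Use closing() to get a context manager that calls close() on exit."""
    resp = FakeResponse()
    with closing(resp) as r:
        assert r is resp and not r.closed
    return resp.closed
```

Applied to the question, this would read `with closing(requests.get(big_file_url, stream=True)) as r:`.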

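Using `iter_content` ensures that your file is never held in memory all at once: the iterator is consumed instead. By contrast, accessing the `content` attribute loads the whole file into memory.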
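The difference between the two can be sketched without any network access. Below, `iter_chunks` is a hypothetical helper that plays the role of `iter_content` over an ordinary file-like object: it yields at most `chunk_size` bytes at a time, whereas a single `read()` (the analogue of `.content`) returns the whole payload at once. The payload size and chunk size are arbitrary values chosen for illustration.

```python
import io


def iter_chunks(fileobj, chunk_size=8192):
    """Yield successive blocks of at most chunk_size bytes from fileobj."""
    while True:
        block = fileobj.read(chunk_size)
        if not block:
            break
        yield block


payload = b"x" * 20000  # stand-in for a large download

# Analogue of `.content`: the whole body is materialised as one bytes object.
whole = io.BytesIO(payload).read()

# Analogue of `iter_content()`: only one block is held at a time.
largest_block = 0
total = 0
for block in iter_chunks(io.BytesIO(payload), chunk_size=4096):
    largest_block = max(largest_block, len(block))
    total += len(block)
```

Both paths see the same number of bytes in the end; what changes is how many of them have to exist in memory at any one moment.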
Edit: The only way to reasonably do this is to use a chunk-encoded upload, e.g.,

    post_response = requests.post(upload_url, data=response.iter_content())

If you absolutely need multipart/form-data encoding, you will have to create an abstraction layer that takes the generator in its constructor, along with the `Content-Length` header from `response` (to provide an answer for `len(file)`), and that has a `read` attribute which reads from the generator. The issue, again, is that I'm pretty sure the entire thing will be read into memory before it is uploaded.

Edit #2: You can write a generator of your own that produces the multipart/form-data encoded data itself. You can pass it in the same way as the chunk-encoded request, but you have to make sure you set your own `Content-Type` and `Content-Length` headers. I don't have time to sketch an example, but it shouldn't be too difficult.

There is actually an issue about this on Kenneth Reitz's GitHub. I had the same problem (although I was just uploading a local file), and I added a wrapper class that is a list of streams corresponding to the different parts of the request, with a read() attribute that iterates through the list and reads each part, and that also provides the values needed for the headers (boundary and content length):

    # coding=utf-8
    from __future__ import unicode_literals

    from mimetools import choose_boundary
    from requests.packages.urllib3.filepost import iter_fields, get_content_type
    from io import BytesIO


    class MultipartUploadWrapper(object):

        def __init__(self, files):
            """
            Initializer

            :param files: A dictionary of files to upload, of the form
                {'file': ('filename', <file object>)}
            :type files: dict
            """
            super(MultipartUploadWrapper, self).__init__()
            self._cursor = 0
            self._body_parts = None
            self.content_type_header = None
            self.content_length_header = None
            self.create_request_parts(files)

        def create_request_parts(self, files):
            request_list = []
            boundary = choose_boundary()
            content_length = 0

            boundary_string = b'--%s\r\n' % (boundary)
            for fieldname, value in iter_fields(files):
                content_length += len(boundary_string)

                if isinstance(value, tuple):
                    filename, data = value
                    content_disposition_string = (
                        ('Content-Disposition: form-data; name="%s"; '
                         'filename="%s"\r\n' % (fieldname, filename))
                        + ('Content-Type: %s\r\n\r\n' % (get_content_type(filename))))
                else:
                    data = value
                    content_disposition_string = (
                        ('Content-Disposition: form-data; name="%s"\r\n' % (fieldname))
                        + 'Content-Type: text/plain\r\n\r\n')
                request_list.append(BytesIO(str(boundary_string + content_disposition_string)))
                content_length += len(content_disposition_string)

                if hasattr(data, 'read'):
                    data_stream = data
                else:
                    data_stream = BytesIO(str(data))

                # Measure the part by seeking to its end, then rewind.
                data_stream.seek(0, 2)
                data_size = data_stream.tell()
                data_stream.seek(0)

                request_list.append(data_stream)
                content_length += data_size

                end_string = b'\r\n'
                request_list.append(BytesIO(end_string))
                content_length += len(end_string)

            # The closing boundary is two bytes longer than boundary_string,
            # so measure it directly rather than reusing len(boundary_string).
            closing_boundary = b'--%s--\r\n' % (boundary)
            request_list.append(BytesIO(closing_boundary))
            content_length += len(closing_boundary)

            # There's a bug in httplib.py that generates a UnicodeDecodeError on binary uploads if
            # there are *any* unicode strings passed into headers as part of the requests call.
            # For this reason all strings are explicitly converted to non-unicode at this point.
            self.content_type_header = {b'Content-Type': b'multipart/form-data; boundary=%s' % boundary}
            self.content_length_header = {b'Content-Length': str(content_length)}
            self._body_parts = request_list

        def read(self, chunk_size=0):
            remaining_to_read = chunk_size
            output_array = []
            while remaining_to_read > 0:
                body_part = self._body_parts[self._cursor]
                current_piece = body_part.read(remaining_to_read)
                length_read = len(current_piece)
                output_array.append(current_piece)
                if length_read < remaining_to_read:
                    # we finished this piece but haven't read enough, moving on to the next one
                    remaining_to_read -= length_read
                    if self._cursor == len(self._body_parts) - 1:
                        break
                    else:
                        self._cursor += 1
                else:
                    break
            return b''.join(output_array)

So instead of passing a 'files' keyword arg, you pass this object as the 'data' attribute to your Request.request object.
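The generator idea from Edit #2 of the first answer can be sketched in a similar spirit. This is a minimal illustration, not code from either answer: `multipart_stream` and its parameters are hypothetical names, it handles a single file field only, and it assumes the file size is known up front (for example from the download's `Content-Length` header) and that the field name and filename need no escaping.

```python
import io
import uuid


def multipart_stream(field_name, filename, fileobj, file_size,
                     content_type="application/octet-stream",
                     boundary=None, chunk_size=8192):
    """Return (headers, body_generator) for a single-file multipart upload.

    file_size must be supplied by the caller so the total length can be
    computed without reading the file first.
    """
    boundary = boundary or uuid.uuid4().hex
    head = (
        '--{0}\r\n'
        'Content-Disposition: form-data; name="{1}"; filename="{2}"\r\n'
        'Content-Type: {3}\r\n'
        '\r\n'
    ).format(boundary, field_name, filename, content_type).encode("utf-8")
    tail = '\r\n--{0}--\r\n'.format(boundary).encode("utf-8")

    headers = {
        "Content-Type": "multipart/form-data; boundary={0}".format(boundary),
        "Content-Length": str(len(head) + file_size + len(tail)),
    }

    def body():
        yield head
        while True:
            block = fileobj.read(chunk_size)  # one block in memory at a time
            if not block:
                break
            yield block
        yield tail

    return headers, body()


# Local check with an in-memory "file" and a fixed boundary.
data = b"hello world" * 1000
headers, body = multipart_stream("file", "big.bin", io.BytesIO(data),
                                 len(data), boundary="testboundary")
sent = b"".join(body)
```

In the bridging scenario, `fileobj` would be the streamed download (something with a `read()` method) and the pair would be handed to the upload call as `data=body, headers=headers`. How a particular requests version treats a generator body combined with an explicit `Content-Length` is not verified here, so treat that hand-off as an assumption to test.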

    #!/usr/bin/env python
    import sys
    from urllib2 import Request, urlopen

    from poster.encode import multipart_encode  # $ pip install poster
    from poster.streaminghttp import register_openers

    register_openers()  # install openers globally

    def report_progress(param, current, total):
        sys.stderr.write("\r%03d%% of %d" % (int(1e2*current/total + .5), total))

    url = 'http://example.com/path/'
    params = {'file': open(sys.argv[1], "rb"), 'name': 'upload test'}
    response = urlopen(Request(url, *multipart_encode(params, cb=report_progress)))
    print response.read()

    import posixpath
    import sys
    from urllib import unquote
    from urllib2 import Request, urlopen
    from urlparse import urlsplit

    from poster.encode import MultipartParam, multipart_encode  # pip install poster
    from poster.streaminghttp import register_openers

    register_openers()  # install openers globally

    class MultipartParamNoReset(MultipartParam):
        def reset(self):
            pass  # do nothing (to allow self.fileobj without seek() method)

    get_url = 'http://example.com/bigfile'
    post_url = 'http://example.com/path/'

    get_response = urlopen(get_url)
    param = MultipartParamNoReset(
        name='file',
        filename=posixpath.basename(unquote(urlsplit(get_url).path)),  #XXX \ bslash
        filetype=get_response.headers['Content-Type'],
        filesize=int(get_response.headers['Content-Length']),
        fileobj=get_response)

    params = [('name', 'upload test'), param]
    # report_progress is the callback defined in the previous example
    datagen, headers = multipart_encode(params, cb=report_progress)
    post_response = urlopen(Request(post_url, datagen, headers))
    print post_response.read()

    In [1]: import requests

    In [2]: raw = requests.get("http://download.thinkbroadband.com/1GB.zip", stream=True).raw

    In [3]: raw.read(10)
    Out[3]: '\xff\xda\x18\x9f@\x8d\x04\xa11_'

    In [4]: raw.read(10)
    Out[4]: 'l\x15b\x8blVO\xe7\x84\xd8'

    In [5]: raw.read() # take forever...

    In [6]: raw = requests.get("http://download.thinkbroadband.com/5MB.zip", stream=True).raw

    In [7]: requests.post("http://www.amazon.com", {'file': ('thing.zip', raw, 'application/zip')}, stream=True)
    Out[7]: <Response [200]>