Python: HTTP requests with a timeout, max size, and connection pooling


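I'm looking for a way in Python (2.7) to make HTTP requests that meets 3 requirements:

timeout (for reliability)
content maximum size (for security)
connection pooling (for performance)

I've checked nearly every Python HTTP library, but none of them meets my requirements. For example:

urllib2: good, but no pooling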

import urllib2
import json

r = urllib2.urlopen('https://github.com/timeline.json', timeout=5)
content = r.read(100+1)
if len(content) > 100: 
    print 'too large'
    r.close()
else:
    print json.loads(content)

r = urllib2.urlopen('https://github.com/timeline.json', timeout=5)
content = r.read(100000+1)
if len(content) > 100000: 
    print 'too large'
    r.close()
else:
    print json.loads(content)

requests: no max size

import json
import requests
r = requests.get('https://github.com/timeline.json', timeout=5, stream=True)
r.headers['content-length'] # does not exist for this request, and not safe
content = r.raw.read(100000+1)
print content # ARF this is gzipped, so not the real size
print json.loads(content) # content is gzipped so pretty useless
print r.json() # Does not work anymore since raw.read was used

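urllib3: I never got the read method to work, even with a 50 MB file…

httplib: httplib.HTTPConnection is not a pool (only one connection)

I can hardly believe that urllib2 is the best HTTP library I can use! So if anyone knows a library that can do this, or knows how to do it with one of the libraries above...

EDIT:

The best solution I found, thanks to Martijn Pieters (StringIO does not slow down even for huge files, whereas str concatenation does, a lot).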
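The code for that solution is not reproduced on this page. As a minimal sketch of the idea, and not the asker's actual code, the function below collects chunks in a byte buffer and gives up as soon as the running total passes the limit. The name fetch_limited and its parameters are hypothetical; session is assumed to be an object that behaves like requests.Session, whose get() returns a response with iter_content() and close(). It uses io.BytesIO rather than StringIO.StringIO so that the same code also runs on Python 3.

```python
import io


def fetch_limited(session, url, timeout=5, max_size=100000, chunk_size=2048):
    """Return the body of `url` as bytes, or raise ValueError if it is larger
    than max_size bytes.

    Assumption: `session` behaves like requests.Session. Reusing one session
    for many calls is what gives connection pooling.
    """
    r = session.get(url, timeout=timeout, stream=True)
    buf = io.BytesIO()  # byte buffer, avoids repeated str concatenation
    size = 0
    try:
        for chunk in r.iter_content(chunk_size):
            size += len(chunk)
            if size > max_size:
                # Stop before buffering anything beyond the limit.
                raise ValueError('Response too large')
            buf.write(chunk)
    finally:
        # Always hand the connection back (or drop it if the body was cut short).
        r.close()
    return buf.getvalue()
```

With the real library, one long-lived requests.Session() would be passed to every call, for example fetch_limited(session, 'https://github.com/timeline.json'), and the returned bytes handed to json.loads. Note that the requests timeout applies to each wait on the socket, not to the download as a whole, so a slow but steady server can still take longer than 5 seconds in total.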

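You can do it with requests just fine, but you need to know that the raw object is part of the urllib3 guts, and make use of the extra arguments its read() call supports. These let you specify that you want to read decoded data, by passing decode_content=True: r.raw.read(100000+1, decode_content=True).

Alternatively, you can set the decode_content flag on the raw object before reading:

import json
import requests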
r = requests.get('https://github.com/timeline.json', timeout=5, stream=True)

r.raw.decode_content = True
content = r.raw.read(100000+1)
if len(content) > 100000:
    raise ValueError('Too large a response')
print content
print json.loads(content)

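If you don't like reaching into the urllib3 guts like that, use r.iter_content() to iterate over the decoded content in chunks (the loop over r.iter_content(2048) in the last listing); this also uses the underlying HTTPResponse.

There is a subtle difference in how compressed data sizes are handled: r.raw.read(100000+1) will only ever read 100k bytes of compressed data, and it is the uncompressed data that is tested against your max size. The iter_content() method will read more uncompressed data in the rare case where the compressed stream is larger than the uncompressed data.

Neither method allows r.json() to work, because neither sets the response._content attribute; you can of course set it manually. But since the .raw.read() and .iter_content() calls already give you access to the content in question, there is really no need.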
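The gap between compressed and uncompressed size can be illustrated without any network access. The sketch below uses only the standard zlib module and does not reflect how requests or urllib3 decode a body internally. The helper names gzip_bytes and decompress_capped are hypothetical. It shows that a small gzipped payload can expand far beyond a limit checked on the compressed bytes, and one possible way to cap the decompressed output instead.

```python
import zlib

# 16 + MAX_WBITS tells zlib to write / expect a gzip header and trailer.
GZIP_WBITS = 16 + zlib.MAX_WBITS


def gzip_bytes(data):
    """Compress `data` into a gzip stream (stand-in for a gzipped HTTP body)."""
    compressor = zlib.compressobj(9, zlib.DEFLATED, GZIP_WBITS)
    return compressor.compress(data) + compressor.flush()


def decompress_capped(compressed, max_size):
    """Decompress a gzip stream, refusing to produce more than max_size bytes."""
    decompressor = zlib.decompressobj(GZIP_WBITS)
    # The second argument bounds the output of this call, so at most
    # max_size + 1 bytes are ever materialised in memory.
    out = decompressor.decompress(compressed, max_size + 1)
    if len(out) > max_size:
        raise ValueError('Response too large')
    return out
```

Here a limit applied to the compressed bytes alone would accept a highly compressible body of any size, whereas decompress_capped rejects it once the decoded output crosses max_size.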
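Thanks. I tried to compare which method works best (in particular, which one limits the real size rather than the downloaded size): urllib2 does not accept compression, r.raw.read compares the gzipped size, and r.iter_content compares the real size but really slows the code down (maybe a stream would make it faster).

@AurélienLambert: whether r.iter_content() slows the code down depends entirely on the size of the chunks read; smaller chunk sizes need more loop iterations. And it already operates on a stream.

content += chunk slows it down because Python str is immutable. StringIO.StringIO solves that.

Yes, I was considering using a list and ''.join() at the end, but StringIO() encapsulates that nicely.

For anyone trying this on Python 3, note that for the ctt.write(chunk) line you need content = b''.

+1. I was getting TypeError: string argument expected, got 'bytes'.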
Reading decoded data by passing decode_content=True to read():

import json
import requests
r = requests.get('https://github.com/timeline.json', timeout=5, stream=True)

content = r.raw.read(100000+1, decode_content=True)
if len(content) > 100000:
    raise ValueError('Too large a response')
print content
print json.loads(content)

Iterating over the decoded content in chunks with iter_content():

import json
import requests

r = requests.get('https://github.com/timeline.json', timeout=5, stream=True)

maxsize = 100000
content = ''
for chunk in r.iter_content(2048):
    content += chunk
    if len(content) > maxsize:
        r.close()
        raise ValueError('Response too large')

print content
print json.loads(content)