Cancel slow download in Python
I am downloading files over HTTP and displaying the progress with urllib and the following code, which works fine:
import sys
from urllib import urlretrieve

def dlProgress(count, blockSize, totalSize):
    percent = int(count*blockSize*100/totalSize)
    sys.stdout.write("\r" + "progress" + "...%d%%" % percent)
    sys.stdout.flush()

# the hook has to be defined before it is passed to urlretrieve
urlretrieve('http://example.com/file.zip', '/tmp/localfile', reporthook=dlProgress)
Now I would also like to restart the download if it is going too slowly (say, less than 1 MB in 15 seconds). How can I achieve this?
Something like this:
import signal
import time

class Timeout(Exception):
    pass

def try_one(func, t=3):
    def timeout_handler(signum, frame):
        raise Timeout()

    old_handler = signal.signal(signal.SIGALRM, timeout_handler)
    signal.alarm(t)  # trigger the alarm in t seconds
    try:
        t1 = time.time()  # wall-clock time; time.clock() would measure CPU time on Unix
        func()
        t2 = time.time()
    except Timeout:
        print('{} timed out after {} seconds'.format(func.__name__, t))
        return None
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)
    return t2 - t1
Call try_one with the function you want to time out and the timeout in seconds:
try_one(downloader,15)
Alternatively, you can do this:
import socket
socket.setdefaulttimeout(15)
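Keep in mind that this timeout applies to each blocking socket operation (a single connect or read), not to the download as a whole, and that it is process-wide. A minimal sketch, assuming Python 3, of scoping it to one call; the helper name with_default_timeout is made up for illustration:

```python
import socket

def with_default_timeout(seconds, func, *args, **kwargs):
    """Run func with a temporary default socket timeout, then restore the old one.

    Hypothetical helper: sockets created while func runs inherit the timeout.
    """
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        return func(*args, **kwargs)
    finally:
        # Restore whatever was configured before (None means "no timeout").
        socket.setdefaulttimeout(previous)
```

For example, with_default_timeout(15, downloader) would run downloader with a 15-second per-operation timeout and leave the rest of the program untouched.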
HolyMackerel! Use the tools!
import urllib2, sys, socket, time, os

def url_tester(url = "http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tgz"):
    file_name = url.split('/')[-1]
    u = urllib2.urlopen(url,None,1)     # Note the timeout to urllib2...
    file_size = int(u.info().getheaders("Content-Length")[0])
    print ("\nDownloading: {} Bytes: {:,}".format(file_name, file_size))

    with open(file_name, 'wb') as f:
        file_size_dl = 0
        block_sz = 1024*4
        time_outs=0
        status = ''   # so the final 'Done!' line also works for an empty file
        while True:
            try:
                buffer = u.read(block_sz)
            except socket.timeout:
                if time_outs > 3:   # file has not had activity in max seconds...
                    print "\n\n\nsorry -- try back later"
                    os.unlink(file_name)
                    raise
                else:               # start counting time outs...
                    print "\nHmmm... little issue... I'll wait a couple of seconds"
                    time.sleep(3)
                    time_outs+=1
                    continue

            if not buffer:   # end of the download
                sys.stdout.write('\rDone!'+' '*len(status)+'\n\n')
                sys.stdout.flush()
                break

            file_size_dl += len(buffer)
            f.write(buffer)
            status = '{:20,} Bytes [{:.2%}] received'.format(file_size_dl,
                                           file_size_dl * 1.0 / file_size)
            sys.stdout.write('\r'+status)
            sys.stdout.flush()

    return file_name
This prints the status as expected. If I unplug the Ethernet cable, I get:
Downloading: Python-2.7.3.tgz Bytes: 14,135,620
827,392 Bytes [5.85%] received
sorry -- try back later
If I unplug the cable and plug it back in within 12 seconds, I get:
Downloading: Python-2.7.3.tgz Bytes: 14,135,620
716,800 Bytes [5.07%] received
Hmmm... little issue... I'll wait a couple of seconds
Hmmm... little issue... I'll wait a couple of seconds
Done!
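The file is downloaded successfully.
You can see that it supports both timeouts and reconnects. If you disconnect and stay disconnected for 3*4 seconds == 12 seconds, it times out for good and raises a fatal exception. That could be handled as well.
This should work. It calculates the actual download rate and aborts if the rate is too low.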
import sys
from urllib import urlretrieve
import time

url = "http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tgz" # 14.135.620 Byte
startTime = time.time()

class TooSlowException(Exception):
    pass

def convertBToMb(bytes):
    """converts Bytes to Megabytes"""
    bytes = float(bytes)
    megabytes = bytes / 1048576
    return megabytes

def dlProgress(count, blockSize, totalSize):
    global startTime

    alreadyLoaded = count*blockSize
    timePassed = time.time() - startTime
    transferRate = convertBToMb(alreadyLoaded) / timePassed # mbytes per second
    transferRate *= 60 # mbytes per minute

    percent = int(alreadyLoaded*100/totalSize)
    sys.stdout.write("\r" + "progress" + "...%d%%" % percent)
    sys.stdout.flush()

    if transferRate < 4 and timePassed > 2: # download will be slow at the beginning, hence wait 2 seconds
        print "\ndownload too slow! retrying..."
        time.sleep(1) # let's not hammer the server
        raise TooSlowException

def main():
    global startTime
    try:
        urlretrieve(url, '/tmp/localfile', reporthook=dlProgress)
    except TooSlowException:
        startTime = time.time()  # reset the timer before retrying
        main()

if __name__ == "__main__":
    main()
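The code above is Python 2 and retries by recursion, so a persistently slow server eventually hits the recursion limit. Here is a rough Python 3 sketch of the same reporthook idea with a bounded retry loop. The names make_hook and download_with_retries, the clock, pause and retrieve parameters, and the default values are my own assumptions, not part of urllib:

```python
import time
from urllib.request import urlretrieve

class TooSlowException(Exception):
    pass

def make_hook(min_bytes_per_sec, grace_seconds=2.0, clock=time.monotonic):
    """Build a reporthook that raises once the average rate drops too low."""
    start = clock()

    def hook(count, block_size, total_size):
        elapsed = clock() - start
        loaded = count * block_size
        # Skip the check during the grace period; downloads start slowly.
        if elapsed > grace_seconds and loaded / elapsed < min_bytes_per_sec:
            raise TooSlowException("average rate below threshold")

    return hook

def download_with_retries(url, dest, min_bytes_per_sec, max_attempts=3,
                          pause=1.0, retrieve=urlretrieve):
    """Retry the download a limited number of times, with a fresh timer each time."""
    for _ in range(max_attempts):
        try:
            return retrieve(url, dest, reporthook=make_hook(min_bytes_per_sec))
        except TooSlowException:
            time.sleep(pause)  # let's not hammer the server
    raise TooSlowException("gave up after %d attempts" % max_attempts)
```

Something like download_with_retries(url, '/tmp/localfile', 70000) corresponds roughly to the 4 MB per minute threshold used above. Each retry starts from scratch and overwrites the partial file.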
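You can raise an exception in your reporthook.
Yes, from a quick look around Google, raising an exception seems to be the popular way to stop a download. But it isn't mentioned in the documentation, which makes me worry that it could have unexpected behavior. For example, maybe the data is fetched by a dedicated thread, and throwing an exception would just orphan that thread without actually stopping the download.
This is a good solution if you are downloading small files of known size. If you don't know the size in advance, you don't know how many seconds to give try_one. If you are downloading a 100MB file, try_one(downloader,1500) won't give up until 1500 seconds have passed. It would be better to quit as soon as you are sure the download won't finish in time.
Yes, agreed. Thanks for the solution, but I would like to cancel based on a minimum throughput threshold rather than on whether the download finishes within some timeout.
@HolyMackerel: Just modify your report hook to time out at 10-second intervals and check the rate. The problem is a hung download, where 0 bytes are transferred and your report hook is never called.
Thanks, this is a nice solution, but it catches stalled downloads rather than slow ones.
Note that this only works if the connection slows down. The more common case of a disconnect won't be caught unless you add a timeout to the socket. Otherwise, nice, +1.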
在1500秒之前不会放弃。最好是在确信下载不会及时完成后立即退出。是的,同意。感谢您提供的解决方案,但我想根据最小吞吐量阈值取消,而不是根据下载是否在某个超时内完成。@HolyMackerel:只需修改您的报告挂钩,使其以10秒的间隔超时,然后检查速率。问题是挂起下载,其中0个字节被转换,并且从未调用您的报表挂钩。谢谢,这是一个很好的解决方案,但它会捕获暂停的下载,而不是缓慢的下载。请注意,这仅在连接速度减慢的情况下才起作用。除非向套接字添加超时,否则更常见的断开连接将不起作用。否则——好吧+1.