How to catch a 404 error in urllib.urlretrieve (Python)

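Background: I am using urllib.urlretrieve, rather than any other function in the urllib* modules, because it supports a hook function (see reporthook below), which I use to display a text progress bar. This is Python >= 2.6.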

>>> urllib.urlretrieve(url[, filename[, reporthook[, data]]])
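For reference, urlretrieve calls the hook once when the connection is established and once after each block it reads. It passes three arguments: the number of blocks transferred so far, the block size in bytes, and the total size of the file (-1 when the server sends no Content-Length). A minimal text progress bar built on that convention might look like the sketch below; format_progress and text_progress_hook are illustrative names, and the bar layout is an arbitrary choice. The code runs unchanged on Python 2.6+ and Python 3.

```python
import sys


def format_progress(block_count, block_size, total_size, width=40):
    """Render a one-line text progress bar from urlretrieve's hook arguments."""
    downloaded = block_count * block_size
    if total_size <= 0:
        # Size unknown (no Content-Length header): just report bytes so far
        return "%d bytes" % downloaded
    # The last block is usually partial, so clamp to the real total
    downloaded = min(downloaded, total_size)
    filled = width * downloaded // total_size
    percent = 100 * downloaded // total_size
    return "[%s%s] %3d%%" % ("#" * filled, "." * (width - filled), percent)


def text_progress_hook(block_count, block_size, total_size):
    """reporthook callback: redraw the bar in place on stderr."""
    sys.stderr.write("\r" + format_progress(block_count, block_size, total_size))
    if total_size > 0 and block_count * block_size >= total_size:
        sys.stderr.write("\n")  # finish the line once the download is complete
    sys.stderr.flush()
```

It would be passed as the third argument, e.g. urllib.urlretrieve(url, "file.bin", text_progress_hook).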

However, urlretrieve is so dumb that it gives me no way to detect the status of the HTTP request (e.g. was it a 404 or a 200?).

What is the most common way to download a remote HTTP file with hook support (to show a progress bar) and proper HTTP error handling?

Look at the full code of urllib.urlretrieve:

def urlretrieve(url, filename=None, reporthook=None, data=None):
  global _urlopener
  if not _urlopener:
    _urlopener = FancyURLopener()
  return _urlopener.retrieve(url, filename, reporthook, data)

In other words, you can use urllib.FancyURLopener (it is part of the public urllib API). You can override http_error_default to detect 404s:

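import urllib

class MyURLopener(urllib.FancyURLopener):
  def http_error_default(self, url, fp, errcode, errmsg, headers):
    # Handle errors the way you'd like to; this defers to URLopener, which raises IOError
    urllib.URLopener.http_error_default(self, url, fp, errcode, errmsg, headers)

opener = MyURLopener()  # keep a reference so the temp file outlives retrieve()
fn, h = opener.retrieve(url, reporthook=my_report_hook)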

The retrieve method of URLopener objects supports reporthook and raises an exception on a 404.
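On Python 3 the situation is different: urllib.request.urlretrieve (kept there as a legacy interface ported from Python 2's urllib) is implemented on top of urlopen, so an HTTP error such as a 404 raises urllib.error.HTTPError, and the reporthook argument still works. A minimal sketch, assuming Python 3; retrieve_or_none is a hypothetical wrapper name, and returning None on an HTTP error is just one possible policy.

```python
import urllib.error
import urllib.request


def retrieve_or_none(url, filename, reporthook=None):
    """Download url to filename with urlretrieve; return None on an HTTP error."""
    try:
        # Python 3's urlretrieve opens the URL with urlopen() first, so a 404
        # raises HTTPError before filename is ever opened for writing
        return urllib.request.urlretrieve(url, filename, reporthook)
    except urllib.error.HTTPError as e:
        print("HTTP error %d: %s" % (e.code, e.reason))
        return None
```

Other URLError failures (DNS errors, refused connections) are deliberately not caught here, so they still propagate to the caller.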

You should use:

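import urllib2

try:
    resp = urllib2.urlopen("http://www.google.com/this-gives-a-404/")
except urllib2.URLError as e:
    # Only HTTPError (a URLError subclass) carries a status code;
    # re-raise anything else, e.g. DNS failures or refused connections
    if not hasattr(e, "code"):
        raise
    # HTTPError doubles as a response object, so use it in place of resp
    resp = e

print "Gave", resp.code, resp.msg
print "=" * 80
print resp.read(80)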
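In Python 3, urllib2 was split into urllib.request and urllib.error, and the same pattern carries over. The sketch below assumes Python 3 and uses a hypothetical helper name, fetch_status. It catches urllib.error.HTTPError directly, which is equivalent to the hasattr(e, "code") test above because HTTPError is the URLError subclass that carries a status code.

```python
import urllib.error
import urllib.request


def fetch_status(url):
    """Return (status code, reason, first 80 bytes of body), treating HTTP errors as responses."""
    try:
        resp = urllib.request.urlopen(url)
    except urllib.error.HTTPError as e:
        # HTTPError also behaves like a response: it has .code, .reason and a
        # readable body. Other URLErrors carry no status, so they propagate.
        resp = e
    try:
        return resp.code, resp.reason, resp.read(80)
    finally:
        resp.close()
```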

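Edit: The rationale is that, unless you expect the exceptional state, it is an exception for it to happen, and you probably did not even think about it. So rather than letting your code carry on after an unsuccessful request, the default behavior, quite sensibly, is to stop its execution.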

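I don't want to specify a handler. Does it throw exceptions the way urllib2.urlopen does?

It is easy to make it throw. FancyURLopener subclasses URLopener, which does throw, so you can call the base class's implementation: def http_error_default(...): URLopener.http_error_default(...)

You should do opener = MyURLopener() and then opener.retrieve(), so that the opener object stays alive. Otherwise (if you do it all on one line), the newly created opener is deallocated right after the retrieve operation, and when no filename was given, that deletes the temporary file the data was downloaded to before you get a chance to use it.

Not providing the HTTP status in the result could be considered a bug in the stdlib (but see the better library below, requests).

It's so dumb that urlretrieve can't handle this with a return status.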
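If you would rather not depend on urlretrieve's temporary-file bookkeeping at all, the reporthook protocol is simple enough to drive yourself. The sketch below assumes Python 3, and download is a hypothetical helper, not a stdlib function. It reads the response in blocks from urllib.request.urlopen, which raises urllib.error.HTTPError on a 404, and calls the hook with the same (block count, block size, total size) arguments, using -1 when Content-Length is missing. This is roughly what Python 3's own urlretrieve does internally.

```python
import urllib.request


def download(url, filename, reporthook=None, block_size=8192):
    """Copy url to filename, calling reporthook(blocks, block_size, total) like urlretrieve.

    HTTP errors (404, 500, ...) surface as urllib.error.HTTPError from urlopen()
    before the local file is created.
    """
    with urllib.request.urlopen(url) as resp:
        # -1 signals "size unknown", matching urlretrieve's convention
        total = int(resp.headers.get("Content-Length", -1))
        blocks = 0
        if reporthook:
            reporthook(blocks, block_size, total)  # initial call on connect
        with open(filename, "wb") as out:
            while True:
                chunk = resp.read(block_size)
                if not chunk:
                    break
                out.write(chunk)
                blocks += 1
                if reporthook:
                    reporthook(blocks, block_size, total)
    return filename
```

Because the destination is an explicit path, there is no temporary file whose lifetime depends on an opener object.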