Python urllib2: try and except on 404


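I'm trying to go through a series of numbered data pages using urllib2. What I want to do is use a try statement, but I know little about it. Judging from a bit of reading, it seems to be based on specific exception "names", e.g. IOError. I don't know which error I should be looking for, which is part of the problem.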

I wrote/pasted my urllib2 page-fetching routine from "urllib2 - The Missing Manual", thus:

def fetch_page(url,useragent):
    urlopen = urllib2.urlopen
    Request = urllib2.Request
    cj = cookielib.LWPCookieJar()

    txheaders =  {'User-agent' : useragent}

    if os.path.isfile(COOKIEFILE):
        cj.load(COOKIEFILE)
        print "previous cookie loaded..."
    else:
        print "no ospath to cookfile"

    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    urllib2.install_opener(opener)
    try:
        req = urllib2.Request(url, headers=txheaders)
        # create a request object

        handle = urlopen(req)
        # and open it to return a handle on the url

    except IOError, e:
        print 'Failed to open "%s".' % url
        if hasattr(e, 'code'):
            print 'We failed with error code - %s.' % e.code
        elif hasattr(e, 'reason'):
            print "The error object has the following 'reason' attribute :"
            print e.reason
            print "This usually means the server doesn't exist,",
            print "is down, or we don't have an internet connection."
        return False

    else:
        print
        if cj is None:
            print "We don't have a cookie library available - sorry."
            print "I can't show you any cookies."
        else:
            print 'These are the cookies we have received so far :'
            for index, cookie in enumerate(cj):
                print index, '  :  ', cookie
                cj.save(COOKIEFILE)           # save the cookies again

        page = handle.read()
        return (page)

def fetch_series():

  useragent="Firefox...etc."
  url="www.example.com/01.html"
  try:
    fetch_page(url,useragent)
  except [something]:
    print "failed to get page"
    sys.exit()
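The bottom function is just an example to show what I mean. Can anyone tell me what I should put there? I made the page-fetching function return False if it gets a 404; is that correct? And if so, why doesn't "except False:" work? Thanks for any help you can give.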

OK, following the advice here, I've tried:

except urlib2.URLError, e:

except URLError, e:

except URLError:

except urllib2.IOError, e:

except IOError, e:

except IOError:

except urllib2.HTTPError, e:

except urllib2.HTTPError:

except HTTPError:

None of them work.

If you want to detect a 404:

try:
    req = urllib2.Request(url, headers=txheaders)
    # create a request object

    handle = urllib2.urlopen(req)
    # and open it to return a handle on the url
except urllib2.HTTPError, e:
    print 'We failed with error code - %s.' % e.code

    if e.code == 404:
        # do stuff..  
    else:
        # other stuff...

    return False
else:
    # ...

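To catch it in fetch_series(), fetch_page() has to let the exception propagate (re-raise it, or don't catch it there) instead of returning False; the caller can then use except urllib2.HTTPError.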
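A minimal sketch of that arrangement, written for Python 3, where urllib2 was split into urllib.request and urllib.error (on Python 2 the same structure applies with urllib2.HTTPError). The cookie handling from the question is left out, and the `opener` parameter and the `urls` argument are assumptions added only so the functions can be exercised without a network connection.

```python
import urllib.error
import urllib.request


def fetch_page(url, useragent, opener=urllib.request.urlopen):
    """Return the page body; HTTP errors are re-raised to the caller.

    `opener` is a hypothetical injection point (not in the original code).
    """
    req = urllib.request.Request(url, headers={"User-agent": useragent})
    try:
        handle = opener(req)
    except urllib.error.HTTPError as e:
        print("We failed with error code - %s." % e.code)
        raise  # re-raise instead of `return False`
    return handle.read()


def fetch_series(urls, useragent, opener=urllib.request.urlopen):
    """Fetch pages in order and stop at the first 404."""
    pages = []
    for url in urls:
        try:
            pages.append(fetch_page(url, useragent, opener))
        except urllib.error.HTTPError as e:
            if e.code == 404:
                print("failed to get page %s" % url)
                break
            raise  # anything other than a 404 is unexpected here
    return pages
```

Because fetch_page() re-raises, the except clause in fetch_series() actually sees the error. Returning False does not raise anything, which is why no except clause in the caller can catch it.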

From the urllib2 documentation:

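exception urllib2.HTTPError

Though being an exception (a subclass of URLError), an HTTPError can also function as a non-exceptional file-like return value (the same thing that urlopen() returns). This is useful when handling exotic HTTP errors, such as requests for authentication.

code

An HTTP status code as defined in RFC 2616. This numeric value corresponds to a value found in the dictionary of codes as found in BaseHTTPServer.BaseHTTPRequestHandler.responses.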
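To make the quoted description concrete, here is a small sketch for Python 3, where the class lives in urllib.error and a code-to-text mapping is available as http.client.responses. The HTTPError is built by hand with a made-up URL and body, so nothing is fetched; the helper name `describe` is hypothetical.

```python
import http.client
import io
import urllib.error

# Build an HTTPError by hand (hypothetical URL and body); no network needed.
err = urllib.error.HTTPError(
    "http://www.example.com/01.html",  # url
    404,                               # code
    "Not Found",                       # msg
    {},                                # headers
    io.BytesIO(b"<h1>missing</h1>"),   # file-like body
)


def describe(exc):
    """Replace the hasattr() checks from the question with type checks."""
    # HTTPError must be tested first because it is a subclass of URLError.
    if isinstance(exc, urllib.error.HTTPError):
        return "HTTP status %d" % exc.code
    if isinstance(exc, urllib.error.URLError):
        return "could not reach server: %s" % exc.reason
    return "unrelated error"


is_url_error = isinstance(err, urllib.error.URLError)  # subclass relationship
status_text = http.client.responses[err.code]          # standard reason phrase
body = err.read()                                      # file-like, like urlopen()'s result
```

The same ordering matters in except clauses: an `except URLError` (or `except IOError`) placed before `except HTTPError` would catch the 404 first.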


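I suggest you take a look at the wonderful requests module.

With it, you can achieve the functionality you are asking for, like so: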

import requests
from requests.exceptions import HTTPError

try:
    r = requests.get('http://httpbin.org/status/200')
    r.raise_for_status()
except HTTPError:
    print 'Could not download page'
else:
    print r.url, 'downloaded successfully'

try:
    r = requests.get('http://httpbin.org/status/404')
    r.raise_for_status()
except HTTPError:
    print 'Could not download', r.url
else:
    print r.url, 'downloaded successfully'
Interactive poking: to best understand the nature and possible contents of such an exception in Python, simply try the key calls interactively:

>>> f = urllib2.urlopen('http://httpbin.org/status/404')
Traceback (most recent call last):
...
  File "C:\Python27\lib\urllib2.py", line 558, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 404: NOT FOUND
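Then sys.last_value contains the exception value that fell through to the interactive prompt, and you can play with it (use TAB and . auto-completion in your interactive shell, dir(), vars(), ...).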


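Build a simple opener which doesn't throw HTTP errors:
>>> ho = urllib2.OpenerDirector()
>>> ho.add_handler(urllib2.HTTPHandler())
>>> f = ho.open('http://localhost:8080/cgi/somescript.py'); f
<addinfourl at 138851272 whose fp = <socket._fileobject object at 0x01062370>>
>>> f.code
500
>>> f.read()
'Execution error: <pre style="background-color:#faa">\nNameError: name \'e\' is not defined\n<pre>\n'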

Default handlers of urllib2.build_opener:

default_classes = [ProxyHandler, UnknownHandler, HTTPHandler, HTTPDefaultErrorHandler, HTTPRedirectHandler, FTPHandler, FileHandler, HTTPErrorProcessor]
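The opener-without-error-handlers idea can be sketched for Python 3 as follows, using urllib.request.OpenerDirector in place of the urllib2 class. Since the script at localhost:8080 is not available here, a throwaway local server that always answers 500 stands in for it; `AlwaysFails`, `open_without_error_handling` and `demo` are hypothetical names introduced for this illustration.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class AlwaysFails(BaseHTTPRequestHandler):
    """Hypothetical stand-in server: answers every GET with a 500."""

    def do_GET(self):
        body = b"Execution error"
        self.send_response(500)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the example quiet


def open_without_error_handling(url):
    """Open url with an opener that has only HTTPHandler installed.

    Without HTTPErrorProcessor and HTTPDefaultErrorHandler, error
    statuses come back as ordinary responses instead of exceptions.
    """
    opener = urllib.request.OpenerDirector()
    opener.add_handler(urllib.request.HTTPHandler())
    return opener.open(url)


def demo():
    """Start the stand-in server, fetch one page, return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), AlwaysFails)  # port 0: pick a free port
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_port
        response = open_without_error_handling(url)
        try:
            return response.status, response.read()
        finally:
            response.close()
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() should return the 500 status together with the body rather than raising HTTPError. The trade-off is that such a bare opener also drops redirects, proxies and cookie handling, so it suits inspection more than everyday fetching.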


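I'm testing it outside of the urllib2 function, does that matter? I kind of want to have it as a general function for lots of things and then look up the error type outside of it. Thanks for your help too!

OK, I'll try that out, thanks. It will have to be tomorrow. Dig the name too ;)

Oh I see, I assumed I had to test the return value of fetch. Ah, you da man. May your file permissions always be in order and may your boxes never get owned by skiddz (at least) :D

Hmmm, doesn't seem to work, it just goes straight through... I'll take another look tomorrow when I'm not so tired. Thanks very much again.

So you suggest I rewrite the whole thing and use something else, or is this some kind of add-on to urllib2? Bear in mind I'm a total newbie and it took me a lot of time to figure out how to download a page! If it ain't broke, don't fix it ;)

Does this requests thing handle cookies and redirects? I'm so tired I didn't start by thanking you, so sorry about that. Thanks very much for taking the time to help a brother out.

Hey, you're right, this module is very cool, and although urllib2 isn't broken (it works for me now), I see what you mean about simple. Thanks.

I didn't know about this before. What an amazing suggestion, the difference is quite astounding.

For Python 3, see: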
>>> ev = sys.last_value
>>> ev.__class__
<class 'urllib2.HTTPError'>
>>> dir(ev)
['_HTTPError__super_init', '__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__getitem__', '__getslice__', '__hash__', '__init__', '__iter__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '__unicode__', '__weakref__', 'args', 'close', 'code', 'errno', 'filename', 'fileno', 'fp', 'getcode', 'geturl', 'hdrs', 'headers', 'info', 'message', 'msg', 'next', 'read', 'readline', 'readlines', 'reason', 'strerror', 'url']
>>> vars(ev)
{'fp': <addinfourl at 140193880 whose fp = <socket._fileobject object at 0x01062370>>, 'fileno': <bound method _fileobject.fileno of <socket._fileobject object at 0x01062370>>, 'code': 404, 'hdrs': <httplib.HTTPMessage instance at 0x085ADF80>, 'read': <bound method _fileobject.read of <socket._fileobject object at 0x01062370>>, 'readlines': <bound method _fileobject.readlines of <socket._fileobject object at 0x01062370>>, 'next': <bound method _fileobject.next of <socket._fileobject object at 0x01062370>>, 'headers': <httplib.HTTPMessage instance at 0x085ADF80>, '__iter__': <bound method _fileobject.__iter__ of <socket._fileobject object at 0x01062370>>, 'url': 'http://httpbin.org/status/404', 'msg': 'NOT FOUND', 'readline': <bound method _fileobject.readline of <socket._fileobject object at 0x01062370>>}
>>> sys.last_value.code
404
>>> try: f = urllib2.urlopen('http://httpbin.org/status/404')
... except urllib2.HTTPError, ev:
...     print ev, "'s error code is", ev.code
...     
HTTP Error 404: NOT FOUND 's error code is 404
>>> ho = urllib2.OpenerDirector()
>>> ho.add_handler(urllib2.HTTPHandler())
>>> f = ho.open('http://localhost:8080/cgi/somescript.py'); f
<addinfourl at 138851272 whose fp = <socket._fileobject object at 0x01062370>>
>>> f.code
500
>>> f.read()
'Execution error: <pre style="background-color:#faa">\nNameError: name \'e\' is not defined\n<pre>\n'