How to catch all possible errors with urlfetch (Python)?

python google-app-engine


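In my application, users enter a URL, and I try to open the link and get the page title. But I realized that many different kinds of errors can occur, including unicode characters or newlines in the title, as well as AttributeError and IOError. At first I tried to catch each error individually, but now, if any url fetch error occurs, I want to redirect to an error page where the user enters the title manually. How do I catch all possible errors? This is the code I have now: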

    title = "title"

    try:

        soup = BeautifulSoup.BeautifulSoup(urllib.urlopen(url))
        title = str(soup.html.head.title.string)

        if title == "404 Not Found":
            self.redirect("/urlparseerror")
        elif title == "403 - Forbidden":
            self.redirect("/urlparseerror")     
        else:
            title = str(soup.html.head.title.string).lstrip("\r\n").rstrip("\r\n")

    except UnicodeDecodeError:    
        self.redirect("/urlparseerror?error=UnicodeDecodeError")

    except AttributeError:        
        self.redirect("/urlparseerror?error=AttributeError")

    #https url:    
    except IOError:        
        self.redirect("/urlparseerror?error=IOError")


    #I tried this else clause to catch any other error
    #but it does not work
    #this is executed when none of the errors above is true:
    #
    #else:
    #    self.redirect("/urlparseerror?error=some-unknown-error-caught-by-else")

Update

As @Wooble suggested in the comments, I added a try ... except around the code that writes the title to the database:

        try:
            new_item = Main(
                        ....
                        title = unicode(title, "utf-8"))

            new_item.put()

        except UnicodeDecodeError:    

            self.redirect("/urlparseerror?error=UnicodeDecodeError")

This works, although according to the logging output the out-of-range characters are still in the title:
***title: 7.2. re â€” Regular expression operations &mdash; Python v2.7.1 documentation**

Do you know why?

You can use except without specifying any exception type to catch all exceptions. From the Python docs:

The last except clause will catch any exception not caught earlier (that is, any exception that is not an IOError or ValueError).

You can use the top-level exception type Exception, which will catch any exception that was not caught before.
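The two suggestions above can be sketched as follows. This is a minimal illustration, not the asker's handler: the names classify_failure, missing_title and bad_number are hypothetical, and the code is written to run on Python 3 even though the thread uses Python 2. It also shows why the commented-out else clause in the question did not help: else runs only when no exception was raised at all.

```python
def classify_failure(action):
    """Run action() and report which handler, if any, caught its exception."""
    try:
        action()
    except UnicodeDecodeError:
        return "UnicodeDecodeError"
    except AttributeError:
        return "AttributeError"
    except IOError:
        return "IOError"
    except Exception as ex:
        # Catch-all: reached only when none of the clauses above matched.
        return "unknown:" + type(ex).__name__
    else:
        # Runs only when action() raised nothing, so it cannot catch errors.
        return "ok"


def missing_title():
    # Mimics soup.html.head on a page where the parser found no <html> element.
    return None.head


def bad_number():
    # Raises ValueError, which none of the specific clauses handle.
    return int("not a number")
```

Calling classify_failure(missing_title) goes through the AttributeError clause, while classify_failure(bad_number) falls through to the final Exception clause. Note that except Exception does not catch KeyboardInterrupt or SystemExit, whereas a bare except does, which is one reason the named form is usually preferred.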

OK. I changed the code to use a final except clause, but even now the UnicodeDecodeError is not caught: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 12: ordinal not in range(128) (there is an em dash in the title of this url: http://docs.python.org/library/string.html). What am I doing wrong?

Thanks. But this does not catch the unicode error either. Not sure what I am doing wrong.

@Zeynel As you can see in Python's exception hierarchy, UnicodeDecodeError is a subtype of Exception, so it should be caught. Maybe your error is raised in a different part of the code.

@ssoler: Yes, you are right, the error happens when I try to write the title to the database. There is a unicode error in the title, so it is not written. For me, the point of catching the url fetch errors was to avoid dealing with the Python unicode nightmare. It seems there is no way to catch unicode errors with try ... except. I don't want to deal with unicode issues, so I gave up... This means the user will need to enter the title when submitting the url. I am surprised that, at this stage of internet technology, I cannot get the title of a page without errors!!

Um, I'm not sure what to say... of course you can catch unicode errors; you just need to do it in the right place. Or you could learn how to deal with encodings. "Internet technology" isn't going to write good code for you.

@Wooble: Thanks. I tried title = unicode(title, "utf-8") and this seems to work, although in logging.info I still see out-of-range characters in the title. I will add an update to the question.

The UnicodeDecodeError is almost certainly because your code does not handle unicode properly, not because the user entered invalid data. You should fix your application to handle unicode.

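The advice in these comments, to handle the encoding where the bytes become text, can be illustrated with a small sketch. It assumes the fetched title arrives as UTF-8 bytes; decode_title is a hypothetical helper, and the sample title is modeled on the one in the question's log. The code is written to run on Python 3. The last line shows one plausible reason for the odd characters in the log (UTF-8 bytes displayed through a Windows-1252 view), which is an assumption and not something the thread confirms.

```python
# Sample title bytes containing an em dash (U+2014), encoded as UTF-8.
raw_title = u"7.2. re \u2014 Regular expression operations".encode("utf-8")


def decode_title(raw, encoding="utf-8"):
    """Decode raw bytes; on bad input, substitute U+FFFD instead of failing."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return raw.decode(encoding, "replace")


# Decoding with the right codec gives proper text.
title = decode_title(raw_title)

# Decoding as ASCII reproduces the error from the thread: byte 0xe2 is the
# first byte of the em dash and is outside the ASCII range.
try:
    raw_title.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError as err:
    ascii_failed = True
    failing_byte = raw_title[err.start:err.start + 1]

# Reading the same bytes as Windows-1252 yields three unrelated characters
# in place of the em dash, similar to what the log in the question shows.
mojibake = raw_title.decode("cp1252")
```

With an explicit decode step like this, the try ... except sits at the one place where decoding can fail, instead of around the database write.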
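# Example from the Python docs (Python 2 syntax): the final bare except
# catches anything the earlier clauses did not handle.
import sys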

try:
    f = open('myfile.txt')
    s = f.readline()
    i = int(s.strip())
except IOError as (errno, strerror):
    print "I/O error({0}): {1}".format(errno, strerror)
except ValueError:
    print "Could not convert data to an integer."
except:
    print "Unexpected error:", sys.exc_info()[0]
    raise

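# The question's code with a final clause for the top-level Exception type,
# which catches whatever the earlier clauses missed.
try: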

    soup = BeautifulSoup.BeautifulSoup(urllib.urlopen(url))
    title = str(soup.html.head.title.string)

    if title == "404 Not Found":
        self.redirect("/urlparseerror")
    elif title == "403 - Forbidden":
        self.redirect("/urlparseerror")     
    else:
        title = str(soup.html.head.title.string).lstrip("\r\n").rstrip("\r\n")

except UnicodeDecodeError:    
    self.redirect("/urlparseerror?error=UnicodeDecodeError")

except AttributeError:        
    self.redirect("/urlparseerror?error=AttributeError")

#https url:    
except IOError:        
    self.redirect("/urlparseerror?error=IOError")

except Exception, ex:
    print "Exception caught: %s" % ex.__class__.__name__
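Putting the pieces together, the "return a title or fall back to manual entry" idea might look like the sketch below. It is not the code from the thread: it swaps BeautifulSoup for the standard library's html.parser so that it is self-contained, it assumes the page bytes are UTF-8, it is written for Python 3, and TitleParser and safe_title are hypothetical names. Fetching the page is left to the caller.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside the <title> element."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.in_title = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def safe_title(raw_bytes):
    """Return the stripped page title, or None if anything goes wrong."""
    try:
        parser = TitleParser()
        parser.feed(raw_bytes.decode("utf-8"))  # assumption: UTF-8 page
        title = "".join(parser.parts).strip()   # drops surrounding \r\n
        return title or None
    except Exception:
        # Any failure means "no usable title"; the caller decides what to do.
        return None
```

A handler could then redirect to the manual-entry page whenever safe_title returns None, which keeps a single fallback path instead of one redirect per exception type.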