Python parsing an HTML page: how to decode the � character

I am trying to parse an HTML page like this:

# coding: utf8
import urllib
[...]
def search(self, a, b):
    # Read the search term from the text field
    mot_canal = self.champ_rech_canal.get_text()
    url_canal = "http://www.canalplus.fr/pid3330-c-recherche.html?rechercherSite=" + mot_canal
    try:
        f = urllib.urlopen(url_canal)
        # Raw bytes of the page, not yet decoded
        self.feuille_canal = f.read()
        f.close()
    except IOError:
        self.champ_rech_canal.set_text("La recherche a échoué")
    print self.feuille_canal

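The result is fine, except that I get � in place of characters such as "é" or "ô". How can I decode it? I tried decoding the page as UTF-8, and the result was: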

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 8789: invalid continuation byte

You are trying to decode an ISO-8859-1 page as UTF-8, which cannot work. See the Content-Type declaration in the returned HTML:

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
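To illustrate the mismatch, here is a minimal sketch in Python 3 syntax (the question's code is Python 2, where decoding raw bytes behaves much the same way). The sample bytes are an assumption made up for the demonstration, not taken from the real page.

```python
# Assumed sample: the bytes an ISO-8859-1 server would send for
# "La recherche a échoué", where é is the single byte 0xE9.
raw = b"La recherche a \xe9chou\xe9"

# Decoding as UTF-8 fails: 0xE9 announces a multi-byte sequence,
# but the byte that follows is not a valid continuation byte.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# A lenient decode is where the U+FFFD replacement character comes from.
lossy = raw.decode("utf-8", errors="replace")

# Decoding with the charset the page declares recovers the text.
text = raw.decode("iso-8859-1")
```

Here `lossy` contains U+FFFD where each é should be, while `text` contains the accented characters intact.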

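Thanks for your answer. Yes, that line is in the HTML code. Can I try to replace it?

Just decode using "iso-8859-1". Changing that text does not magically change the encoding of the page itself.

Yes, Matthias is correct: you have to decode with the encoding the site declares.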

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
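One possible way to follow that advice without hard-coding the charset is to read it from the page before decoding. The sketch below is only an illustration: `decode_html`, its fallback default, and the sample `page` bytes are hypothetical names and data, and a real page may also announce its charset in the HTTP Content-Type header, which this sketch does not inspect.

```python
import re

def decode_html(raw, default="iso-8859-1"):
    """Decode raw HTML bytes using the charset named in the markup.

    Hypothetical helper for illustration; falls back to `default`
    when no charset is declared or its name is not recognised.
    """
    # Look only at the start of the document, where <meta> tags normally live.
    match = re.search(br'charset=["\']?([A-Za-z0-9_-]+)', raw[:2048])
    charset = match.group(1).decode("ascii") if match else default
    try:
        return raw.decode(charset)
    except LookupError:
        # The declared name is not a codec Python knows about.
        return raw.decode(default)

# Assumed sample page, encoded the way the meta tag says it is.
page = (b'<meta http-equiv="Content-Type" '
        b'content="text/html; charset=ISO-8859-1" />'
        b'<p>La recherche a \xe9chou\xe9</p>')
html = decode_html(page)
```

If the declared charset is wrong for the actual bytes, `raw.decode` can still raise `UnicodeDecodeError`; the sketch deliberately leaves that case visible rather than hiding it.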