Yet another encoding problem with accented characters (scraping a website with Python and BeautifulSoup)

python unicode utf-8 character-encoding


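(Preface: I know this question has been asked a hundred times, but I still don't get it.)

I am trying to load an HTML page and output its text. Although I fetch the page correctly, BeautifulSoup somehow corrupts the encoding of the accented characters, i.e. those outside the 7-bit ASCII range: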

# -*- coding: utf-8 -*-
import sys
from urllib import urlencode
from urlparse import parse_qsl
import re
import urlparse
import json
import urllib
import urllib2  # urlopen() below is called on urllib2, which was never imported
from bs4 import BeautifulSoup

url = "http://www.rtve.es/alacarta/interno/contenttable.shtml?ctx=29010&locale=es&module=&orderCriteria=DESC&pageSize=15&mode=TEXT&seasonFilter=40015"
html=urllib2.urlopen(url).read()
soup = BeautifulSoup(html)
div = soup.find_all("span", class_="detalle")
capitulo_detalle = div[0].text  # doesn't work: capitulo_detalle should be a UTF-8 str, but div[0].text is unicode

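The output of div[0].text should look something like:
Sátur se dirige al sur en busca de Estuarda y Gabi, pero […] cambie de rumbo. […] Águila Roja tiene…
But the result I get is:
u'S\xe1tur se dirige al sur en busca de Estuarda y Gabi, pero […] cambie de rumbo. […] \xc1guila Roja tiene…'
--> What do I need to change to get the "correct" characters?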
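I know this must be a duplicate of other questions, but the answers there do not seem to work for me.

I have also read the usual documentation on unicode, utf-8 and ascii, apparently without success.
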
import requests
from bs4 import BeautifulSoup

url = "http://www.rtve.es/alacarta/interno/contenttable.shtml?ctx=29010&locale=es&module=&orderCriteria=DESC&pageSize=15&mode=TEXT&seasonFilter=40015"
html=requests.get(url)
soup = BeautifulSoup(html.text, 'lxml')
div = soup.find("span", class_="detalle")
capitulo_detalle = div.text 

With requests and Python 3, the problem never appears.
I believe I have finally solved it.
>>> div = soup.find("span", class_="detalle")
>>> div.text
u'S\xe1tur se dirige al sur en busca de Estuarda y Gabi, pero

---> This is unicode; \xe1 is the code point of "á" (U+00E1)
---> "print" renders the unicode code points correctly
>>> div.text.encode('utf-8')
'S\xc3\xa1tur se dirige al sur en busca de Estuarda y Gabi, pero

---> According to the table at the URL above, this is the character encoded as UTF-8. I don't understand why the output is shown as \xc3\xa1 instead of "á"
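The \xc3\xa1 in the interpreter output is the repr of the two UTF-8 bytes that encode "á"; the interactive shell echoes the repr of a value, not its printed form. Here is a minimal sketch of the same distinction, written for Python 3 (an assumption, since the session above is Python 2, where u'...' is text and '...' is bytes):

```python
# Python 3 sketch: text (str) versus its UTF-8 encoded bytes.
text = "S\xe1tur"                  # "\xe1" is the escape for code point U+00E1
encoded = text.encode("utf-8")     # U+00E1 becomes the two bytes 0xC3 0xA1

print(len(text))       # 5 characters
print(len(encoded))    # 6 bytes
print(ascii(text))     # escaped form, similar to what the Python 2 shell echoes
print(repr(encoded))   # b'S\xc3\xa1tur'

# Decoding with the same codec restores the original text.
assert encoded.decode("utf-8") == text
```

Both forms describe the same character; only the escaped display differs from what print shows.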
>>> print div.text.encode('utf-8')
S├ítur se dirige al sur en busca de Estuarda y Gabi, pero

---> I don't understand why print now renders it as strange symbols
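The "├í" is what the two UTF-8 bytes look like when each byte is interpreted as a cp850 character. That this console uses cp850 is an inference from the cp850 step in this answer, not something the session states. A small Python 3 sketch that reproduces the effect without a Windows console:

```python
# Python 3 sketch: reproduce the mojibake by decoding UTF-8 bytes as cp850.
utf8_bytes = "S\xe1tur".encode("utf-8")     # b'S\xc3\xa1tur'

# A cp850 console maps each byte to one character:
# 0xC3 -> U+251C (box-drawing character), 0xA1 -> U+00ED (i with acute).
as_seen_on_cp850 = utf8_bytes.decode("cp850")

print(ascii(as_seen_on_cp850))   # 'S\u251c\xedtur'
```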
>>> blurr = div.text.encode('cp850')
>>> blurr
'S\xa0tur se dirige al sur en busca de Estuarda y Gabi, pero
>>> type(blurr)
<type 'str'>

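---> Finally, it is right
In Kodi I can use the utf-8 representation: the character "á", for example, is stored in the variable as \xc3\xa1, but when the variable's contents are displayed with something like "xbmcgui.Dialog().ok(addonname, blurr)", it appears correctly on screen as "á".
And you are just supposed to know that...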
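Encoding to cp850 works here because the resulting bytes match what this particular console expects; in cp850 the character "á" is the single byte 0xA0. A hedged Python 3 sketch of picking the target encoding from the output stream instead of hard-coding it follows. `encode_for_stream` is a hypothetical helper introduced only for illustration:

```python
import sys

def encode_for_stream(text, stream=None, fallback="utf-8"):
    """Encode text with the stream's declared encoding (hypothetical helper).

    Characters the target encoding cannot represent are replaced with "?"
    instead of raising UnicodeEncodeError.
    """
    stream = stream if stream is not None else sys.stdout
    encoding = getattr(stream, "encoding", None) or fallback
    return text.encode(encoding, errors="replace")

# In cp850 the character U+00E1 is the single byte 0xA0.
print(repr("S\xe1tur".encode("cp850")))   # b'S\xa0tur'
```

The same text encoded for a UTF-8 consumer yields \xc3\xa1 instead, which is why the right bytes depend on who reads them.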
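How do you run this code? In the Python Shell or with python script.py? How do you get this text? Do you use print div[0].text, or does the Python Shell print this text for you automatically? You have the correct text, but the Python Shell uses print repr(div[0].text) to display text for debugging. So try print repr(div[0].text) and print div[0].text and you will see different text.

I am using Python 2.7.13. The example can be run in the shell or as a script, it doesn't matter. Yes, 'print' shows the correct output, but I need the text in a variable.

You already have the correct characters. The text u'S\xe1tur se dirige…' accurately represents the text Sátur se dirige…. If you print() it, you will see the original characters (assuming your console can print them, which it may not be able to do on Windows).

@bobince: Yes, but I am using Python 2.7.13 and utf-8. If I assign div[0].text (unicode) to an ordinary string variable (utf-8), I run into trouble.

Use only unicode and you won't have problems, so convert all strings to unicode. That is the solution.

I also tried the "requests" example; it doesn't change anything. OK, Python 3 handles unicode up front, but I have started with Python 2.7.13 and I don't want to change the whole code, and I don't know whether Kodi supports Python 3.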
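The advice in the comments amounts to this: decode bytes to text once on input, work with text everywhere, and encode only where a consumer needs bytes. A minimal Python 3 sketch of that pattern follows; the function names and the sample bytes are illustrative assumptions, not code from the thread:

```python
def decode_input(raw, encoding="utf-8"):
    """Boundary in: turn bytes from the network or a file into text."""
    return raw.decode(encoding)

def shorten(text, limit):
    """Work on text: limit counts characters, not bytes."""
    return text if len(text) <= limit else text[:limit] + "..."

def encode_output(text, encoding="utf-8"):
    """Boundary out: produce bytes only where a consumer requires them."""
    return text.encode(encoding)

raw = b"S\xc3\xa1tur se dirige al sur"      # UTF-8 bytes, as a server might send them
summary = shorten(decode_input(raw), 5)     # text in, text out
print(ascii(summary))                       # 'S\xe1tur...'
print(encode_output(summary))               # b'S\xc3\xa1tur...'
```

Slicing the text keeps "á" intact; slicing the raw bytes at the same index could cut the two-byte sequence in half.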
>>> blurr = div.text.encode('cp850')
>>> blurr
'S\xa0tur se dirige al sur en busca de Estuarda y Gabi, pero
>>> type(blurr)
<type 'str'>

>>> print(blurr)
Sátur se dirige al sur en busca de Estuarda y Gabi, pero