Python: how to decode u'\xc3\xa9cosyst\xc3\xa8mes' to UTF-8

python python-2.7 encoding

While web scraping with BeautifulSoup, I get a query string parameter that ends up represented as:

param_value = u'\xc3\xa9cosyst\xc3\xa8mes'

Reading it, I can guess that it should be rendered as

écosystèmes

I tried several ways of encoding, escaping, and decoding it,

but I keep getting errors like this:

UnicodeEncodeError('ascii', u'\xc3\xa9cosyst\xc3\xa8mes', 0, 2, 'ordinal not in range(128)')

I also tried the solution from the duplicate question:

Python 2.7.15 (default, Jul 23 2018, 21:27:06)
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> s = u'\xc3\xa9cosyst\xc3\xa8mes'
>>> s.encode('latin-1').decode('utf-8')
u'\xe9cosyst\xe8mes'

But that takes me back to step one.

How do I get from

u'\xc3\xa9cosyst\xc3\xa8mes'

to

u'écosystèmes'

？

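You have UTF-8 that was decoded as Latin-1, so the solution is to encode it as Latin-1 and then decode it as UTF-8. The interactive prompt echoes the repr of the result, which is why it appears as u'\xe9cosyst\xe8mes'; printing it shows the actual characters: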

>>> s = u'\xc3\xa9cosyst\xc3\xa8mes'
>>> s.encode('latin-1').decode('utf-8')
u'\xe9cosyst\xe8mes'
>>> print s.encode('latin-1').decode('utf-8')
écosystèmes
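
The round trip above can be wrapped in a small helper. This is only a minimal sketch: the name `fix_mojibake` is hypothetical (it is not from the answer or from any library), and it assumes the text really is UTF-8 that was mis-decoded as Latin-1. When that assumption does not hold, the helper returns its input unchanged instead of raising.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def fix_mojibake(text):
    """Undo UTF-8 bytes that were wrongly decoded as Latin-1.

    `fix_mojibake` is a hypothetical helper name, not a library function.
    """
    try:
        # Latin-1 maps code points 0-255 straight back to bytes 0-255,
        # which recovers the original UTF-8 byte sequence.
        raw = text.encode('latin-1')
        # Decode those bytes with the encoding they were really written in.
        return raw.decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The assumption did not hold: leave the text as it was.
        return text


param_value = u'\xc3\xa9cosyst\xc3\xa8mes'
fixed = fix_mojibake(param_value)

# Compare against the escaped spelling of the expected result.
print(fixed == u'\xe9cosyst\xe8mes')
# The repaired string is shorter: each two-character pair became one letter.
print(len(param_value), len(fixed))
```

Catching both exception types keeps the helper safe to call on text that is already correct, such as a plain ASCII string or one containing characters outside Latin-1.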

I think this will help:

bytes(u'\xc3\xa9cosyst\xc3\xa8mes', 'latin-1').decode('utf-8')

What you have looks like UTF-8 that was decoded as Latin-1.

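u'\xe9cosyst\xe8mes' is the correct unicode string value. If you can figure out how to convert the u'' string in the question into a b'' bytestring, you should then be able to simply decode it; but that is obviously the crux of this question. Something like bytes(u'\xc3\xa9cosyst\xc3\xa8mes', 'latin-1').decode('utf-8') should work now.

That only takes me back to step 1:

>>> s = u'\xc3\xa9cosyst\xc3\xa8mes'
>>> s.encode('latin-1')
'\xc3\xa9cosyst\xc3\xa8mes'
>>> s.encode('latin-1').decode('utf-8')
u'\xe9cosyst\xe8mes'

That's not square one, that's the solution.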
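
The escaped form u'\xe9cosyst\xe8mes' shown in the session is just the repr of the repaired string, not a different value. The following check is an illustrative sketch, not part of the original thread; the variable names `same` and `names` are made up for the example. It compares the escaped and literal spellings and looks up the Unicode names of the two non-ASCII characters.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import unicodedata

# Repair the string with the Latin-1 / UTF-8 round trip from the answer.
fixed = u'\xc3\xa9cosyst\xc3\xa8mes'.encode('latin-1').decode('utf-8')

# The escaped spelling and the literal spelling denote the same string.
same = (fixed == u'écosystèmes')

# Name the non-ASCII code points to show they are real accented letters.
names = [unicodedata.name(ch) for ch in fixed if ord(ch) > 127]

print(same)
print(names)
```

If `same` is true, the only remaining difference is how the value is displayed: the prompt echoes the repr with escapes, while print writes the characters themselves.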
