Python: converting a string from Latin-1 to UTF-8 and then back to Latin-1


A system (not under my control) sends a Latin-1 encoded string (such as Öland). I can convert it to UTF-8, but I cannot get back to Latin-1.

Consider the following code:

text = '\xc3\x96land' # This is what the external system sends
iso = text.encode(encoding='latin-1') # this is my best guess
print(iso.decode('utf-8'))
print(u"Öland".encode(encoding='latin-1'))

This is the output:

Öland
b'\xd6land'

Now, how do I mimic this system? Obviously '\xc3\x96land' is not '\xd6land'.

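If your external system sends this to you, you should decode it first rather than encode it, because it arrives already encoded.

You do not have to encode data that is already encoded.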
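As an illustration of the "decode first" advice, here is a minimal Python 3 sketch. It assumes the external system delivers raw bytes and that those bytes are UTF-8 rather than Latin-1 (the question itself only guesses at the encoding); the variable name `received` is hypothetical.

```python
# Hypothetical payload: the raw bytes as they might arrive from the external system.
received = b'\xc3\x96land'

# Data that arrives already encoded is decoded exactly once, with the sender's codec.
text = received.decode('utf-8')
print(text)

# Encoding is only needed when some other consumer expects different bytes.
latin1_bytes = text.encode('latin-1')
print(latin1_bytes)
```

If the decode step raises UnicodeDecodeError, the assumed codec is probably wrong and is worth checking against the sender's documentation.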

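hey = u"Öland".encode('latin-1')  # Python 2 syntax
print hey

gives output like this:

?land

and

print hey.decode('latin-1')

prints the decoded text, Öland.

It turned out that the external system was already sending the data in UTF-8. Converting the string back and forth now works like this: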

#!/usr/bin/env python3.4
# -*- coding: utf-8 -*-

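text = '\xc3\x96land'  # UTF-8 bytes that were read as one code point per byte
encoded = text.encode(encoding='raw_unicode_escape')  # code points below 256 become single bytes again
print(encoded)
utf8 = encoded.decode('utf-8')  # interpret those bytes as the UTF-8 they really are
print(utf8)

mimic = utf8.encode('utf-8')  # back to the UTF-8 bytes the external system sends
print(mimic)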

and the output:

b'\xc3\x96land'
Öland
b'\xc3\x96land'
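The same round trip can be wrapped in two small helpers. This is only a sketch: the function names `repair_mojibake` and `mimic_system` are hypothetical, and it uses the 'latin-1' codec in place of 'raw_unicode_escape', on the assumption that every character in the garbled string has a code point below 256.

```python
def repair_mojibake(garbled: str) -> str:
    """Undo a UTF-8 payload that was wrongly decoded as Latin-1."""
    # Latin-1 maps code points 0-255 one-to-one onto bytes,
    # so encoding with it recovers the original raw bytes.
    raw = garbled.encode('latin-1')
    return raw.decode('utf-8')


def mimic_system(text: str) -> str:
    """Produce the garbled form that the external system appears to send."""
    return text.encode('utf-8').decode('latin-1')


fixed = repair_mojibake('\xc3\x96land')
print(fixed)
print(mimic_system(fixed) == '\xc3\x96land')
```

Unlike 'raw_unicode_escape', 'latin-1' raises UnicodeEncodeError for characters above U+00FF, which makes unexpected input visible instead of silently escaping it.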

Thanks for your support.

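Are you sure the input is Latin-1? "Ö" should not need two bytes to encode. Actually, 0xD6 looks right.

Honestly, no. Latin-1 was just my best guess.

Maybe your input is already UTF-8:

Ö  c3 96  LATIN CAPITAL LETTER O WITH DIAERESIS

@规程: Not in Latin-1, they're not. In UTF-8, yes, Ö is 0xC3 0x96.

text is already a Unicode string; you then encode it to Latin-1, which gives the Latin-1 bytes of Ã\x96land; you decode those bytes as UTF-8, which gives Öland. As Unicode text, '\xc3\x96land' is meaningless.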
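The byte values mentioned in these comments can be checked directly. The following is a small sketch using only the standard library; the variable names are illustrative.

```python
import unicodedata

char = 'Ö'
name = unicodedata.name(char)
latin1 = char.encode('latin-1')  # one byte in Latin-1
utf8 = char.encode('utf-8')      # two bytes in UTF-8
print(name, latin1, utf8)

# The string from the question holds two separate code points, not one "Ö".
text = '\xc3\x96land'
code_points = [hex(ord(c)) for c in text[:2]]
print(code_points)
```

The first two characters of the string from the question are U+00C3 and U+0096, which is why it only turns into Öland after being mapped back to bytes and decoded as UTF-8.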