Python 2: What do str.encode() and unicode.decode() do?

python python-2.7 unicode utf-8

From this question and its answers, I learned that unicode.encode() gives you a str, and str.decode() gives you a unicode:

a = 'à'
ua = u'à'
print type(a)  # str
print type(ua)  # unicode
print ua.encode('utf-8') == a  # True
print a.decode('utf-8') == ua  # True

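But I don't understand what the unicode.decode() and str.encode() methods are for. What are they supposed to return? How do I use them? Both of the following lines fail, the first with a UnicodeEncodeError and the second with a UnicodeDecodeError: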
print ua.decode('utf-8')
print a.encode('utf-8')

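TL;DR: Using unicode.decode and str.encode means you are not using the right type to represent your data. The equivalent methods on the equivalent types in Python 3 do not even exist.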

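A unicode value is a sequence of Unicode code points: integers that are interpreted as particular characters. A str, on the other hand, is a sequence of bytes. For example, à is Unicode code point U+00E0. The UTF-8 encoding represents it with a pair of bytes, 0xC3 and 0xA0.

The unicode.encode method takes a Unicode string (a sequence of code points) and returns the byte-level encoding of each code point, joined into a single byte string: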
>>> ua.encode('utf-8')
'\xc3\xa0'

str.decode takes a byte string and tries to produce the equivalent Unicode value:
>>> '\xc3\xa0'.decode('utf-8')
u'\xe0'

(u'\xe0' is equivalent to u'à'.)
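The same round trip can be sketched in Python 3 syntax, where str holds text (the role of Python 2 unicode) and bytes holds binary data (the role of Python 2 str). The variable names are illustrative only:

```python
# Python 3: str is text (code points), bytes is binary data.
text = "\u00e0"                     # the character à, code point U+00E0
encoded = text.encode("utf-8")      # text -> bytes
decoded = encoded.decode("utf-8")   # bytes -> text

print(hex(ord(text)))               # 0xe0
print(encoded)                      # b'\xc3\xa0'
print(len(text), len(encoded))      # 1 2
print(decoded == text)              # True
```

One code point becomes two bytes under UTF-8, which is why the lengths of the text and of its encoded form differ.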

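As for your errors: Python 2 does not strictly separate how unicode and str are used. A str is already an encoded value, so encoding it makes no sense, and decoding a unicode value makes no sense because it was never encoded in the first place. Rather than analyze in detail how the errors come about, I'll just point out that Python 3 has two types: bytes is a byte string (corresponding to Python 2 str), and str is a Unicode string (corresponding to Python 2 unicode). The "nonsensical" methods do not even exist in Python 3: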
>>> bytes.encode
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'bytes' has no attribute 'encode'
>>> str.decode
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'str' has no attribute 'decode'

So your earlier attempts, which raised Unicode*Error exceptions, would simply raise AttributeError instead.

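If you are stuck supporting Python 2, follow these rules:

unicode is for text.
str is for binary data.
unicode.encode produces a str value.
str.decode produces a unicode value.
If you find yourself trying to call str.encode, you are using the wrong type.
If you find yourself trying to call unicode.decode, you are likewise using the wrong type.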
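
One way to apply these rules is to decode once where data enters the program and encode once where it leaves. The sketch below uses Python 3 syntax (bytes for binary data, str for text). The helpers to_text and to_bytes are hypothetical names introduced for illustration, not part of any library, and UTF-8 is an assumed default:

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding only if it is a byte string (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding only if it is text (hypothetical helper)."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value


raw = b"caf\xc3\xa9"      # bytes as they might arrive from a file or socket
text = to_text(raw)       # decode once, at the input boundary
shout = text.upper()      # work on text in between
out = to_bytes(shout)     # encode once, at the output boundary

print(out)                # b'CAF\xc3\x89'
```

Because each helper checks the type first, passing a value that is already the right type is a no-op rather than an accidental second encode or decode.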
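
These methods (removed in Python 3 for good reason) exist because Python 2 coerces automatically between str and unicode, and both types expose the same interface. When you call str.encode, the str object is first coerced to unicode under the hood, using the default codec (ASCII), and only then encoded to a str. unicode.decode works the same way in reverse: the value is first encoded to a str with the default codec and then decoded. That hidden first step is what fails on non-ASCII data.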
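
The implicit coercion can be approximated in Python 3 by writing the hidden step out explicitly. This is a rough emulation, not Python 2's actual implementation. The function names py2_str_encode and py2_unicode_decode are made up for this sketch, and the implicit codec is assumed to be ASCII, the Python 2 default:

```python
IMPLICIT = "ascii"  # assumed implicit codec (Python 2's default encoding)


def py2_str_encode(data, encoding):
    """Emulate Python 2 str.encode: implicit decode, then the requested encode."""
    return data.decode(IMPLICIT).encode(encoding)


def py2_unicode_decode(text, encoding):
    """Emulate Python 2 unicode.decode: implicit encode, then the requested decode."""
    return text.encode(IMPLICIT).decode(encoding)


# ASCII-only data survives the hidden step, which is why the bug often goes unnoticed.
print(py2_str_encode(b"abc", "utf-8"))      # b'abc'
print(py2_unicode_decode("abc", "utf-8"))   # abc

errors = []
try:
    py2_str_encode(b"\xc3\xa0", "utf-8")    # like a.encode('utf-8') in the question
except UnicodeDecodeError as exc:
    errors.append(type(exc).__name__)
try:
    py2_unicode_decode("\u00e0", "utf-8")   # like ua.decode('utf-8') in the question
except UnicodeEncodeError as exc:
    errors.append(type(exc).__name__)

print(errors)  # ['UnicodeDecodeError', 'UnicodeEncodeError']
```

This also accounts for the seemingly backwards error names in the question: calling encode raises a decode error, and calling decode raises an encode error, because the exception comes from the hidden first step.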