Python: What's the difference between unicode(self) and self.__unicode__()?


While working through a Unicode problem, I found that unicode(self) and self.__unicode__() behave differently:

#-*- coding:utf-8 -*-
import sys
import dis
class test():
    def __unicode__(self):
        s = u'中文'
        return s.encode('utf-8')

    def __str__(self):
        return self.__unicode__()
print dis.dis(test)
a = test()
print a

The code above works fine, but if I change self.__unicode__() to unicode(self), it raises an error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)

The problematic code is:

#-*- coding:utf-8 -*-
import sys
import dis
class test():
    def __unicode__(self):
        s = u'中文'
        return s.encode('utf-8')

    def __str__(self):
        return unicode(self)
print dis.dis(test)
a = test()
print a

I'm curious how Python handles this. I tried the dis module, but didn't see much of a difference:

Disassembly of __str__:
 12           0 LOAD_FAST                0 (self)
              3 LOAD_ATTR                0 (__unicode__)
              6 CALL_FUNCTION            0
              9 RETURN_VALUE

Disassembly of __str__:
 10           0 LOAD_GLOBAL              0 (unicode)
              3 LOAD_FAST                0 (self)
              6 CALL_FUNCTION            1
              9 RETURN_VALUE

You are returning bytes from your __unicode__ method.
To illustrate:
In [18]: class Test(object):
    def __unicode__(self):
        return u'äö↓'.encode('utf-8')
    def __str__(self):
        return unicode(self)
   ....:     

In [19]: class Test2(object):
    def __unicode__(self):
        return u'äö↓'
    def __str__(self):
        return unicode(self)
   ....:     

In [20]: t = Test()

In [21]: t.__str__()
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
/home/dav1d/<ipython-input-21-e2650f29e6ea> in <module>()
----> 1 t.__str__()

/home/dav1d/<ipython-input-18-8bc639cbc442> in __str__(self)
      3         return u'äö↓'.encode('utf-8')
      4     def __str__(self):
----> 5         return unicode(self)
      6 

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

In [22]: unicode(t)
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
/home/dav1d/<ipython-input-22-716c041af66e> in <module>()
----> 1 unicode(t)

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

In [23]: t2 = Test2()

In [24]: t2.__str__()
Out[24]: u'\xe4\xf6\u2193'

In [25]: str(_) # _ = last result
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
/home/dav1d/<ipython-input-25-3a1a0b74e31d> in <module>()
----> 1 str(_) # _ = last result

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

In [26]: unicode(t2)
Out[26]: u'\xe4\xf6\u2193'

In [27]: class Test3(object):
    def __unicode__(self):
        return u'äö↓'
    def __str__(self):
        return unicode(self).encode('utf-8')
   ....:     

In [28]: t3 = Test3()

In [29]: t3.__unicode__()
Out[29]: u'\xe4\xf6\u2193'

In [30]: t3.__str__()
Out[30]: '\xc3\xa4\xc3\xb6\xe2\x86\x93'

In [31]: print t3
äö↓

In [32]: print unicode(t3)
äö↓


print a, or in my case print t, calls t.__str__, which is expected to return bytes. You have it return unicode, so Python tries to encode the result with ascii, which does not work.
Simple fix: have __unicode__ return unicode and __str__ return bytes.
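The two implicit conversions behind the tracebacks can be reproduced without any class. This is a minimal sketch, assuming the fallback codec is ascii (as the error messages report) and spelling the conversions out explicitly so that it also runs on Python 3. The variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
text = u'\xe4\xf6\u2193'      # the sample text from the session, written with escapes
raw = text.encode('utf-8')    # a byte string, like the one Test.__unicode__ returns

# Converting bytes to unicode requires a decode; with ascii it fails
# on the first byte that is >= 128.
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    decode_error = exc

# Converting unicode to bytes requires an encode; with ascii it fails
# on the first non-ASCII character.
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    encode_error = exc

print(decode_error.reason)
print(encode_error.reason)
```

Both failures go away once each method returns the type its caller expects, so no implicit ascii step is needed.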
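When you call unicode on a Python object, the result is the Unicode representation of the argument passed to unicode. Since you did not specify which encoding to use, you get an error saying the argument cannot be represented using ASCII alone.
When you call __unicode__ directly, your method explicitly encodes the string as utf-8, so no implicit conversion takes place and nothing goes wrong.
You can pass the desired encoding as the second argument to unicode, for example: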
unicode( str, "utf-8" )

This should work the same way as your __unicode__ method.
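For a plain byte string, naming the codec is what avoids the ascii default. In Python 2, unicode(raw, "utf-8") gives the same result as raw.decode("utf-8"); the sketch below uses the second spelling so that it also runs on Python 3. The helper name to_text is hypothetical.

```python
def to_text(raw, encoding="utf-8"):
    """Hypothetical helper: decode a byte string with an explicitly named codec."""
    return raw.decode(encoding)


raw = u"\u4e2d\u6587".encode("utf-8")   # six bytes for the two characters in the question
text = to_text(raw)                      # explicit codec, so no ascii fallback is involved

print(len(raw))
print(len(text))
```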
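When you define the __unicode__ special method, you are telling it which encoding to use. When you just call unicode, you don't specify an encoding, so Python uses the default, "ascii".
By the way, __str__ should return a byte string, not unicode, and __unicode__ should return unicode, not a byte string. So this code is backwards. Since it does not return unicode, Python is probably trying to convert the result using the default encoding.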
s = u'中文'
return s.encode('utf-8')

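This returns a non-Unicode byte string. That is what encode does. utf-8 is not something that magically turns data into Unicode; if anything it is the opposite: a way of representing Unicode (an abstraction) as bytes (data, more or less).
We need some terminology. To encode is to take a Unicode string and produce a byte string that represents it, using some encoding. To decode is the reverse: take a byte string (which we believe encodes a Unicode string) and interpret it as a Unicode string, using a specified encoding.
When we encode to a byte string and then decode with the same encoding, we get the original Unicode back.
utf-8 is one possible encoding. There are many, many others.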
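The round trip described above can be checked directly. This is a small sketch using three codecs from the standard library; the selection of codecs and the sizes dictionary are illustrative choices, not something the answer prescribes.

```python
text = u'\u4e2d\u6587'   # the two characters from the question, written with escapes

sizes = {}
for codec in ('utf-8', 'utf-16-le', 'gb18030'):
    raw = text.encode(codec)            # Unicode -> bytes
    assert raw.decode(codec) == text    # bytes -> Unicode with the same codec
    sizes[codec] = len(raw)

# The same text produces different byte strings under different encodings.
for codec in sorted(sizes):
    print('%s: %d bytes' % (codec, sizes[codec]))
```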
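Sometimes Python 2 reports a UnicodeDecodeError when you call encode. Why? Because you are trying to encode a byte string. The proper input for encoding is a Unicode string, so Python "helpfully" tries to decode the byte string to Unicode first. It doesn't know which codec to use, so it assumes ascii. In an environment where you might receive all kinds of data, that codec is the safest choice: it simply reports an error for bytes >= 128, which the various 8-bit encodings handle in countless different ways. (Remember importing a Word file containing letters like é from a Mac to a PC, or the other way round, back in the day? You would see some odd symbol on the other machine, because the platforms' built-in encodings differed.)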
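The Mac/PC mismatch is easy to reproduce. The sketch below assumes the two platform encodings were Mac Roman and Windows-1252; the answer does not name them, so treat the codec choice as an illustration only.

```python
original = u'caf\xe9'                       # "café", written with an escape

mac_bytes = original.encode('mac_roman')    # what the Mac would have stored
seen_on_pc = mac_bytes.decode('cp1252')     # what a PC would make of the same bytes

print(repr(mac_bytes))
print(repr(seen_on_pc))
```

The bytes are unchanged between the two machines; only the codec used to interpret them differs, which is why the last letter comes out as a different character.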
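To complicate things further, in Python 2 the encode/decode mechanism is also used to implement some other neat things that have nothing to do with interpreting Unicode. For example, there is a Base64 encoder, and something that automatically handles string escape sequences.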
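As an illustration of such a bytes-to-bytes codec, here is a minimal sketch of the Base64 one. It uses codecs.encode and codecs.decode rather than the Python 2 string methods so that it also runs on Python 3; the payload value is arbitrary.

```python
import codecs

payload = b'hello'

encoded = codecs.encode(payload, 'base64')   # bytes in, bytes out: no Unicode involved
decoded = codecs.decode(encoded, 'base64')

print(repr(encoded))
print(repr(decoded))
```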
class test():
    def __unicode__(self):
        return u'中文'

    def __str__(self):
        return unicode(self).encode('utf-8')
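The same convention can be kept when a class has to run on both Python 2 and Python 3, where __str__ must return text rather than bytes. This is a sketch under that assumption; the class name Label and the PY2 flag are hypothetical and not part of the original code.

```python
import sys

PY2 = sys.version_info[0] == 2   # hypothetical flag name


class Label(object):
    def __unicode__(self):
        # Always return text (unicode), never bytes.
        return u'\u4e2d\u6587'

    if PY2:
        def __str__(self):
            # Python 2: __str__ is expected to return bytes.
            return self.__unicode__().encode('utf-8')
    else:
        def __str__(self):
            # Python 3: __str__ must return text.
            return self.__unicode__()


rendered = str(Label())
print(repr(rendered))
```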