Python "string_escape" vs "unicode_escape"

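According to the documentation, the built-in string encoding string_escape will:

Produce a string that is suitable as string literal in Python source code

...while unicode_escape will:

Produce a string that is suitable as Unicode literal in Python source code

So they should have roughly the same behaviour. However, they appear to treat single quotes differently: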

>>> print """before '" \0 after""".encode('string-escape')
before \'" \x00 after
>>> print """before '" \0 after""".encode('unicode-escape')
before '" \x00 after

string_escape escapes the single quote, while unicode_escape does not. Is it safe to assume that I can simply:

>>> escaped = my_string.encode('unicode-escape').replace("'", "\\'")

...and get the expected behaviour?

Edit: To be very clear, the expected behaviour is getting something suitable as a literal.
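
For readers on Python 3, the workaround proposed in the question can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name `to_single_quoted_literal` is hypothetical, the code targets Python 3 (where `encode` returns `bytes`, hence the extra `decode`), and the round trip through `ast.literal_eval` is just one way to check that the result is usable as a literal.

```python
import ast


def to_single_quoted_literal(text):
    """Return Python source for a single-quoted str literal equal to `text`.

    Hypothetical helper for illustration; not part of any library.
    """
    # unicode_escape produces ASCII bytes with backslashes and
    # non-printable characters escaped, but leaves both quote
    # characters untouched.
    escaped = text.encode("unicode_escape").decode("ascii")
    # Escape only the quote character that is used as the delimiter.
    return "'" + escaped.replace("'", "\\'") + "'"


sample = "before '\" \0 after"
literal = to_single_quoted_literal(sample)

# Round-trip check: parsing the literal should give back the original text.
assert ast.literal_eval(literal) == sample
```

Because backslashes are already doubled by `unicode_escape` before the quote replacement runs, a backslash that precedes a quote in the input should not interfere with the added escape.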

Within the range 0 ≤ c < 128, yes: the ' is the only difference on CPython 2.6.

>>> set(unichr(c).encode('unicode_escape') for c in range(128)) - set(chr(c).encode('string_escape') for c in range(128))
set(["'"])

Outside this range, the two types are not interchangeable:

>>> '\x80'.encode('string_escape')
'\\x80'
>>> '\x80'.encode('unicode_escape')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)

>>> u'1'.encode('unicode_escape')
'1'
>>> u'1'.encode('string_escape')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: escape_encode() argument 1 must be str, not unicode

On Python 3.x, the string_escape encoding no longer exists, since str can only store Unicode.
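
A quick way to check this on a given interpreter is to ask the codec registry directly. The sketch below assumes Python 3; `codec_exists` is a hypothetical helper name introduced only for this example, while `codecs.lookup` is the standard-library call that raises `LookupError` for unknown encodings.

```python
import codecs


def codec_exists(name):
    """Return True if the codec registry knows an encoding called `name`."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        # Raised by codecs.lookup when no codec is registered under that name.
        return False


for name in ("unicode_escape", "string_escape"):
    print(name, codec_exists(name))
```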

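According to my reading of the implementations of unicode-escape and the unicode repr in the CPython 2.6.5 source, yes: the only difference between repr(unicode_string) and unicode_string.encode('unicode-escape') is the inclusion of the wrapping quotes and the escaping of whichever quote was used.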

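Both are driven by the same function, unicodeescape_string. It takes a parameter whose sole purpose is to toggle the addition of the wrapping quotes and the escaping of that quote.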
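
The relationship between repr and the escape codec can be spot-checked from Python itself. The sketch below assumes Python 3, where the two implementations have diverged slightly: repr leaves printable non-ASCII characters as they are, while unicode_escape still escapes them. The helper name `inner_of_repr` is hypothetical.

```python
def inner_of_repr(text):
    """Return repr(text) without its first and last (wrapping quote) characters."""
    return repr(text)[1:-1]


# An ASCII sample without quote characters: the body of repr() and the
# output of unicode_escape are expected to agree.
plain = "tab\there\n\x00"
escaped_plain = plain.encode("unicode_escape").decode("ascii")
print(inner_of_repr(plain) == escaped_plain)

# A printable non-ASCII sample: on Python 3 repr() keeps the character,
# whereas unicode_escape turns it into a \x escape.
accented = "\xe9"
escaped_accented = accented.encode("unicode_escape").decode("ascii")
print(inner_of_repr(accented) == escaped_accented)
```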

That is only because '\x80' is not a valid ASCII-encoded string. Try u'\x80'.encode('unicode-escape') and you will get '\\x80'.

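The error is very explicit: UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128). It refers to the result of the ord function: ord('\x80') returns 128, and range(128)[-1] is 127, so 128 is not in that range.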
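
The same failure can be reproduced explicitly, which may make the implicit step easier to see. On Python 2, calling a Unicode codec on a byte string first decodes it as ASCII behind the scenes; the sketch below assumes Python 3, where that decode has to be written out by hand. The variable names are only illustrative.

```python
raw = b"\x80"

# The single byte has the value 128, which falls just outside range(128).
print(raw[0], raw[0] in range(128))

message = None
try:
    # Explicit version of the ASCII decode that Python 2 performed implicitly.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```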