How can I avoid escaping Unicode in Python 2's repr()?


python unicode


Is there a way to get Python 3's repr() behaviour of not escaping Unicode characters in Python 2?

$ python3
>>> s="…\n…"
>>> print(repr(s))
'…\n…'

But Python 2 escapes the non-ASCII characters, and I want

u'…\n…'
The workaround I came up with is:

#!/usr/bin/python
import re

_uregex = re.compile("\\\\([^uU])")

def _ureplace(x):
    x = x.group(1)
    if x == "\\":
        return "\\\\\\\\"  # Eight of them. Required.
    return "\\\\" + x

def urepr(x):
    return _uregex.sub(_ureplace, repr(x)).decode("unicode-escape")

s = u"\u2026\n\u2026"
print(urepr(s))

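But I wonder whether there is a better way to do this. Escaping everything just to unescape everything again seems rather wasteful. It is also slow (I need this to quickly write the repr of a lot of large objects to a log file).

Try this: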

repr(string).decode("utf-8")

I don't think Python 2 provides a way to do this, but it is easy to write your own:

import unicodedata

def unichr_repr(ch):
    if ch == '\\':
        return '\\\\'
    elif ch == "'":
        return "\\'"
    category = unicodedata.category(ch)
    if category == 'Cc':
        if ch == '\n':
            return '\\n'
        n = ord(ch)
        if n < 0x100:
            return '\\x%02x' % n
        if n < 0x10000:
            return '\\u%04x' % n
        return '\\U%08x' % n
    return ch

def unistr_repr(s):
    return "'" + ''.join(unichr_repr(ch) for ch in s) + "'"
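
Calling a Python function once per character can be slow on long strings. The following is a minimal sketch of a possibly faster variant, not a tested drop-in replacement. It assumes that only backslashes, single quotes and characters in category Cc (U+0000 to U+001F and U+007F to U+009F) need escaping, which is the same set the code above handles. The names `_ESCAPE_RE`, `_escape_match` and `fast_unistr_repr` are hypothetical and chosen for illustration only.

```python
import re

# Backslash, single quote, and every character in Unicode category Cc
# (C0 controls, DEL, and C1 controls). Everything else is copied unchanged.
_ESCAPE_RE = re.compile(u"[\\\\'\\x00-\\x1f\\x7f-\\x9f]")


def _escape_match(m):
    """Return the escaped form of the single character that matched."""
    ch = m.group(0)
    if ch == u"\\":
        return u"\\\\"
    if ch == u"'":
        return u"\\'"
    if ch == u"\n":
        return u"\\n"
    # All remaining matches are below U+0100, so \xNN is always enough.
    return u"\\x%02x" % ord(ch)


def fast_unistr_repr(s):
    """Quote a unicode string, escaping only what the regex matches."""
    return u"'" + _ESCAPE_RE.sub(_escape_match, s) + u"'"


print(fast_unistr_repr(u"\u2026\n\u2026"))
```

The regex engine skips over runs of ordinary characters in C, so the Python-level callback only runs for the characters that actually need escaping. Whether this is fast enough for a given logging workload would need to be measured.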

Here is a more complete solution that also works for lists of unicode strings:

import sys
import unicodedata

try:
    import reprlib          # Python 3
except ImportError:
    import repr as reprlib  # the same module under its Python 2 name


class URepr(reprlib.Repr):
    """
    On python 3, repr returns unicode objects, which means that non-ASCII
    characters are rendered in human readable form.

    This provides a similar facility on python 2.

    Additionally, on python 3, it prefixes unicode repr with a u, such that
    the returned repr is a valid unicode literal on both python 2 and python 3
    """

    # From https://github.com/python/cpython/blob/3.6/Objects/unicodectype.c#L147-L1599
    nonprintable_categories = ('Cc', 'Cf', 'Cs', 'Co', 'Cn', 'Zl', 'Zp', 'Zs')

    if sys.version_info.major >= 3:
        def repr_str(self, obj, level):
            return 'u' + super().repr_str(obj, level)
    else:
        def repr_unicode(self, obj, level):
            def _escape(ch):
                # printable characters that have special meanings in literals
                if ch == u'\\':
                    return u'\\\\'
                elif ch == u"'":
                    return u"\\'"

                # non-printable characters - convert to \x.., \u...., \U........
                category = unicodedata.category(ch)
                if category in self.nonprintable_categories:
                    return ch.encode('unicode-escape').decode('ascii')

                # everything else
                return ch

            return u"u'{}'".format(''.join(_escape(c) for c in obj))

It can be used as:

repr = URepr().repr
repr([u'hello', u'world'])

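While I understand that you would want to use your method if you receive the Unicode as numeric values, may I suggest the function chr().

Is the purpose of this to reduce IO time?

You are misusing repr(). repr is intended to provide a printable version of an object. It does not care about readability.

@Leonardo.Z The Python 3 version does care about readability. So do I. Editing logs for non-ASCII strings (Japanese file names, say) is no fun if all you see is a bunch of \uABCD.

@user2357112 The purpose is to get log entries that are both human-readable and machine-readable. My code works, but it is very ugly (before this, I had never written eight backslashes in a row). Doing this in a better way would have the welcome side effect of speeding things up.

??? How does this help me? repr() in Python 2 returns an ASCII string. .decode("utf-8") is a no-op on ASCII.

Speaking of Unicode categories, there is also the regex module... which would speed this solution up considerably. As it stands, it is too slow.

See my answer: Cc is not enough.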
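
To illustrate the last comment, here is a small check using the standard `unicodedata` module. It lists a few characters that Python 3's repr() escapes even though they are not in category Cc; the selection of samples is my own and only meant as an illustration.

```python
import unicodedata

# ZERO WIDTH SPACE, LINE SEPARATOR, PARAGRAPH SEPARATOR, NO-BREAK SPACE:
# none of these is a control character (Cc), yet all are non-printable.
samples = [u"\u200b", u"\u2028", u"\u2029", u"\u00a0"]
categories = [unicodedata.category(ch) for ch in samples]

for ch, cat in zip(samples, categories):
    print("U+%04X %s" % (ord(ch), cat))
```

A repr replacement that checks only for Cc would write these characters to the log verbatim, where they are invisible or break the line.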