
Convert Unicode sequence to string in Python 3

python python-3.x string unicode


While parsing an HTML response to extract data with Python 3.4 on Kubuntu 15.10 in the Bash CLI, I get output like this when using print():

\u05ea\u05d4 \u05e0\u05e9\u05de\u05e2 \u05de\u05e6\u05d5\u05d9\u05df

How can I output the actual text itself in my application?

Here is the code that generates the string:

import json
import requests

response = requests.get(url)
messages = json.loads(extract_json(response.text))

for k,v in messages.items():
    for message in v['foo']['bar']:
        print("\nFoobar: %s" % (message['body'],))
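A minimal sketch of where this symptom can come from, assuming (as the repr() output in the addendum suggests) that the JSON text carries an escaped backslash before each u. In that case the parser returns literal backslash sequences rather than Hebrew letters. The two JSON strings below are hypothetical stand-ins for the page data, and ascii() is used, as suggested in the comments, to make the difference visible.

```python
import json

# Hypothetical JSON text where \u05ea is a JSON escape for a single character.
single = '{"body": "\\u05ea"}'     # JSON text: {"body": "\u05ea"}
# Hypothetical JSON text where the backslash itself is escaped, so it survives parsing.
double = '{"body": "\\\\u05ea"}'   # JSON text: {"body": "\\u05ea"}

good = json.loads(single)["body"]
bad = json.loads(double)["body"]

print(ascii(good), len(good))   # '\u05ea' 1  (one character: ת)
print(ascii(bad), len(bad))     # '\\u05ea' 6 (backslash, u, 0, 5, e, a)
```

Printing good shows ת, while printing bad shows the six characters \u05ea, which matches the output in the question.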

Here is the function that returns the JSON from the HTML page:

def extract_json(input_):

    """
    Get the JSON out of a webpage.
    The line of interest looks like this:
    foobar = ["{\"name\":\"dotan\",\"age\":38}"]
    """

    for line in input_.split('\n'):
        if 'foobar' in line:
            return line[line.find('"')+1:-2].replace(r'\"',r'"')

    return None
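To see how extract_json() can leave a doubled backslash behind, here is a sketch that runs a copy of it on a hypothetical page line shaped like the one in its docstring (the body key and the \u05e9 value are assumptions). The helper only unescapes backslash-quote, so an escaped backslash inside the JavaScript string literal survives, and json.loads() then reads it as a literal backslash.

```python
import json

def extract_json(input_):
    """Copy of the question's helper: get the JSON out of a webpage."""
    for line in input_.split('\n'):
        if 'foobar' in line:
            # Only backslash-quote is unescaped; a doubled backslash stays doubled.
            return line[line.find('"')+1:-2].replace(r'\"', r'"')
    return None

# Hypothetical page line: the JSON is embedded in a JavaScript string literal,
# so its quotes and its backslashes both carry an extra level of escaping.
page = r'foobar = ["{\"body\":\"\\u05e9\"}"]'

raw = extract_json(page)
print(raw)                               # {"body":"\\u05e9"}
print(ascii(json.loads(raw)["body"]))    # '\\u05e9' (six characters, not one)
```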

Googling this issue, I found questions related to Python 2, but Python 3 drastically changed how Python handles strings, and Unicode in particular.

How can I convert the example string (\u05ea) into the character (ת) in Python 3?
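For a single escape like the one in the question, here is a sketch of two ways to get the character, assuming the input is exactly the six characters \u05ea. The first reads the hex digits directly; the second hands the text to the JSON parser, which assumes it is valid JSON string content.

```python
import json

escaped = r'\u05ea'   # six characters: backslash, 'u', '0', '5', 'e', 'a'

# Option 1: parse the four hex digits and build the code point directly.
# Only suitable for one \uXXXX escape that is not half of a surrogate pair.
char = chr(int(escaped[2:], 16))

# Option 2: wrap the text in quotes and let the JSON parser expand the escape.
# Assumes the text contains no bare '"' or control characters.
char_json = json.loads('"' + escaped + '"')

print(ascii(char), char == char_json)   # '\u05ea' True
```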
Addendum: here is some information about message['body']:

print(type(message['body']))                              # Prints: <class 'str'>
print(message['body'])                                    # Prints: \u05ea\u05d4 \u05e0\u05e9\u05de\u05e2 \u05de\u05e6\u05d5\u05d9\u05df
print(repr(message['body']))                              # Prints: '\\u05ea\\u05d4 \\u05e0\\u05e9\\u05de\\u05e2 \\u05de\\u05e6\\u05d5\\u05d9\\u05df'
print(message['body'].encode().decode())                  # Prints: \u05ea\u05d4 \u05e0\u05e9\u05de\u05e2 \u05de\u05e6\u05d5\u05d9\u05df
print(message['body'].encode().decode('unicode-escape'))  # Prints: תה נשמע מצוין

Note that the last line does work as expected, but it has some problems:

Decoding a string literal with unicode-escape is the wrong thing to do, because Python escapes differ from JSON escapes for many characters. (Thanks.)

encode() relies on the default encoding, which is a bad thing. (Thanks.)
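One concrete way the unicode-escape route diverges from JSON parsing: the codec decodes every byte that is not part of an escape as Latin-1, so any non-ASCII text already present gets mangled. A minimal sketch, assuming a hypothetical input that mixes a literal ש with an escaped ת:

```python
import json

# Hypothetical input: a literal Hebrew letter followed by an escaped one.
mixed = '\u05e9' + r'\u05ea'   # ש, then the six characters \u05ea

# 'unicode-escape' reads the UTF-8 bytes of ש (0xD7 0xA9) as two Latin-1 characters.
via_escape = mixed.encode('utf-8').decode('unicode-escape')

# The JSON parser works on the str directly and only expands the \uXXXX escape.
via_json = json.loads('"' + mixed + '"')

print(ascii(via_escape))   # '\xd7\xa9\u05ea'
print(ascii(via_json))     # '\u05e9\u05ea'
```

Spelling out the encoding ('utf-8') avoids the default-encoding problem, but not the Latin-1 mangling.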

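If your input uses a backslash as an escape character, you should unescape the text before passing it to json:

>>> foobar = '{\\"body\\": \\"\\\\u05e9\\"}'
>>> import re
>>> json_text = re.sub(r'\\(.)', r'\1', foobar) # unescape
>>> import json
>>> print(json.loads(json_text)['body'])
ש

Don't use the 'unicode-escape' encoding on JSON text; it may produce different results:

>>> import json
>>> json_text = '["\\ud83d\\ude02"]'
>>> json.loads(json_text)
['😂']
>>> json_text.encode().decode('unicode-escape')
'["\ud83d\ude02"]'

What does print(ascii(message['body'])) show? Unrelated: use messages = response.json(). If the input is not JSON, then what is it? print(response.content[:50]); print(response.headers['Content-Type']). Can you change the upstream format that the service returns?

That's not what I asked. Run the code in the comment as is.

@J.F.Sebastian: b'\r\n\n\n

Now we're getting somewhere. Can you post the real code that you use to get messages? (Everything between requests.get() and json.loads(), inclusive.)

Thank you very much! I really appreciate your patience in tracking down the root of the problem in the comments.

@dotancohen I'm not sure I understand the actual answer to this question. To convert a Unicode sequence to its string representation, we have to use JSON? Is that right? Is there no encode/decode trick that solves this? Additionally, is this any different from s.encode('utf-8').decode('unicode-escape')? It's hard to explain what I mean in a single comment, so please see the link.

@BramVanroy [reposting comment to fix a typo] No. If you already have plain Unicode text, you don't need to do anything with it. If you have Unicode text in JSON format, just use result = json.loads(json_text). If you have garbled input, try to fix it upstream; if you can't, use whatever is necessary to fix your particular garbled input. Note: '\u2603' and r'\u2603' are completely different things in Python (your question suggests you don't see the difference).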
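Applying the answer's unescape step to the question's pipeline could look like the sketch below. extract_json_fixed() is a hypothetical variant, not code from the thread, and it assumes the JavaScript string literal only uses a backslash before a single character (quote or backslash); escapes such as \n would lose their meaning under this substitution.

```python
import json
import re

def extract_json_fixed(input_):
    """Hypothetical variant of the question's extract_json().

    Assumes the embedded JSON sits in a JavaScript string literal whose only
    escapes are a backslash before a quote or before another backslash.
    """
    for line in input_.split('\n'):
        if 'foobar' in line:
            literal = line[line.find('"')+1:-2]
            # Remove one level of escaping for every backslash-escaped character.
            return re.sub(r'\\(.)', r'\1', literal)
    return None

# Same hypothetical page line shape as in the docstring of extract_json().
page = r'foobar = ["{\"body\":\"\\u05e9\"}"]'
message = json.loads(extract_json_fixed(page))
print(ascii(message['body']))   # '\u05e9'
```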