Python: converting a byte string to a Unicode string


python string unicode python-3.x


I have this code:

a = "\u0432"
b = u"\u0432"
c = b"\u0432"
d = c.decode('utf8')

print(type(a), a)
print(type(b), b)
print(type(c), c)
print(type(d), d)

And the output:

<class 'str'> в
<class 'str'> в
<class 'bytes'> b'\\u0432'
<class 'str'> \u0432



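Why, in the last case, do I see the character code instead of the character? How can I convert the byte string to a Unicode string so that the output shows the character rather than its code?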

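In strings (or Unicode objects in Python 2), \u has a special meaning: "here comes a Unicode character specified by its Unicode ID". Hence u"\u0432" produces the character в.

The b'' prefix tells you this is a sequence of 8-bit bytes. A bytes object has no Unicode characters, so the \u code has no special meaning. Hence b"\u0432" is just the sequence of bytes \, u, 0, 4, 3 and 2.

Essentially, you have an 8-bit string that contains not a Unicode character but the specification of a Unicode character.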
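To make the difference concrete, here is a minimal sketch that inspects both kinds of literal. The variable names are illustrative only. The bytes literal is written with a doubled backslash, which yields the same six bytes as b"\u0432" in the question while avoiding the warning that recent Python versions emit for an unrecognised escape in a bytes literal.

```python
# In a str literal, \u0432 is one code point; in bytes it is six separate bytes.
text = "\u0432"
raw = b"\\u0432"  # same bytes as b"\u0432" in the question

print(len(text))  # 1
print(len(raw))   # 6

# Show each byte as the ASCII character it represents.
raw_chars = [chr(b) for b in raw]
print(raw_chars)  # ['\\', 'u', '0', '4', '3', '2']
```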

You can convert this specification using the unicode_escape codec:

>>> c.decode('unicode_escape')
'в'
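One caveat is worth a short sketch. The unicode_escape codec interprets the input as Latin-1, so it is only a safe choice when the bytes are pure ASCII with escape sequences. If the bytes already hold UTF-8 encoded text, decoding with utf-8 is the appropriate call. The variable names below are illustrative.

```python
# Case 1: ASCII bytes that spell out an escape sequence.
escaped = b"\\u0432"
decoded = escaped.decode("unicode_escape")
print(decoded)  # в

# Case 2: bytes that already contain the UTF-8 encoding of the character.
utf8_bytes = "в".encode("utf-8")  # two bytes: 0xD0 0xB2
mangled = utf8_bytes.decode("unicode_escape")  # each byte read as Latin-1
correct = utf8_bytes.decode("utf-8")

print(mangled == "в")  # False
print(correct == "в")  # True
```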

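I liked Lennart's answer. It put me on the right track for solving the particular problem I was facing. What I added was the ability to produce HTML-compatible codes for the \u???? specifications in the string. Basically, only one line was needed: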

results = results.replace('\\u','&#x')
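Note that the plain replace leaves out the semicolon that terminates an HTML numeric character reference, and it rewrites every \u even when four hex digits do not follow. A slightly stricter variant, sketched here with a hypothetical helper named to_html_entities, uses a regular expression. It does not handle surrogate pairs, which would need to be combined first.

```python
import html
import re

# Matches a literal backslash, 'u', and exactly four hex digits.
_U_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")

def to_html_entities(text):
    """Rewrite \\uXXXX sequences as &#xXXXX; character references."""
    return _U_ESCAPE.sub(r"&#x\1;", text)

sample = "name: \\u0432\\u0430"
converted = to_html_entities(sample)
print(converted)                 # name: &#x0432;&#x0430;
print(html.unescape(converted))  # name: ва
```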

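This all came from the need to convert JSON results into something that displays well in a browser. Here is some test code that integrates with a cloud application: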

# References:
# http://stackoverflow.com/questions/9746303/how-do-i-send-a-post-request-as-a-json
# https://docs.python.org/3/library/http.client.html
# http://docs.python-requests.org/en/v0.10.7/user/quickstart/#custom-headers
# http://stackoverflow.com/questions/606191/convert-bytes-to-a-python-string
# http://www.w3schools.com/charsets/ref_utf_punctuation.asp
# http://stackoverflow.com/questions/13837848/converting-byte-string-in-unicode-string

import urllib.request
import json

body = [ { "query": "co-development and language.name:English", "page": 1, "pageSize": 100 } ]
myurl = "https://core.ac.uk:443/api-v2/articles/search?metadata=true&fulltext=false&citations=false&similar=false&duplicate=false&urls=true&extractedUrls=false&faithfulMetadata=false&apiKey=SZYoqzk0Vx5QiEATgBPw1b842uypeXUv"
req = urllib.request.Request(myurl)
req.add_header('Content-Type', 'application/json; charset=utf-8')
jsondata = json.dumps(body)
jsondatabytes = jsondata.encode('utf-8') # needs to be bytes
req.add_header('Content-Length', len(jsondatabytes))
print('\n', jsondatabytes, '\n')
response = urllib.request.urlopen(req, jsondatabytes)
results = response.read()
results = results.decode('utf-8')
results = results.replace('\\u','&#x') # produces html hex version of \u???? unicode characters
print(results)
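An alternative to string replacement, when the response is JSON, is to parse it and let the JSON decoder resolve the \u escapes. The sketch below uses a made-up payload rather than a real response from the API above, and escapes the result for HTML with html.escape.

```python
import html
import json

# Hypothetical response body; a real API response would look different.
payload = b'{"title": "\\u0432\\u0430 <b>"}'

# json.loads turns \uXXXX escapes into real characters.
data = json.loads(payload.decode("utf-8"))
title = data["title"]
print(title)  # ва <b>

# Escape markup characters before placing the text into an HTML page.
safe = html.escape(title)
print(safe)   # ва &lt;b&gt;
```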

I ran into this problem when working with a Redis set and trying to convert it to JSON. Redis returns a set of bytes data. Using unicode_escape worked very well.
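A minimal sketch of that conversion follows. The set is a stand-in for what a Redis client might return, not actual Redis output, and it assumes the members are ASCII bytes that may contain \u escapes. If the members were UTF-8 encoded text instead, decode("utf-8") would be the right call.

```python
import json

# Stand-in for a set of bytes members; assumed data, not real Redis output.
members = {b"\\u0432", b"plain"}

# Decode each member, then sort so the JSON output is deterministic.
decoded = sorted(m.decode("unicode_escape") for m in members)

# ensure_ascii=False keeps the characters readable instead of re-escaping them.
as_json = json.dumps(decoded, ensure_ascii=False)
print(as_json)  # ["plain", "в"]
```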