Python 3.x: Encoding and decoding special characters (Latin-1)

I'm trying to clean up some strange Unicode characters left over after HTML parsing, but my attempt still doesn't convert them.

Original text:

raw = 'If further information is needed, donÂ´t hesitate to contact us. Kind regards, JosÃ© Ramirez.'

After encoding and decoding:

text = str(raw.encode().decode('unicode_escape'))

Current output:

'If further information is needed, donÃ\x82Â´t hesitate to contact us. Kind regards, JosÃ\x83Â© Ramirez'

Expected output:

'If further information is needed, don´t hesitate to contact us. Kind regards, José Ramirez'

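You have the conversion backwards.

raw.encode().decode('unicode_escape')

has, for a string that contains no backslashes, the same effect as

raw.encode('utf-8').decode('latin-1')

which garbles the text a second time. What you actually want is the reverse: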

>>> raw.encode('latin-1').decode('utf-8')
'If further information is needed, don´t hesitate to contact us. Kind regards, José Ramirez.'
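If the same repair has to run over text that may or may not be damaged, the round trip can be wrapped in a small helper. This is only a sketch, assuming the damage is a single layer of UTF-8 read as Latin-1; the name `fix_latin1_mojibake` is hypothetical and not part of any library. It returns the input unchanged whenever the round trip is not possible.

```python
def fix_latin1_mojibake(text: str) -> str:
    """Undo one layer of UTF-8 text that was wrongly decoded as Latin-1.

    Hypothetical helper for illustration: if the text cannot be encoded
    as Latin-1, or the resulting bytes are not valid UTF-8, the input is
    assumed to be undamaged and is returned as-is.
    """
    try:
        return text.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


raw = ('If further information is needed, donÂ´t hesitate to contact us. '
       'Kind regards, JosÃ© Ramirez.')
fixed = fix_latin1_mojibake(raw)
print(fixed)

# Text that is already correct fails the UTF-8 decode and passes through.
already_clean = fix_latin1_mojibake('José')
print(already_clean)
```

Pure ASCII text survives the round trip unchanged as well, so the helper can be applied to every string in a batch.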

Your string came from someone who received UTF-8-encoded text but decoded it as if it were Latin-1.
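To see how this kind of damage arises, here is a minimal sketch that reproduces it on a single word. The variable names are illustrative only. It also shows why the attempt in the question made things worse: it applied the wrong decoding a second time.

```python
original = 'José'

# Step 1: the sender encodes the text as UTF-8 ('é' becomes two bytes).
utf8_bytes = original.encode('utf-8')

# Step 2: the receiver wrongly decodes those bytes as Latin-1,
# turning each byte into its own character.
garbled = utf8_bytes.decode('latin-1')

# Repair: reverse step 2, then decode with the correct codec.
repaired = garbled.encode('latin-1').decode('utf-8')

# The attempt from the question: encoding as UTF-8 and decoding with
# 'unicode_escape' repeats the mistake, like a Latin-1 decode would.
double_garbled = garbled.encode().decode('unicode_escape')

print(repr(utf8_bytes))
print(repr(garbled))
print(repr(repaired))
print(repr(double_garbled))
```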

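If you have to deal with many different variants of mojibake (text decoded with the wrong encoding, producing gibberish), the ftfy package can help: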

>>> import ftfy
>>> ftfy.fix_text('If further information is needed, donÂ´t hesitate to contact us. Kind regards, JosÃ© Ramirez.')
'If further information is needed, don´t hesitate to contact us. Kind regards, José Ramirez.'