Make Python replace unencodable characters with a string by default


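I want Python to ignore characters it cannot encode and simply replace them with the string

"<could not encode>"

For example, assuming the default encoding is ascii, the command

'%s is the word'%'ébác'

would yield

'<could not encode>b<could not encode>c is the word'

Is there a way to make this the default behavior in all of my projects?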

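The str.encode function takes an optional argument that defines the error handling:

str.encode([encoding[, errors]])

From the docs:

Return an encoded version of the string. Default encoding is the current default string encoding. errors may be given to set a different error handling scheme. The default for errors is 'strict', meaning that encoding errors raise a UnicodeError. Other possible values are 'ignore', 'replace', 'xmlcharrefreplace', 'backslashreplace' and any other name registered via codecs.register_error(), see section Codec Base Classes. For a list of possible encodings, see section Standard Encodings.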
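As a quick illustration of the built-in schemes named in that excerpt, the sketch below encodes the question's sample string to ASCII with each of them. It assumes Python 3, where str holds Unicode text and encode returns bytes; the variable names are only illustrative.

```python
# Python 3 sketch: the built-in error handlers applied to the sample text.
text = "\u00e9b\u00e1c"  # 'ébác', written with escapes to avoid source-encoding issues

results = {
    name: text.encode("ascii", name)
    for name in ("ignore", "replace", "xmlcharrefreplace", "backslashreplace")
}

for name, encoded in results.items():
    print(name, encoded)
# ignore b'bc'
# replace b'?b?c'
# xmlcharrefreplace b'&#233;b&#225;c'
# backslashreplace b'\\xe9b\\xe1c'
```

None of the built-in schemes inserts an arbitrary string, which is why a custom handler is needed for the "<could not encode>" behavior.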

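In your case, the codecs.register_error function may be of interest.

[Note about bad characters]

By the way, when using register_error, be aware that unless you take care you may end up replacing not just individual bad characters but whole groups of consecutive bad characters with your string. The error handler is called once per run of bad characters, not once per character.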
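A minimal sketch of that pitfall, assuming Python 3 on CPython and the ascii codec. The handler name "naive_placeholder", the placeholder text, and the seen_spans list are hypothetical names introduced only for this illustration.

```python
import codecs

PLACEHOLDER = "<bad>"  # hypothetical replacement string
seen_spans = []        # records the (start, end) span of each handler call


def naive_handler(exc):
    """Return a single placeholder per call, regardless of run length."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc  # this sketch only deals with encoding errors
    seen_spans.append((exc.start, exc.end))
    # Resume encoding at exc.end, i.e. after the whole run of bad characters.
    return (PLACEHOLDER, exc.end)


codecs.register_error("naive_placeholder", naive_handler)

# Two adjacent unencodable characters followed by plain ASCII.
result = "\u00e9\u00e1bc".encode("ascii", "naive_placeholder")
print(result)      # b'<bad>bc' : one placeholder for two bad characters
print(seen_spans)  # [(0, 2)]   : a single call covering the whole run
```

To get one placeholder per character instead, the handler can repeat the string exc.end - exc.start times.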

>>> help("".encode)
Help on built-in function encode:

encode(...)
S.encode([encoding[,errors]]) -> object

Encodes S using the codec registered for encoding. encoding defaults
to the default encoding. errors may be given to set a different error
handling scheme. Default is 'strict' meaning that encoding errors raise
a UnicodeEncodeError. **Other possible values are** 'ignore', **'replace'** and
'xmlcharrefreplace' as well as any other name registered with
codecs.register_error that is able to handle UnicodeEncodeErrors.

For example:

>>> x
'\xc3\xa9b\xc3\xa1c is the word'
>>> x.decode("ascii")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
>>> x.decode("ascii", "replace")
u'\ufffd\ufffdb\ufffd\ufffdc is the word'
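The transcript above is from Python 2, where a byte string has a decode method. A rough Python 3 equivalent is sketched below, assuming the bytes are the UTF-8 form of the question's sample text; the variable names are illustrative.

```python
# Python 3 sketch: bytes objects are decoded, str objects are encoded.
raw = b"\xc3\xa9b\xc3\xa1c is the word"  # UTF-8 bytes for 'ébác is the word'

try:
    raw.decode("ascii")  # errors defaults to 'strict'
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# 'replace' substitutes U+FFFD for each byte that ASCII cannot decode.
replaced = raw.decode("ascii", "replace")

# Decoding with the codec the bytes were actually written in loses nothing.
as_utf8 = raw.decode("utf-8")

print(strict_failed)    # True
print(ascii(replaced))  # '\ufffd\ufffdb\ufffd\ufffdc is the word'
```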


Add your own callback with codecs.register_error to replace bad characters with a string of your choice.
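One way such a callback might look, as a sketch rather than a definitive implementation. It assumes Python 3; the handler name "could_not_encode" and the function name are hypothetical, and the placeholder text is the one from the question.

```python
import codecs

PLACEHOLDER = "<could not encode>"  # the string asked for in the question


def placeholder_per_char(exc):
    """Replace every unencodable character with PLACEHOLDER."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc  # this sketch only handles encoding errors
    # exc.start:exc.end covers a whole run of bad characters, so the
    # placeholder is repeated once for each character in that run.
    return (PLACEHOLDER * (exc.end - exc.start), exc.end)


# "could_not_encode" is a hypothetical scheme name chosen for this sketch.
codecs.register_error("could_not_encode", placeholder_per_char)

text = "%s is the word" % "\u00e9b\u00e1c"  # 'ébác is the word'
encoded = text.encode("ascii", "could_not_encode")
print(encoded)
# b'<could not encode>b<could not encode>c is the word'
```

Registering the handler only makes the scheme available by name. Each encode call still has to pass that name, so this alone does not change the default from 'strict'.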

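If the default encoding is ascii, then what encoding is the string in 'ébác'?

@Peter Hansen - you are right :) It was only meant to explain what I want... a bad example.

There are some examples of how to use codecs.register_error in …

@NadavB Thx!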