Python UnicodeEncodeError: 'charmap' codec can't encode character: character maps to <undefined>, print function

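I am writing a Python (Python 3.3) program that sends some data to a webpage using the POST method. Mostly for debugging, I fetch the page result and display it on the screen using the print() function.

The code looks like this: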

conn.request("POST", resource, params, headers)
response = conn.getresponse()
print(response.status, response.reason)
data = response.read()
print(data.decode('utf-8'))

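The HTTPResponse .read() method returns a bytes object containing the encoded page (which is a well-formed UTF-8 document). This seemed fine until I stopped using the IDLE GUI for Windows and switched to the Windows console. The returned page contains a U+2014 character (em dash), which the print function translates correctly in the Windows GUI (I presume code page 1252) but not in the Windows console (code page 850). Given the default 'strict' error handling, I get the following error: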

UnicodeEncodeError: 'charmap' codec can't encode character '\u2014' in position 10248: character maps to <undefined>
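The failure does not depend on the HTTP request. It can be reproduced by encoding any string that contains U+2014 with the console's code page, which is essentially what print() has to do when it writes to a cp850 console. A minimal sketch, where the sample string is made up for illustration:

```python
# Hypothetical sample text; U+2014 is the em dash from the question.
sample = "status \u2014 ok"

try:
    sample.encode("cp850")  # 'strict' error handling is the default
    error = None
except UnicodeEncodeError as exc:
    error = exc

# The message only contains ASCII, so printing it is safe on any console.
print(error)
```

The exception object carries the position of the offending character in its `start` attribute, which is the "position" number shown in the error message.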

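Now it replaces the offending character (the em dash) with a ?. This is not ideal (a hyphen would be a better replacement), but it is good enough for my purpose.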
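The workaround itself is not shown above. Judging from the description that follows (a decode, an encode, and another decode), it was probably something along these lines. The helper name `console_safe` and the hard-coded `cp850` are assumptions for illustration only:

```python
def console_safe(raw, target="cp850"):
    """Decode UTF-8 bytes, then round-trip the text through a code page.

    Characters the code page cannot represent come back as '?'.
    """
    text = raw.decode("utf-8")
    return text.encode(target, errors="replace").decode(target)


# Stand-in for the bytes returned by response.read().
page = "before \u2014 after".encode("utf-8")
print(console_safe(page))
```

Hard-coding the target code page is exactly what makes this fragile: the replacement happens even on a console that could have displayed the character.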

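There are several things I do not like about my solution.

The code is ugly, with all that decoding, encoding, and decoding.

It solves the problem only for this case. If I port the program to a system using some other encoding (latin-1, cp437, back to cp1252, etc.), it should recognize the target encoding. It does not. (For instance, when I use the IDLE GUI again, the em dash is also lost, which did not happen before.)

It would be nicer if the em dash were translated to a hyphen instead of a question mark.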

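The problem is not the em dash (I can think of several ways to solve that particular problem); the problem is that I need to write robust code. I am feeding the page with data from a database, and that data can come back to me. I can anticipate many other conflicting cases: an "Á" (U+00C1, which is possible in my database) can be encoded in CP-850 (the DOS/Windows console encoding for Western European languages) but not in CP-437 (the US English encoding, which is the default in many Windows installations).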
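The code-page differences mentioned above can be checked directly with the codecs that ship with Python. A small sketch, where the helper name `encodable` is made up for illustration:

```python
def encodable(char, encoding):
    """Return True if `char` can be encoded with `encoding`."""
    try:
        char.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Only ASCII is printed here, so this runs on any console.
for name, char in (("A-acute U+00C1", "\u00c1"), ("em dash U+2014", "\u2014")):
    for encoding in ("cp437", "cp850", "cp1252", "utf-8"):
        print(name, encoding, encodable(char, encoding))
```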

So, the question:

Is there a nicer solution that makes my code agnostic of the output interface encoding?

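I see three solutions to this:

Change the output encoding so that it always outputs UTF-8. There are published examples of this approach, but I could not get them to work.

The following example code makes the output aware of your target charset (Python 2 syntax):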

# -*- coding: utf-8 -*-
import sys

print sys.stdout.encoding
print u"Stöcker".encode(sys.stdout.encoding, errors='replace')
print u"Стоескер".encode(sys.stdout.encoding, errors='replace')

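This example correctly replaces any non-printable character in my name with a question mark.

If you create a custom print function, e.g. called myprint, that uses this mechanism to encode output properly, you can simply replace print with myprint wherever necessary without making the whole code look ugly.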
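For Python 3, such a myprint could look roughly like the sketch below. It is only one possible shape: the signature mimics print(), and falling back to UTF-8 when the stream reports no encoding is an assumption of mine, not something required by the approach.

```python
import sys


def myprint(*args, sep=" ", end="\n", file=None):
    """print()-like helper that never raises UnicodeEncodeError.

    Characters the target stream cannot encode are replaced with '?'.
    """
    stream = sys.stdout if file is None else file
    # Assumption: fall back to UTF-8 if the stream reports no encoding.
    encoding = getattr(stream, "encoding", None) or "utf-8"
    text = sep.join(str(arg) for arg in args) + end
    # Round-trip through the stream's own encoding to drop what it lacks.
    safe = text.encode(encoding, errors="replace").decode(encoding)
    stream.write(safe)


myprint("Stöcker", "Стоескер")
```

Because the encoding is read from the stream at call time, the same call keeps the characters on a UTF-8 terminal and degrades to question marks only where the console cannot show them.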

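Reset the output encoding globally at the beginning of the software:

The referenced page has a good summary of how to change the output encoding. The section "StreamWriter Wrapper around Stdout" is especially interesting. Essentially, it says to change the I/O encoding function like this: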

In Python 2:

import codecs
import sys

if sys.stdout.encoding != 'cp850':
    sys.stdout = codecs.getwriter('cp850')(sys.stdout, 'strict')
if sys.stderr.encoding != 'cp850':
    sys.stderr = codecs.getwriter('cp850')(sys.stderr, 'strict')

In Python 3:

import codecs
import sys

# In Python 3 the writer must wrap the underlying byte stream (.buffer).
if sys.stdout.encoding != 'cp850':
    sys.stdout = codecs.getwriter('cp850')(sys.stdout.buffer, 'strict')
if sys.stderr.encoding != 'cp850':
    sys.stderr = codecs.getwriter('cp850')(sys.stderr.buffer, 'strict')

If this is used in a CGI script that outputs HTML, you can replace 'strict' with 'xmlcharrefreplace' to get HTML character references for the non-printable characters.
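A variation on the same idea for Python 3 keeps whatever encoding the console already reports and changes only the error handler, so nothing is hard-coded to cp850. This is a sketch under assumptions: the helper name `rewrap` is made up, and it relies on the stream exposing a `.buffer` attribute, which some environments (IDLE, for example) may not provide.

```python
import io
import sys


def rewrap(stream, errors="replace"):
    """Return a text stream over the same byte buffer and encoding as
    `stream`, but with a different error handler."""
    return io.TextIOWrapper(
        stream.buffer,
        encoding=stream.encoding,
        errors=errors,
        line_buffering=True,
    )


# Possible use at program start (commented out so that running this
# sketch has no side effects):
# sys.stdout = rewrap(sys.stdout)                       # '?' for unknowns
# sys.stdout = rewrap(sys.stdout, "xmlcharrefreplace")  # e.g. '&#8212;'
```

On Python 3.7 and later, `sys.stdout.reconfigure(errors="replace")` should achieve much the same without creating a second wrapper, but that method does not exist in Python 3.3.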

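Feel free to modify these approaches and set different encodings. Note that outputting data of unspecified encoding still will not work, so any data, input, or text must be correctly convertible to unicode (the example below uses Python 2 syntax):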

# -*- coding: utf-8 -*-
import sys
import codecs
sys.stdout = codecs.getwriter("iso-8859-1")(sys.stdout, 'xmlcharrefreplace')
print u"Stöcker"                # works
print "Stöcker".decode("utf-8") # works
print "Stöcker"                 # fails

I see three solutions to this: