Why does LANG change the output of str.encode() in Python?


python python-3.x


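This command returns b'?', as expected, because "α" is not in the ISO-8859-1 encoding:

LANG=en_US.UTF-8 python -c "print('α'.encode('ISO-8859-1', 'replace'))"

But this command returns b'\xce\xb1', which I don't understand:

LANG=en_US.ISO-8859-1 python -c "print('α'.encode('ISO-8859-1', 'replace'))"

What causes this? What I want is to remove the characters that are not in the encoding (here ISO-8859-1) by replacing them with ?, which is what I thought this code would do.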
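For the stated goal, the encode call itself is fine once the string really contains "α". A minimal sketch, using a hypothetical helper name, that returns a str with every character outside ISO-8859-1 replaced by ?. The non-ASCII characters are written as \u escapes (an assumption about how you supply the text) so the source is pure ASCII and no locale decoding is involved:

```python
def to_latin1_or_question_mark(text: str) -> str:
    """Hypothetical helper: replace characters missing from ISO-8859-1 with '?'."""
    # errors='replace' substitutes b'?' for anything the codec cannot encode.
    encoded = text.encode('iso-8859-1', 'replace')
    # Decoding ISO-8859-1 never fails, so this just turns the bytes back into a str.
    return encoded.decode('iso-8859-1')


# '\u03b1' is 'α' and '\u00e9' is 'é', written as ASCII escapes so the result
# does not depend on how the locale decodes non-ASCII source or arguments.
result = to_latin1_or_question_mark('caf\u00e9 \u03b1')
print(ascii(result))  # 'caf\xe9 ?'
```

The same escape works with -c, because bash leaves \u untouched inside double quotes: LANG=en_US.ISO-8859-1 python -c "print('\u03b1'.encode('ISO-8859-1', 'replace'))" should print b'?' under either LANG setting. Putting the code in a .py file saved as UTF-8 also avoids the problem, since Python 3 reads source files as UTF-8 by default.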

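It isn't changing the output of str.encode; it's changing the locale encoding, which Python uses to decode the command-line arguments (including the code passed with -c). The same setting shows up as the encoding of sys.stdin: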

$ LANG=en_US.UTF-8 python -c "print(__import__('sys').stdin.encoding)"
UTF-8
$ LANG=en_US.ISO-8859-1 python -c "print(__import__('sys').stdin.encoding)"
ISO-8859-1
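The sys.argv documentation notes that on Unix, Python decodes command-line arguments with the filesystem encoding and the surrogateescape error handler, and that os.fsencode() recovers the original bytes. On Linux that encoding follows the locale unless UTF-8 mode is enabled. A small diagnostic sketch along those lines, saved as a hypothetical show_args.py, makes the effect visible for an ordinary argument rather than for the -c code:

```python
import os
import sys

# Hypothetical diagnostic script (show_args.py): run it with a non-ASCII argument.
# The filesystem encoding is what Python used to decode sys.argv on Unix.
print("filesystem encoding:", sys.getfilesystemencoding())

for arg in sys.argv[1:]:
    # os.fsencode() undoes that decoding and returns the bytes the shell passed in.
    raw = os.fsencode(arg)
    # ascii() shows the code points Python actually ended up with.
    print(ascii(arg), len(arg), raw)
```

Running LANG=en_US.UTF-8 python show_args.py α and then LANG=en_US.ISO-8859-1 python show_args.py α should report the same raw bytes but a different decoded string, assuming the ISO-8859-1 locale is actually installed on the system.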

So Python decodes the UTF-8 bytes b'\xce\xb1' that the terminal sends as two separate ISO-8859-1 characters, one per byte:

$ LANG=en_US.ISO-8859-1 python3 -c "print(len('α'))"
2
$ LANG=en_US.UTF-8 python3 -c "print(len('α'))"
1
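The mechanism can be reproduced without touching LANG by decoding the same two bytes both ways. This sketch assumes the terminal sends UTF-8 regardless of LANG, which the b'\xce\xb1' in the question implies:

```python
# Assumption: the terminal sends UTF-8 regardless of LANG, so 'α' always
# arrives as these two bytes (consistent with the b'\xce\xb1' in the question).
raw = '\u03b1'.encode('utf-8')  # 'α'
print(raw)  # b'\xce\xb1'

# UTF-8 locale: the two bytes decode to a single character, U+03B1.
as_utf8 = raw.decode('utf-8')
# ISO-8859-1 locale: every byte becomes a character of its own, giving 'Î±'.
as_latin1 = raw.decode('iso-8859-1')

for text in (as_utf8, as_latin1):
    # U+03B1 is not in ISO-8859-1 and becomes '?'; U+00CE and U+00B1 are, so they survive.
    print(ascii(text), len(text), text.encode('iso-8859-1', 'replace'))
```

The loop prints '\u03b1' 1 b'?' and then '\xce\xb1' 2 b'\xce\xb1'. In the ISO-8859-1 case, encode() faithfully round-trips two characters that are valid Latin-1; it never sees an α to replace.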

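Yes, but I don't understand why that makes any difference. The command

print(len('α'.encode('ISO-8859-1', 'replace')))

returns one value with UTF-8 and a different value with ISO-8859-1, so it isn't the print call that causes the difference.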