Can I replace a string with a 5-digit Unicode code point in Python?


python unicode


Earlier, I asked this question:

But today I found that a code point written with a capital \U escape works when I print it, yet fails when I try it with a file. Why?

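import codecs
import re

f = codecs.open('test.txt', 'r', encoding="utf-8")
g = codecs.open('test_output.txt', 'w', encoding="utf-8")
fin = f.read()
# Replace every 'm' with U+243D0, a code point outside the BMP
output = re.sub('m', '\U000243D0', fin)
g.write(output)
f.close()
g.close()  # flush and close so the output file is complete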
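One way to separate what is actually written to disk from how an editor or terminal displays it is to read the output back in binary mode. A minimal sketch, assuming Python 3; the scratch file name check_output.txt is hypothetical, and the sample sentence is the one used in the answer below:

```python
import os
import re
import tempfile

# U+243D0 lies outside the Basic Multilingual Plane, so UTF-8 needs 4 bytes for it
replacement = '\U000243D0'
text = re.sub('m', replacement, "I don't go to school on mondays")

# Hypothetical scratch file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'check_output.txt')
with open(path, 'w', encoding='utf-8') as out:
    out.write(text)

# Read the raw bytes back, so no editor or terminal decoding is involved
with open(path, 'rb') as raw:
    data = raw.read()

print(data)  # b"I don't go to school on \xf0\xa4\x8f\x90ondays"
print(replacement.encode('utf-8') in data)  # True
```

If the raw bytes contain \xf0\xa4\x8f\x90 where the m used to be, the file itself is correct, and a wrong character most likely comes from the program used to view it rather than from re.sub.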

This works fine for me:

import re

with open('/tmp/test.txt', 'w', encoding='utf8') as testfile:
    testfile.write("I don't go to school on mondays")

with open('/tmp/test.txt', 'r', encoding='utf8') as testfile, open('/tmp/test_output.txt', 'w', encoding='utf8') as testout:
    output = re.sub('m', '\U000243D0', testfile.read())
    testout.write(output)

with open('/tmp/test_output.txt', 'r', encoding='utf8') as testfile:
    print(repr(testfile.read()))

Output:

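"I don't go to school on 𤏐ondays"

What exactly is going wrong with your code? What output do you get, and how does it differ from the output you expect?

@dan04: The use of codecs points to Python 2; in Python 3 you would normally just use open().

I'm using Python 3.3. Strangely, m gets replaced with ए, whose code point is \u090F.

@user1610952: What are you using to inspect the data? \u090F is encoded in UTF-8 as \xE0\xA4\x8F (three bytes, starting with \xE0), and \U000243D0 is encoded as \xF0\xA4\x8F\x90. If you clear one bit in the first byte and ignore the \x90 byte, the two overlap. Python does not do this (I tested it).

Note that in Python 2.6 and 2.7 you can use the statement from __future__ import unicode_literals to make string literals have type unicode by default.

@dan04: Sure, that's nice for people writing code that has to run on both Python 2 and Python 3, but most developers only target one version, so it usually doesn't help that much. :-)

Thanks, but I'm using Python 3.3. I still don't understand why it doesn't work.

@user1610952: Then please show (an excerpt of) the output written to the file, and what you expected. You can read it back with Python and show us a repr() of the bytes you want corrected.

test.txt is: I don't go to school on mondays. It became (test_out.txt): I don't go to school on एondays. What I expected is: I don't go to school on 𤏐ondays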
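The byte-level explanation in the comments can be checked directly. A minimal sketch, assuming Python 3; it only reproduces the arithmetic described there (one bit cleared in the lead byte, the last byte dropped) and does not identify which program actually altered the file:

```python
# UTF-8 encodings of the two characters discussed in the comments
expected = '\U000243D0'.encode('utf-8')  # what the file should contain
observed = '\u090F'.encode('utf-8')      # U+090F, the character that showed up

print(expected)  # b'\xf0\xa4\x8f\x90'
print(observed)  # b'\xe0\xa4\x8f'

# Clearing bit 0x10 turns the 4-byte lead byte F0 into the 3-byte lead
# byte E0; dropping the final continuation byte \x90 then leaves
# exactly the encoding of U+090F
corrupted = bytes([expected[0] & ~0x10]) + expected[1:3]
print(corrupted == observed)  # True
print(hex(ord(corrupted.decode('utf-8'))))  # 0x90f
```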