
Python: convert a file from cp1251 to utf8

python encoding


I saw a similar question, but the answers didn't help. This code:

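import codecs

# Read the whole file as text, decoding it with the assumed source encoding.
with codecs.open(sourceFileName, "r", sourceEncoding) as sourceFile:
    contents = sourceFile.read()

# Overwrite the same file, encoding the text as UTF-8.
with codecs.open(sourceFileName, "w", "utf-8") as targetFile:
    if contents:
        targetFile.write(contents)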

returns the error "UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 1: character maps to <undefined>"

This code:

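# Read the raw bytes and decode them with the assumed source encoding.
with open(sourceFileName, "rb") as sourceFileBin:
    contents = sourceFileBin.read().decode(sourceEncoding)

# Overwrite the same file with the UTF-8 encoded bytes.
with open(sourceFileName, "wb") as targetFile:
    targetFile.write(contents.encode("utf-8"))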

produces the same error. The troublesome symbol is the Cyrillic letter "И" (as far as I know, it is represented by "0xc8", not "0x98"). I am using Python 2.7 on Windows.

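UPD: It turned out that the original file's encoding might not be cp1251, and these errors might be caused by a bug in a text editor. However, all of my text editors read this file correctly.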

So I am looking for a workaround, because files without this particular letter are converted correctly.

I found that, because of some bug (or just my own stupidity), I was trying to convert files that had already been converted.

Sorry for wasting your time.
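One way to guard against converting a file twice is to check whether its bytes already decode as UTF-8 before re-encoding. The following is a minimal sketch, not the asker's code: the helper names `is_valid_utf8` and `convert_to_utf8` are hypothetical, and the check is only a heuristic, since a cp1251 file can in rare cases happen to be valid UTF-8 as well.

```python
def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def convert_to_utf8(path, source_encoding="cp1251"):
    """Re-encode the file at `path` to UTF-8 in place.

    Returns True if the file was rewritten, and False if it was
    skipped because its bytes already decode as UTF-8 (heuristic).
    """
    with open(path, "rb") as source_file:
        raw = source_file.read()

    # Already UTF-8 (or plain ASCII): nothing to do, so leave it untouched.
    if is_valid_utf8(raw):
        return False

    text = raw.decode(source_encoding)
    with open(path, "wb") as target_file:
        target_file.write(text.encode("utf-8"))
    return True
```

Calling `convert_to_utf8` a second time on the same file then returns `False` instead of raising `UnicodeDecodeError`.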

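I see. The script would probably work in Python 3, because it works with unicode objects directly. But in 2.7 there are two types of string objects, str and unicode, and unfortunately str is the default :)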
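To illustrate the distinction, here is a small sketch that keeps bytes and text apart with explicit `decode` and `encode` calls. It is written to behave the same way in Python 2.7 and Python 3; the sample byte 0xc8 is simply the cp1251 code for "И".

```python
raw = b"\xc8"                  # one byte, as stored in a cp1251 file
text = raw.decode("cp1251")    # text: the single character U+0418 (И)
utf8 = text.encode("utf-8")    # bytes again, now two of them: 0xd0 0x98

# One character of text, but a different number of bytes per encoding.
print(len(text), len(raw), len(utf8))
```

In Python 2.7 `raw` and `utf8` are `str` and `text` is `unicode`; in Python 3 they are `bytes` and `str` respectively.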

chr(0x98) is ≤. Are you sure this is a cp1251 error? To recognize files that have already been converted, this is useful:

u'И'.encode('utf-8').decode('cp1251')

(it reproduces your error)
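The check suggested in the comment above can be spelled out as a short sketch. It assumes the troublesome letter is "И" (U+0418), the letter whose cp1251 byte is 0xc8; the variable names are illustrative only.

```python
# "И" encoded as UTF-8 is the two bytes 0xd0 0x98.
utf8_bytes = u"\u0418".encode("utf-8")

try:
    # Decoding UTF-8 bytes as cp1251 fails, because 0x98 is unassigned in cp1251.
    utf8_bytes.decode("cp1251")
    error = None
except UnicodeDecodeError as exc:
    error = exc

if error is not None:
    bad_byte = bytearray(utf8_bytes)[error.start]
    print("byte 0x%02x at position %d: %s" % (bad_byte, error.start, error.reason))
```

The failing byte and position match the error in the question, which is consistent with the file having already been UTF-8 when the conversion was attempted.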