Converting a DOS text file to Unicode with Python

python text unicode

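I'm trying to write a Python application that converts old DOS code page text files to their Unicode equivalents. I built a lookup table for this in Turbo Pascal some time ago, and I'm sure the same thing can be done with a Python dictionary. My question is: how do I index into the dictionary to find the character to convert, and write the equivalent Unicode character to the Unicode output file?

I realize this may be a duplicate of a similar question, but nothing I found by searching here quite matches my problem.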
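For the dictionary part of the question, here is a minimal sketch of what the lookup could look like. The names `CP437_TABLE`, `convert_bytes` and `convert_file` are made up for illustration, and the table is generated from Python's own `cp437` codec rather than typed in by hand; a table ported from another program would be indexed the same way.

```python
# Build a byte -> Unicode character lookup table for code page 437.
# Assumption: the table is derived from Python's cp437 codec; a hand-written
# dictionary with the same integer keys (0-255) would work identically.
CP437_TABLE = {b: bytes([b]).decode('cp437') for b in range(256)}


def convert_bytes(data, table=CP437_TABLE):
    """Map each input byte to its Unicode character through the dictionary."""
    # Iterating over a bytes object in Python 3 yields integers 0-255,
    # which are used directly as dictionary keys.
    return ''.join(table[b] for b in data)


def convert_file(src_path, dst_path, table=CP437_TABLE):
    """Read raw DOS bytes from src_path and write UTF-8 text to dst_path."""
    with open(src_path, 'rb') as infile:
        data = infile.read()
    # newline='' leaves the original line endings untouched.
    with open(dst_path, 'w', encoding='utf-8', newline='') as outfile:
        outfile.write(convert_bytes(data, table))
```

Opening the input in binary mode matters here: the dictionary is keyed by byte value, so the file must not be decoded before the lookup.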

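You can use Python's standard built-in decoding. Opening the file in text mode with encoding='cp437' decodes the bytes as they are read, using the same codec as the decode method of bytes objects: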
with open('dos.txt', 'r', encoding='cp437') as infile, \
        open('unicode.txt', 'w', encoding='utf8') as outfile:
    for line in infile:
        outfile.write(line)
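If you would rather call the decode method of a bytes object yourself, the same conversion can be sketched as below. The helper name `decode_dos_bytes` and the sample bytes are only illustrative.

```python
def decode_dos_bytes(raw, codepage='cp437'):
    """Decode raw DOS bytes to a Python str using the given code page."""
    return raw.decode(codepage)


# Sample data: 0x82 is e-acute and 0xB0-0xB2 are shade blocks in cp437.
sample = b'Caf\x82 \xb0\xb1\xb2'
text = decode_dos_bytes(sample)

# Encode the resulting str as UTF-8 before writing it to a binary file.
utf8_bytes = text.encode('utf-8')
```

This form is handy when the bytes come from somewhere other than a file, for example a network socket or an archive member.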

Python has codecs that do the conversion:
#!python3

# Test file with bytes 0-255.
with open('dos.txt','wb') as f:
    f.write(bytes(range(256)))

# Read the file and decode using code page 437 (DOS OEM-US).
# Write the file as UTF-8 encoding ("Unicode" is not an encoding)
# UTF-8, UTF-16, UTF-32 are encodings that support all Unicode codepoints.

with open('dos.txt',encoding='cp437') as infile:
    with open('unicode.txt','w',encoding='utf8') as outfile:
        outfile.write(infile.read())

Strictly speaking, you haven't posted a question; you've posted a problem. This site is more about answering questions. To get a response, I suggest trying something and, if it doesn't work, posting your code and asking for advice.

Do you really want to do the lookup yourself? Python has a lot of built-in encodings.

In Python 2, the default encoding for unicode.txt would be ascii, and in Python 3 it would be locale.getpreferredencoding() (cp1252 on US Windows). So unless you're using Python 3 on an OS that defaults to UTF-8, this won't work. Neither ascii nor cp1252 supports the full range of cp437 characters.
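The point about default encodings can be checked with a small sketch like the following. The helper `can_encode` is hypothetical, and the value printed for the default encoding depends on your platform and Python configuration.

```python
import locale

# The encoding open() typically falls back to in text mode when no
# encoding argument is given (Python 3); the value is platform dependent.
default_encoding = locale.getpreferredencoding(False)
print('default text encoding:', default_encoding)


def can_encode(text, encoding):
    """Return True if every character in text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Three box-drawing characters that exist in cp437.
box = b'\xc9\xcd\xbb'.decode('cp437')

for name in ('ascii', 'cp1252', 'utf-8'):
    print(name, can_encode(box, name))
```

Passing encoding='utf8' explicitly when opening the output file, as the answers above do, avoids depending on the platform default.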