Python codecs encoding not working


I have this code:

import collections
import csv
import sys
import codecs
from xml.dom.minidom import parse
import xml.dom.minidom

String = collections.namedtuple("String", ["tag", "text"])

def read_translations(filename): #Reads a csv file with rows made up of 2 columns: the string tag, and the translated tag
    with codecs.open(filename, "r", encoding='utf-8') as csvfile:
        csv_reader = csv.reader(csvfile, delimiter=",")
        result = [String(tag=row[0], text=row[1]) for row in csv_reader]
    return result

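The CSV file I'm reading contains Brazilian Portuguese characters. When I try to run this, I get an error:

'utf8' codec can't decode byte 0x88 in position 21: invalid start byte

I'm using Python 2.7. As you can see, I'm opening the file through codecs with an explicit encoding, but it doesn't work.

Any ideas?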

The idea behind this line:

with codecs.open(filename, "r", encoding='utf-8') as csvfile:

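is "this file was saved as utf-8; please convert it appropriately when reading."

That works fine if the file really was saved as utf-8. If it was saved in some other encoding, decoding fails.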
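The mismatch can be reproduced without any file at all. The following is a minimal sketch, using the sample word "español" as an assumed stand-in for the real data: the same text is encoded two ways, and only the bytes that really are UTF-8 decode as UTF-8.

```python
# The same text encoded two ways ("espanol" with n-tilde).
text = u'espa\xf1ol'
as_utf8 = text.encode('utf-8')          # n-tilde becomes two bytes
as_latin1 = text.encode('iso-8859-1')   # n-tilde becomes one byte

# Decoding with the codec that produced the bytes round-trips cleanly.
assert as_utf8.decode('utf-8') == text

# Decoding with the wrong codec raises UnicodeDecodeError,
# which is the same kind of failure codecs.open reports while reading.
try:
    as_latin1.decode('utf-8')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```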

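What then?

Determine which encoding was used. Assuming you can't get that information from the software that created the file, guess.

Open the file normally and print each line: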

with open(filename, 'rt') as f:
    for line in f:
        print repr(line)

Then look for non-ASCII characters, for example ñ. Such a letter is printed as escape codes, for example:

'espa\xc3\xb1ol'

Above, ñ is shown as \xc3\xb1, because that is its utf-8 byte sequence.
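For a large file, reading every printed line by eye gets tedious. As a rough sketch, the hypothetical helper below (find_non_ascii is not part of the original answer) opens the file in binary mode, so nothing is decoded, and keeps only the lines that contain a byte of 0x80 or above.

```python
def find_non_ascii(path):
    """Return (line_number, raw_bytes) for each line containing a byte >= 0x80."""
    hits = []
    with open(path, 'rb') as f:  # binary mode: no decoding is attempted
        for number, line in enumerate(f, 1):
            # bytearray yields integers on both Python 2 and Python 3
            if any(byte >= 0x80 for byte in bytearray(line)):
                hits.append((number, line))
    return hits
```

Printing repr(line) for each hit shows the same escape codes as above, restricted to the lines worth inspecting.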

Now you can check what bytes the various encodings would produce and see which one matches:

>>> ntilde = u'\N{LATIN SMALL LETTER N WITH TILDE}'
>>> 
>>> print repr(ntilde.encode('utf-8'))
'\xc3\xb1'
>>> print repr(ntilde.encode('windows-1252'))
'\xf1'
>>> print repr(ntilde.encode('iso-8859-1'))
'\xf1'
>>> print repr(ntilde.encode('macroman'))
'\x96'

Or print all of them:

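import encodings.aliases

# Try every codec alias Python knows and show how it encodes the letter.
for c in encodings.aliases.aliases:
    try:
        encoded = ntilde.encode(c)
        print c, repr(encoded)
    except Exception:  # codec cannot encode this character, or is not a text encoding
        pass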

Then, once you have worked out which encoding it is, use that encoding, for example:

with codecs.open(filename, "r", encoding='iso-8859-1') as csvfile:
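An alternative, once the source encoding is known, is to re-save the file as UTF-8 so that the original read_translations call works unchanged. This is only a sketch: transcode_to_utf8 and its parameter names are made up for illustration, and it assumes the file is small enough to read into memory.

```python
import io

def transcode_to_utf8(src_path, dst_path, src_encoding):
    """Rewrite src_path (encoded as src_encoding) to dst_path as UTF-8."""
    # newline='' leaves line endings untouched in both directions.
    with io.open(src_path, 'r', encoding=src_encoding, newline='') as src:
        text = src.read()
    with io.open(dst_path, 'w', encoding='utf-8', newline='') as dst:
        dst.write(text)
```

If src_encoding is wrong, either the read fails or the output contains the wrong characters, so it is worth inspecting the result before discarding the original file.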

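Perhaps your file isn't saved as UTF-8? Try changing encoding='UTF-8' to encoding='cp1252'.

We can't say much without seeing the data.

What those guys said. Windows doesn't use UTF-8 unless you force it; any random file you open is most likely encoded in the current Windows code page. You can use encoding='mbcs' (available on Windows only) to get that code page without knowing which one it is.

Forgot to add that I'm on a Mac. I opened the file in Sublime and saved it with UTF-8 encoding. I tried cp1252, but it returned the following error: UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 31: character maps to <undefined>

You need to find out which encoding was used to produce the file. If you open the file in an editor, can you see the correct characters?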
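The two bytes named in the error messages, 0x88 and 0x8d, can themselves be used as evidence. The sketch below (preview_decodings is a hypothetical helper, and the candidate list is just a guess at likely encodings) decodes them with several codecs so the results can be compared against what the text is supposed to say.

```python
def preview_decodings(raw, candidates):
    """Map each candidate encoding to the decoded text, or None if decoding fails."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results

# The two bytes reported in the error messages.
suspect = b'\x88\x8d'
previews = preview_decodings(
    suspect, ['utf-8', 'cp1252', 'iso-8859-1', 'mac_roman'])
```

Here utf-8 and cp1252 both fail, iso-8859-1 yields two control characters, and mac_roman yields à and ç. Both letters are common in Portuguese and the file was produced on a Mac, which suggests, but does not prove, that the file is Mac Roman.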