Unicode decode error with .csv files in Python
I have a very basic Python question. I am trying to write a script that removes a bunch of blank rows from some .csv files. The script works for about 90% of the files, but a few of them throw the following error:
Traceback (most recent call last):
  File "/Users/stephensmith/Documents/Permits/deleterows.py", line 17, in <module>
    deleteRow(file, "output/" + file)
  File "/Users/stephensmith/Documents/Permits/deleterows.py", line 8, in deleteRow
    for row in csv.reader(input):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/codecs.py", line 319, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/encodings/utf_8_sig.py", line 69, in _buffer_decode
    return codecs.utf_8_decode(input, errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa2 in position 6540: invalid start byte
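I tried adding encoding='utf-8', encoding='ascii', and encoding='latin1' to both of my open() statements, but with no luck :-( Any idea what I am doing wrong? The .csv files were created with Excel for Mac 2011, if that helps.

Maybe you could try looping over the lines of a csv file that crashes, for example: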
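# Open in binary mode so that reading cannot raise a decode error,
# and print the raw bytes of every line.
with open(file, 'rb') as f:
    for line in f:
        print(repr(line))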
and see whether any suspicious characters show up.
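Printing every line can be noisy for a large file. As a minimal sketch of the same idea, the hypothetical helper below (`find_undecodable_lines` is not part of the original answer) reads the file as raw bytes and reports only the lines that fail to decode with a given encoding. It assumes the file is line-oriented and that `utf-8` is the encoding being tested by default.

```python
def find_undecodable_lines(path, encoding="utf-8"):
    """Return (line_number, raw_bytes) for every line that fails to decode."""
    bad_lines = []
    with open(path, "rb") as f:  # binary mode: no implicit decoding
        for line_number, raw in enumerate(f, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError:
                bad_lines.append((line_number, raw))
    return bad_lines
```

Calling `find_undecodable_lines("some_file.csv")` on one of the failing files should narrow the problem down to a handful of lines whose raw bytes can then be inspected.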
If you manage to identify a suspicious character this way, say \0Xý1 pops up, you can clean the file by rewriting it with that character replaced:
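import os

# latin1 maps every byte to a character, so reading never fails.
# splitext() is used because rstrip(".csv") strips characters, not a suffix.
with open(file, encoding='latin1') as f:
    with open(os.path.splitext(file)[0] + "_fixed.csv", 'w', encoding='latin1') as g:
        for line in f:
            g.write(line.replace('\0Xý1', ''))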
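Then try again with the cleaned file.

This is an encoding problem. The input csv file is not utf-8 encoded, which is what your Python platform expects. The trouble is that, without knowing its encoding and without an example of an offending line, I really cannot guess the encoding.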
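It is normal that both encoding='utf8' and encoding='ascii' break, because the offending byte is 0xa2, which is outside the ASCII range (0x00 to 0x7f) and, as the error message says, is not a valid UTF-8 start byte.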
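The behaviour described above can be checked directly on the single byte from the error message. The sketch below uses a hypothetical helper, `try_encodings` (not from the original answer), that attempts several candidate encodings and records which ones fail. Note that a successful decode only shows that the bytes are valid in that encoding, not that it is the encoding the file was actually written in.

```python
def try_encodings(data, candidates):
    """Map each candidate encoding to the decoded text, or None on failure."""
    results = {}
    for name in candidates:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results

# 0xa2 is rejected by ascii and utf-8, but latin1 accepts every byte value.
outcome = try_encodings(b"\xa2", ["ascii", "utf-8", "latin1"])
print(outcome)
```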
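To identify the offending line, you can wrap the file in a small class that decodes each line itself and prints the raw bytes of any line that fails:

import sys

class special_opener:
    def __init__(self, filename, encoding):
        self.fd = open(filename, 'rb')  # raw bytes; decoding happens per line
        self.encoding = encoding
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        self.fd.close()
        return False  # do not swallow exceptions
    def __next__(self):
        line = next(self.fd)
        try:
            # Normalize the line ending before handing the text to csv.reader
            return line.decode(self.encoding).strip('\r\n') + '\n'
        except Exception:
            print("Offending line : ", line, file=sys.stderr)
            raise
    def __iter__(self):
        return self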
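import csv

def deleteRow(in_fnam, out_fnam):
    input = special_opener(in_fnam, 'latin1')
    output = open(out_fnam, 'w', newline='')  # newline='' as the csv module recommends
    writer = csv.writer(output)
    for row in csv.reader(input):
        if any(row):  # skip rows in which every field is empty
            writer.writerow(row)
    input.fd.close()  # special_opener keeps the underlying file in .fd
    output.close()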
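If a line cannot be decoded with the chosen encoding (for example 'ascii'), the output looks like this:

Offending line : b'a,\xe9,\xe8,d\r\n'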
Traceback (most recent call last):
...
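Once the right encoding has been identified, the wrapper class is no longer needed. As a rough sketch, and assuming the same encoding is acceptable for the output file, the blank-row removal could be written with the built-in open() alone. The name `delete_blank_rows` and its parameters are hypothetical and not part of the original answer.

```python
import csv

def delete_blank_rows(in_path, out_path, encoding):
    """Copy a csv file, dropping rows in which every field is empty."""
    # newline="" lets the csv module handle line endings itself.
    with open(in_path, newline="", encoding=encoding) as src, \
         open(out_path, "w", newline="", encoding=encoding) as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            if any(row):  # an empty line yields [], a line of commas yields ['', '', ...]
                writer.writerow(row)
```

The `with` statement closes both files even if decoding fails halfway through, which the explicit close() calls in the version above do not guarantee.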