Reading unicode from an xls file in Python
I am trying to read an .xls file with Python. The file contains several non-ASCII characters (namely äöü). I have tried openpyxl and xlrd (I had high hopes for xlrd, since it is supposed to read everything as unicode), but neither approach works.

I found many answers about encoding/decoding problems that show up when printing information from an xls file, but I do not even get that far. The script below fails as soon as it tries to read the file:
import xlrd
workbook = xlrd.open_workbook('export_data.xls')
This results in:
Traceback (most recent call last):
File "C:\Users\Administrator\workspace\tufinderxlstoxml\tufinderxlstoxml2.py", line 2, in <module>
workbook = xlrd.open_workbook('export_data.xls')
File "C:\Python27_32\lib\site-packages\xlrd\__init__.py", line 435, in open_workbook
ragged_rows=ragged_rows,
File "C:\Python27_32\lib\site-packages\xlrd\book.py", line 119, in open_workbook_xls
bk.get_sheets()
File "C:\Python27_32\lib\site-packages\xlrd\book.py", line 705, in get_sheets
self.get_sheet(sheetno)
File "C:\Python27_32\lib\site-packages\xlrd\book.py", line 696, in get_sheet
sh.read(self)
File "C:\Python27_32\lib\site-packages\xlrd\sheet.py", line 796, in read
strg = unpack_string(data, 6, bk.encoding or bk.derive_encoding(), lenlen=2)
File "C:\Python27_32\lib\site-packages\xlrd\biffh.py", line 269, in unpack_string
return unicode(data[pos:pos+nchars], encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position 55: ordinal not in range(128)
WARNING *** OLE2 inconsistency: SSCS size is 0 but SSAT size is non-zero
*** No CODEPAGE record, no encoding_override: will use 'ascii'
*** No CODEPAGE record, no encoding_override: will use 'ascii'
Forcing the encoding with encoding_override="utf-8" fails in the same place:
Traceback (most recent call last):
File "C:\Users\Administrator\workspace\tufinderxlstoxml\tufinderxlstoxml2.py", line 2, in <module>
workbook = xlrd.open_workbook('export_data.xls', encoding_override="utf-8")
File "C:\Python27_32\lib\site-packages\xlrd\__init__.py", line 435, in open_workbook
ragged_rows=ragged_rows,
File "C:\Python27_32\lib\site-packages\xlrd\book.py", line 119, in open_workbook_xls
bk.get_sheets()
File "C:\Python27_32\lib\site-packages\xlrd\book.py", line 705, in get_sheets
self.get_sheet(sheetno)
File "C:\Python27_32\lib\site-packages\xlrd\book.py", line 696, in get_sheet
sh.read(self)
File "C:\Python27_32\lib\site-packages\xlrd\sheet.py", line 796, in read
strg = unpack_string(data, 6, bk.encoding or bk.derive_encoding(), lenlen=2)
File "C:\Python27_32\lib\site-packages\xlrd\biffh.py", line 269, in unpack_string
return unicode(data[pos:pos+nchars], encoding)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 55: invalid start byte
WARNING *** OLE2 inconsistency: SSCS size is 0 but SSAT size is non-zero
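Both attempts fail on the same byte for a reason: 0x92 is outside the ASCII range, and in UTF-8 it is a continuation byte that can never start a character. Single-byte Windows code pages such as cp1252, on the other hand, map it to a right single quotation mark. A minimal sketch of this, where the sample bytes and the helper name try_decodings are made up for illustration and are not taken from export_data.xls:

```python
from __future__ import print_function


def try_decodings(raw, codecs=("ascii", "utf-8", "cp1252")):
    """Map each codec name to the decoded text, or None if decoding fails."""
    results = {}
    for name in codecs:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Hypothetical sample: "it's" with a curly apostrophe stored as byte 0x92.
sample = b"it\x92s"
results = try_decodings(sample)
for name in ("ascii", "utf-8", "cp1252"):
    print(name, "->", repr(results[name]))
```

If the same pattern holds for the real file, the data is probably stored in a single-byte code page, which is what encoding_override is meant to name.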
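I am running this on Python 2.7 on a Windows Server 2008 machine.

From my reading of the OOo documentation, xls uses the utf_16_le flavour of unicode rather than utf8 (that is, each character is stored in two bytes, little endian), so try utf_16_le as the encoding instead.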
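(See page 17 of that documentation.)

A bit late, but I hope you have tried setting the encoding.

Thanks everyone for the feedback. I finally fixed it with the encoding_override option. I could not find Microsoft documentation on which cp code corresponds to German characters, so I tried all of them. Eventually I got to cp1251, and it worked: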
workbook = xlrd.open_workbook(path, encoding_override="cp1251")
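A note on the code page choice: cp1251 is the Windows Cyrillic code page, while cp1252 is the Western European one that covers German. Both decode byte 0x92 without error, so either one stops xlrd from raising, but umlauts only come out right under cp1252. A minimal sketch of the difference, assuming the file stores äöü as the single bytes 0xE4 0xF6 0xFC (the sample and the helper name decode_with are my own, not taken from the file):

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Assumed sample: the bytes a Western European Windows code page uses
# for "äöü". This is illustrative, not data read from export_data.xls.
SAMPLE = b"\xe4\xf6\xfc"


def decode_with(codecs, raw=SAMPLE):
    """Return a dict mapping each codec name to the decoded text."""
    return dict((name, raw.decode(name)) for name in codecs)


decoded = decode_with(["cp1251", "cp1252"])
for name in sorted(decoded):
    # cp1251 yields Cyrillic letters, cp1252 yields the German umlauts.
    print(name, "->", decoded[name])
```

If the cells read back as Cyrillic letters instead of umlauts, passing encoding_override="cp1252" would be worth trying.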