Python: read xls -> manipulate -> write CSV


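I am trying to achieve the following:

Input: xls file
Output: csv file

I want to read the xls and do some manipulation (rewriting headers (original: customernumer, the csv needs Customer_Number__c), removing some columns, etc.).

Right now I am already reading the xls and trying to write it out as csv (without any manipulation), but I am struggling because of the encoding. The original file contains some "special" characters such as "/" and "\", and most importantly "ä, ü, ö, ß".

I get the following error: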

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 8: ordinal not in range(128)

I don't know which special characters the file may contain; this changes from time to time.

Here is my current sandbox code:

# -*- coding: utf-8 -*-
__author__ = 'adieball'


import xlrd
import csv
from os import sys
import argparse

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("inname", type=str,
                        help="Names of the Input File in single quotes")

    parser.add_argument("--outname", type=str,
                        help="Optional enter the name of the output (csv) file. if nothing is given, "
                             "we use the name of the input file and add .csv to it")

    args = parser.parse_args()

    if args.outname is None:
        outname = args.inname + ".csv"
    else:
        outname = args.outname



    wb = xlrd.open_workbook(args.inname)
    xl_sheet = wb.sheet_by_index(0)
    print args.inname
    print ('Retrieved worksheet: %s' % xl_sheet.name)
    print outname



    output = open(outname, 'wb')
    wr = csv.writer(output, quoting=csv.QUOTE_ALL)

    for rownum in xrange(wb.sheet_by_index(0).nrows):
        wr.writerow(wb.sheet_by_index(0).row_values(rownum))

    output.close()

What can I do here to ensure that those special characters are written to the csv the same way they appear in the original xls?

Thanks

andre

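You could convert your script to Python 3 and then open the output file with write mode 'w' instead, so that it writes Unicode. Not trying to evangelize, but Python 3 makes this kind of thing easier. If you want to stay on Python 2, have a look at this guide.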
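For the Python 3 route, a minimal sketch could look like the following. It assumes the rows come from xlrd's `row_values()` as in the question; here a small hard-coded list stands in for the sheet so the snippet runs on its own. The function name `write_rows_as_csv`, the sample data, and the output path are made up for illustration.

```python
import csv
import os
import tempfile


def write_rows_as_csv(rows, outname):
    """Write an iterable of rows to a UTF-8 encoded CSV file (Python 3)."""
    # Text mode 'w' with an explicit encoding replaces the Python 2 'wb' mode;
    # newline='' lets the csv module control the line endings itself.
    with open(outname, 'w', newline='', encoding='utf-8') as output:
        wr = csv.writer(output, quoting=csv.QUOTE_ALL)
        for row in rows:
            wr.writerow(row)


# Hypothetical stand-in for the lists returned by xl_sheet.row_values(rownum)
sample_rows = [
    ['customernumer', 'name'],
    [1001.0, 'Müller & Söhne / Straße'],
]

out_path = os.path.join(tempfile.gettempdir(), 'example_out.csv')
write_rows_as_csv(sample_rows, out_path)
```

If the file is meant to be opened in Excel, `encoding='utf-8-sig'` may be worth trying instead, since it writes a byte order mark that Excel uses to detect UTF-8.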

If you want to write a utf-8 encoded file, you have to use codecs.open. Try the following small example:

o1 = open('/tmp/o1.txt', 'wb')
try:
    o1.write(u'\u20ac')
except Exception, exc:
    print exc
o1.close()

import codecs
o2 = codecs.open('/tmp/o2.txt', 'w', 'utf-8')
o2.write(u'\u20ac')
o2.close()

Why not use the UnicodeWriter class from the examples in the csv documentation? I think it should solve your problem.
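The idea behind such a writer can be sketched roughly as follows. This is not the recipe from the csv documentation, only a simplified illustration of the same principle: on Python 2, encode every unicode cell to UTF-8 bytes before the csv module sees it, so no implicit ASCII encoding happens. The names `to_csv_cell` and `Utf8RowWriter` are hypothetical. Non-string cells (xlrd returns floats for numbers) are passed through unchanged, and on Python 3 the wrapper does nothing extra.

```python
import csv
import io
import sys

PY2 = sys.version_info[0] == 2


def to_csv_cell(value):
    """Return a value the csv module can write without an implicit ASCII encode."""
    # 'unicode' only exists on Python 2; the PY2 check short-circuits on Python 3.
    if PY2 and isinstance(value, unicode):  # noqa: F821
        return value.encode('utf-8')
    return value


class Utf8RowWriter(object):
    """Thin wrapper around csv.writer that pre-encodes unicode cells on Python 2."""

    def __init__(self, fileobj, **kwds):
        self._writer = csv.writer(fileobj, **kwds)

    def writerow(self, row):
        self._writer.writerow([to_csv_cell(v) for v in row])

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)


# In-memory demo: a bytes buffer on Python 2, a text buffer on Python 3
buf = io.BytesIO() if PY2 else io.StringIO()
writer = Utf8RowWriter(buf, quoting=csv.QUOTE_ALL)
writer.writerow([u'K\xe4se', 12.5])
result = buf.getvalue()
```

In the question's script, `Utf8RowWriter(output, quoting=csv.QUOTE_ALL)` would take the place of the plain `csv.writer(...)` call.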

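If not, I would suggest a different approach to your problem, provided you have Excel: use win32com, dispatch Excel, and work with the Excel object model. You can use Excel's built-in functions to rename headers, delete columns, and so on, and then save the result as csv. For example:

import win32com.client
excelInstance = win32com.client.gencache.EnsureDispatch('Excel.Application')
workbook = excelInstance.Workbooks.Open(filepath)
worksheet = workbook.Worksheets('WorksheetName')
#### do what you like
worksheet.UsedRange.Find('customernumer').Value2 = 'Customer_Number__c'
####
workbook.SaveAs('Filename.csv', 6)  # 6 means csv in the XlFileFormat enumeration

I would recommend xlrd, which is OS-independent, especially for a server implementation.

Simple:

from os import sys
reload(sys)
sys.setdefaultencoding("utf-8")

did the trick.

Andre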
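As an OS-independent alternative to driving Excel, the manipulation step from the question (renaming headers, dropping columns) could also be done in plain Python on the row lists that xlrd returns. The sketch below assumes the first row is the header row; `transform_rows`, the `internal_note` and `city` columns, and the sample values are made up for illustration.

```python
def transform_rows(rows, rename, drop):
    """Yield rows with header names replaced per `rename` and `drop` columns removed."""
    rows = iter(rows)
    header = next(rows, None)
    if header is None:
        return  # empty sheet: nothing to yield
    # Indexes of the columns that survive the drop
    keep = [i for i, name in enumerate(header) if name not in drop]
    # Emit the header with renamed columns; unknown names pass through unchanged
    yield [rename.get(header[i], header[i]) for i in keep]
    for row in rows:
        yield [row[i] for i in keep]


# Hypothetical stand-in for xl_sheet.row_values(rownum) over all rows
sample = [
    ['customernumer', 'internal_note', 'city'],
    [4711.0, 'n/a', 'Köln'],
]

result = list(transform_rows(
    sample,
    rename={'customernumer': 'Customer_Number__c'},
    drop={'internal_note'},
))
```

The transformed rows can then be passed to whichever csv writer is used for output.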