Python special character encoding

python xml csv

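I have a Python script that reads a CSV file and writes an XML file. I have been struggling to work out how to read special characters such as ç, á, é, í, etc. Without special characters the script runs perfectly well. This is the script header: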

# coding=utf-8

'''
@modified by: Julierme Pinheiro
'''
import os
import sys
import unittest
from unittest import skip
import csv
import uuid
import xml
import xml.dom.minidom as minidom
import owslib
from owslib.iso import *
import pyproj
from decimal import *
import logging

The way I retrieve information from the CSV file is shown below:

                # add the title
                title = data[1]
                titleElement = identificationInfo[0].getElementsByTagName('gmd:title')[0]
                titleNode = record.createTextNode(title)
                titleElement.childNodes[1].appendChild(titleNode)
                print "Title:" + title

                # write out the gemini record
                filename = '../output/%s.xml' % fileId
                with open(filename,'w') as test_xml:
                    test_xml.write(record.toprettyxml(newl="", encoding="utf-8"))
            except:
                e = sys.exc_info()[1]
                logging.debug("Import failed for entry %s" % data[0])
                logging.debug("Specific error: %s" % e)

    @skip('')
    def testOWSMetadataImport(self):
        raw_data = []
        with open('../input/metadata_cartapapel.csv') as csvfile:
            reader = csv.reader(csvfile, dialect='excel')
            for columns in reader:
                raw_data.append(columns)   

        md = MD_Metadata(etree.parse('gemini-template.xml'))
        md.identification.topiccategory = ['farming','environment']
        print md.identification.topiccategory
        outfile = open('mdtest.xml','w')
        # crap, can't update the model and write back out - this is badly needed!!
        outfile.write(md.xml) 


if __name__ == "__main__":
    unittest.main()

Note: if the second column of the CSV file, data[1], contains special characters, as in "Navegação", the script fails (it writes nothing to the XML file).

The second code block above, starting at "# write out the gemini record", shows how the new XML file is created from a blank template XML.

Can anyone help me solve this problem?

Thank you in advance for your time.

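If you are using Python 2.7, the csv module does not support Unicode input; it hands back each field as a byte string. In Python 3.x you can pass the UTF-8 encoding option when opening the file.
In Python 2 you can decode data[1] from UTF-8 into a unicode object as follows: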

title = data[1].decode('utf-8')
Some legacy Windows components on English-language versions of Windows may need 'cp1252' instead. If the decoding above fails, try this:

title = data[1].decode('cp1252')
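
The two attempts above can be folded into one helper. The following is only a sketch, written for Python 3, where `bytes` plays the role of the Python 2 byte string; the name `decode_field` and the default order of candidate encodings are assumptions made for illustration and are not part of the original script.

```python
def decode_field(raw, encodings=("utf-8", "cp1252")):
    """Decode a byte string by trying each candidate encoding in order."""
    last_error = None
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError as error:
            # Remember the failure and move on to the next candidate.
            last_error = error
    if last_error is None:
        raise ValueError("no candidate encodings were given")
    raise last_error


# The same title, stored as bytes in two different encodings.
utf8_title = decode_field("Navegação".encode("utf-8"))
cp1252_title = decode_field("Navegação".encode("cp1252"))
```

Trying UTF-8 first is deliberate: text in a legacy single-byte encoding that contains accented characters will usually fail UTF-8 validation, whereas cp1252 accepts almost any byte. A successful cp1252 decode therefore does not prove the guess was right, so it is worth checking the resulting characters by eye.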

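The Python 2 csv documentation has some examples showing how to handle Unicode data. By the way, you should open the CSV file in binary mode, but I guess you are on Linux, so you don't need to bother with that unless you want to run your script on Windows.

Dear @VinayakKolagi, you went straight to the point. I am running Python 2.7.1, and I have just read about your solution at: ()

Hi @VinayakKolagi, I added "import codecs" to the script header. I am running Python 2.7.1 and I cannot decode the variable. I get the following error for .decode('utf-8'):

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 1: invalid start byte

What could be causing the problem?

It looks like the data is not encoded in UTF-8. Try the cp1252 decoding I added to my answer.

Hi @VinayakKolagi. That was it. Right on target: ('cp1252'). Anyone who hits this problem while running Python 2.7.1 can try this solution. My problem is solved.

If the answer was useful, please consider accepting it.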
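
For readers on Python 3.x, the route mentioned in the answer (passing the encoding when opening the file) could look roughly like the sketch below. The function names `read_rows` and `write_title_record`, the file names, and the tiny `<record><title/></record>` document standing in for the Gemini template are all hypothetical, chosen only to keep the example self-contained.

```python
import csv
import os
import tempfile
import xml.dom.minidom as minidom


def read_rows(path, encoding="utf-8"):
    """Read every row of a CSV file as a list of text strings."""
    # newline="" is what the csv documentation recommends for file objects.
    with open(path, encoding=encoding, newline="") as csvfile:
        return list(csv.reader(csvfile, dialect="excel"))


def write_title_record(title, path):
    """Write a small XML document whose <title> element holds the text."""
    # Hypothetical stand-in for the real template document.
    record = minidom.parseString("<record><title/></record>")
    title_element = record.getElementsByTagName("title")[0]
    title_element.appendChild(record.createTextNode(title))
    # With encoding set, toprettyxml() returns bytes, so open in binary mode.
    with open(path, "wb") as xml_file:
        xml_file.write(record.toprettyxml(newl="", encoding="utf-8"))


# Round trip through a temporary directory.
workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "sample.csv")
xml_path = os.path.join(workdir, "sample.xml")
with open(csv_path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerow(["id-1", "Navegação"])

rows = read_rows(csv_path)
write_title_record(rows[0][1], xml_path)
```

If the input file turns out to be in a legacy encoding, the same `read_rows` call can be given `encoding="cp1252"` instead, which corresponds to the fix that worked in the comments above.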