Python: Writing a UTF-8 XML file containing UTF-8 data with ElementTree

I am trying to use ElementTree to write an XML file containing UTF-8 encoded data, like this:

#!/usr/bin/python                                                                       
# -*- coding: utf-8 -*-                                                                   

import xml.etree.ElementTree as ET
import codecs

testtag = ET.Element('unicodetag')
testtag.text = u'Töreboda' #The o is really ö (o with two dots over). No idea why SO dont display this
expfile = codecs.open('testunicode.xml',"w","utf-8-sig")
ET.ElementTree(testtag).write(expfile,encoding="UTF-8",xml_declaration=True)
expfile.close()
This results in the error:

Traceback (most recent call last):
  File "unicodetest.py", line 10, in <module>
    ET.ElementTree(testtag).write(expfile,encoding="UTF-8",xml_declaration=True)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 815, in write
    serialize(write, self._root, encoding, qnames, namespaces)    
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 932, in _serialize_xml
    write(_escape_cdata(text, encoding))
  File "/usr/lib/python2.7/codecs.py", line 691, in write
    return self.writer.write(data)
  File "/usr/lib/python2.7/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

Using the "us-ascii" encoding instead works fine, but it does not preserve the Unicode characters in the data. What is happening?

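codecs.open expects Unicode strings to be written to the file object, and it handles the encoding to UTF-8 itself. ElementTree's write encodes the Unicode strings into UTF-8 byte strings and then sends those to the file object. Because the file object wants Unicode strings, it coerces the byte strings back to Unicode using the default ascii codec, and that coercion raises the UnicodeDecodeError.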
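
The coercion step described above can be reproduced in isolation. The following is a minimal sketch, not the code that codecs or ElementTree actually runs: it performs by hand the two conversions that happen implicitly in Python 2 (encode to UTF-8, then decode with the ascii codec). The variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch: the two conversions that collide in the question's code.

text = u'T\xf6reboda'            # the Unicode string from the question (ö as an escape)

# Step 1: what ElementTree's write() does when encoding="UTF-8" is given.
encoded = text.encode('utf-8')   # the ö becomes the two bytes 0xc3 0xb6

# Step 2: what the codecs stream does implicitly in Python 2 when it receives
# a byte string instead of a Unicode string: decode with the default ascii codec.
try:
    encoded.decode('ascii')
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(error)
```

The failure is reported at position 1, the first byte of the encoded ö, which matches the position shown in the traceback.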

Just do this:

#expfile = codecs.open('testunicode.xml',"w","utf-8-sig")
ET.ElementTree(testtag).write('testunicode.xml',encoding="UTF-8",xml_declaration=True)
#expfile.close()
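
To check that passing the file name preserves the character, one can write the file and read it back. This is a sketch under some assumptions: the temporary directory is a hypothetical location chosen only to keep the example self-contained, and the non-ASCII character is written as an escape so the snippet does not depend on the source file's encoding.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

testtag = ET.Element('unicodetag')
testtag.text = u'T\xf6reboda'  # same text as in the question

# Hypothetical output location, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'testunicode.xml')

# Pass the file name and let ElementTree open the file and encode to UTF-8 itself.
ET.ElementTree(testtag).write(path, encoding='UTF-8', xml_declaration=True)

# Inspect the raw bytes on disk, then parse the file to confirm the round trip.
with open(path, 'rb') as f:
    raw = f.read()

roundtrip = ET.parse(path).getroot().text
print(repr(raw))
print(roundtrip == testtag.text)
```

Note that, unlike the "utf-8-sig" codec in the question, this approach does not write a byte order mark; the XML declaration carries the encoding instead.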

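+1. To clarify: the problem is that you are trying to encode unicode -> utf-8 twice. ElementTree does it once, and then the codecs-enabled stream tries to do it again. The second pass gets confused because its input is already encoded (it expects a Unicode string but gets a UTF-8 encoded byte string).

Can I say I love you? A perfect answer within 3 hours! Mark's elaboration also explains a lot.

I have been working with UTF-8 data and got similar errors in ElementTree._serialize_text() or _serialize_xml() when trying to write an XML file. I was able to solve it by converting the string to Unicode with myString.decode('utf-8') before adding it to the ET.Element object. It seems ET.ElementTree.write() is not happy with other string encodings.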
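
The workaround from the last comment can be sketched as follows. The byte string here is a made-up stand-in for data that arrives already UTF-8 encoded (for example, from a file read in binary mode); the names raw_bytes and elem are illustrative and do not come from the original thread.

```python
import xml.etree.ElementTree as ET

# Stand-in for input that is already a UTF-8 encoded byte string.
raw_bytes = b'T\xc3\xb6reboda'

elem = ET.Element('unicodetag')
# Decode to a Unicode string before handing it to ElementTree,
# so the element holds text rather than encoded bytes.
elem.text = raw_bytes.decode('utf-8')

# Serialize; ElementTree performs the single UTF-8 encoding step here.
xml_bytes = ET.tostring(elem, encoding='utf-8')
print(repr(xml_bytes))
```

The design point is the same as in the answer: keep text as Unicode inside the tree and let exactly one layer do the encoding on output.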