Python: Unicode strings with encoding declaration are not supported


I am trying to do the following:

from lxml import etree
from lxml.etree import fromstring

if request.POST:
    parser = etree.XMLParser(ns_clean=True, recover=True)
    h = fromstring(request.POST['xml'], parser=parser)
    return HttpResponse(h.cssselect('itagg_delivery_receipt status').text_content())
But it gives this error:

[Fri Apr 05 10:27:54 2013] [error] Internal Server Error: /sms/status_postback/
[Fri Apr 05 10:27:54 2013] [error] Traceback (most recent call last):
[Fri Apr 05 10:27:54 2013] [error]   File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 115, in get_response
[Fri Apr 05 10:27:54 2013] [error]     response = callback(request, *callback_args, **callback_kwargs)
[Fri Apr 05 10:27:54 2013] [error]   File "/usr/local/lib/python2.7/dist-packages/django/views/decorators/csrf.py", line 77, in wrapped_view
[Fri Apr 05 10:27:54 2013] [error]     return view_func(*args, **kwargs)
[Fri Apr 05 10:27:54 2013] [error]   File "/srv/project/livewireSMS/sms/views.py", line 42, in update_delivery_status
[Fri Apr 05 10:27:54 2013] [error]     h = fromstring(request.POST['xml'], parser=parser)
[Fri Apr 05 10:27:54 2013] [error]   File "lxml.etree.pyx", line 2754, in lxml.etree.fromstring (src/lxml/lxml.etree.c:54631)
[Fri Apr 05 10:27:54 2013] [error]   File "parser.pxi", line 1569, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:82659)
[Fri Apr 05 10:27:54 2013] [error] ValueError: Unicode strings with encoding declaration are not supported.
This is the XML:

 <?xml version="1.1" encoding="ISO-8859-1"?>
<itagg_delivery_receipt>
<version>1.0</version>
<msisdn>447889000000</msisdn>
<submission_ref>
845tgrgsehg394g3hdfhhh56445y7ts6</
submission_ref>
<status>Delivered</status>
<reason>4</reason>
<timestamp>20050709120945</timestamp>
<retry>0</retry>
</itagg_delivery_receipt> 


I have no control over the XML document; it comes from the SMS company.

You have to encode it, and then force the same encoding in the parser:

from lxml import etree
from lxml.etree import fromstring

if request.POST:
    xml = request.POST['xml'].encode('utf-8')
    parser = etree.XMLParser(ns_clean=True, recover=True, encoding='utf-8')
    h = fromstring(xml, parser=parser)

    # cssselect() returns a list, and plain etree elements expose .text
    return HttpResponse(h.cssselect('itagg_delivery_receipt status')[0].text)
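
To try this outside Django, here is a minimal standalone sketch of the same approach. The sample XML is a trimmed copy of the receipt from the question with the version set to 1.0, and `parse_receipt` is a hypothetical helper name, not part of lxml. It is only meant to show that the text string is rejected while the encoded bytes are accepted.

```python
from lxml import etree

# Trimmed copy of the receipt from the question (version set to 1.0 for this sketch).
XML_TEXT = (
    u'<?xml version="1.0" encoding="ISO-8859-1"?>'
    u'<itagg_delivery_receipt>'
    u'<status>Delivered</status>'
    u'<reason>4</reason>'
    u'</itagg_delivery_receipt>'
)


def parse_receipt(xml_text):
    """Hypothetical helper: encode the text, then parse it with a matching parser encoding."""
    parser = etree.XMLParser(ns_clean=True, recover=True, encoding='utf-8')
    return etree.fromstring(xml_text.encode('utf-8'), parser=parser)


# Passing the text string directly is what triggers the error in the traceback.
try:
    etree.fromstring(XML_TEXT)
except ValueError as exc:
    print('text input rejected: %s' % exc)

# Passing bytes avoids it.
root = parse_receipt(XML_TEXT)
status = root.findtext('status')
print(status)
```

`findtext('status')` is used here instead of `cssselect` so the sketch does not depend on the separate cssselect package.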

The following solution worked for me:

from lxml import etree

xml = u'<?xml version="1.0" encoding="utf-8" ?><foo><bar/></foo>'
xml = bytes(bytearray(xml, encoding='utf-8'))  # ADDENDUM OF THIS LINE (when unicode means utf-8, e.g. on Linux)
etree.XML(xml)

# <Element foo at 0x5b44c90>
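
The snippet above works because the bytes handed to lxml agree with the encoding named in the declaration. As a rough illustration of the same idea with a different declared encoding, the sketch below encodes the text as ISO-8859-1 (the encoding declared in the question's XML) and lets the parser read the declaration itself. The sample document and the variable names are made up for this example.

```python
from lxml import etree

# Made-up sample: the declaration says ISO-8859-1 and the text contains a non-ASCII character.
xml_text = u'<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar>caf\u00e9</bar></foo>'

# Encode with the same encoding the declaration names, so bytes and declaration agree.
xml_bytes = xml_text.encode('iso-8859-1')

# Given bytes, the parser reads the encoding from the declaration; no parser override is needed.
root = etree.fromstring(xml_bytes)
bar_text = root.findtext('bar')
```

If the bytes were encoded as UTF-8 while the declaration still said ISO-8859-1, the parser would need the explicit `encoding='utf-8'` override shown in the first answer to decode non-ASCII characters correctly.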

Simpler than the answers above:

from lxml import etree
# Do the request for the data; the response is r, e.g. r = requests.get(URL)
data = etree.fromstring(bytes(r.text, encoding='utf-8'))
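
A possible variation, assuming `r` is a response from the requests library: `r.content` already holds the undecoded body as bytes, so it can be passed to lxml without the round trip through `r.text`. In the sketch below, `parse_response` is a hypothetical helper and `fake_response` is a stand-in object used only so the example runs without a network call.

```python
from types import SimpleNamespace

from lxml import etree


def parse_response(response):
    """Hypothetical helper: parse the XML body of a requests-style response from its raw bytes."""
    return etree.fromstring(response.content)


# Stand-in for r = requests.get(URL); only the .content attribute is imitated here.
fake_response = SimpleNamespace(
    content=b'<?xml version="1.0" encoding="utf-8"?><foo><bar/></foo>'
)

root = parse_response(fake_response)
root_tag = root.tag
```

Working from the raw bytes also leaves the decoding to the XML parser, which uses the encoding named in the declaration.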

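Interesting. Even after reading the post that suggests the workaround, I still don't see why the decoding fails...

Could you post a standalone example that defines the XML data in the code?

This solution does not help with the encoding problem. See the lxml FAQ.

Possible duplicate.

You might add some explanation to your solution so that later users can understand your code more easily.

What type is r?

The variable r would come from the requests library, so you would have r = requests.get(URL).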