Python xml.etree.ElementTree.ParseError: not well-formed (invalid token)


Using Python 3.

The error we get:

File "C:/scratch.py", line 27, in run
    tree = ET.fromstring(responses[0].decode(), ET.XMLParser(encoding='utf-8'))
  File "C:\Programs\Python\Python36-32\lib\xml\etree\ElementTree.py", line 1314, in XML
    parser.feed(text)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 163, column 1106
Our code:

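import xml.etree.ElementTree as ET

# responses[0] holds the raw bytes of the feed returned by a GET request
tree = ET.fromstring(responses[0].decode(), ET.XMLParser(encoding='utf-8'))
for i in tree.iter('item'):
    try:
        title = i.find('title').text
    except Exception:
        pass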
responses[0] comes from a list of responses returned by URL GET requests; here it is index 0, tested against one particular URL:
http://feeds.feedburner.com/marginalrevolution/feed

We pasted the XML into the W3Schools validator and got:

This page contains the following errors:
error on line 163 at column 31: Input is not in proper UTF-8, indicate encoding! Bytes: 0x0C 0x66 0x69 0x67

But shouldn't passing ET.XMLParser(encoding='utf-8') as the parser fix this parse error?

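The W3 validator's error message is misleading. The problem with 0x0c is not that it is invalid UTF-8; it is that it is not a valid character in XML. XML 1.0 permits only tab (0x09), line feed (0x0A) and carriage return (0x0D) among the control characters below 0x20, so no choice of encoding makes 0x0c acceptable.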
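To make the distinction concrete, here is a minimal sketch of the XML 1.0 `Char` production written as a predicate, plus a helper that lists every character the production rejects, with its position. The names `is_xml_char` and `find_illegal_xml_chars` are hypothetical (they are not part of `xml.etree`), and the sketch assumes the document has already been decoded to `str`. Note that `'\x0c'` decodes from UTF-8 without complaint, yet the predicate rejects it.

```python
def is_xml_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 'Char' production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_illegal_xml_chars(text: str):
    """Yield (line, column, hex code) for each character XML 1.0 forbids.

    Lines are counted from 1 and columns from 0. Only '\n' is treated as a
    line break: str.splitlines() would also split on '\x0c' itself.
    """
    for lineno, row in enumerate(text.split('\n'), start=1):
        for col, ch in enumerate(row):
            if not is_xml_char(ord(ch)):
                yield lineno, col, f'0x{ord(ch):02X}'


# Made-up sample with a form feed inside the title text.
sample = '<rss>\n<title>A\x0cB</title>\n</rss>'
print(list(find_illegal_xml_chars(sample)))  # [(2, 8, '0x0C')]
```

Running something like this over the decoded feed should point straight at the form feed, independent of how a given validator counts columns.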

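0x0c is the form feed control character, so its presence in the document serves no useful purpose. A conforming XML parser must reject a document that is not well-formed, and you can't change the RSS feed, so the simplest solution is to strip the character from the document before processing: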

>>> tree = ET.fromstring(original_response, ET.XMLParser(encoding='utf-8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/xml/etree/ElementTree.py", line 1315, in XML
    parser.feed(text)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 185, column 1106

>>> fixed = original_response.replace(b'\x0c', b'')
>>> tree = ET.fromstring(fixed, ET.XMLParser(encoding='utf-8'))
>>> tree
<Element 'rss' at 0x7ff316db6278>
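The session above removes only 0x0c. If a feed might also contain other forbidden control characters, a slightly more general sketch strips every C0 control byte that XML 1.0 disallows (everything below 0x20 except tab, LF and CR) before parsing. It assumes the raw response is UTF-8 or another ASCII-compatible encoding (it would corrupt UTF-16 input), and it does not handle other excluded code points such as U+FFFE. `strip_illegal_xml_bytes` and `parse_feed` are hypothetical names; the `encoding` argument is left out because, as explained above, it does not affect this error.

```python
import re
import xml.etree.ElementTree as ET

# C0 control bytes forbidden by XML 1.0: everything below 0x20
# except tab (0x09), line feed (0x0A) and carriage return (0x0D).
_ILLEGAL_C0 = re.compile(rb'[\x00-\x08\x0b\x0c\x0e-\x1f]')


def strip_illegal_xml_bytes(raw: bytes) -> bytes:
    """Remove forbidden C0 control bytes from a UTF-8 encoded document."""
    # Safe for UTF-8: bytes below 0x80 never appear inside a multibyte
    # sequence, so this substitution cannot split a character.
    return _ILLEGAL_C0.sub(b'', raw)


def parse_feed(raw: bytes) -> ET.Element:
    """Parse raw feed bytes after stripping forbidden control bytes."""
    return ET.fromstring(strip_illegal_xml_bytes(raw))


# Usage with a small made-up document containing 0x0C and 0x01:
root = parse_feed(b'<rss><title>A\x0c\x01B</title></rss>')
print(root.find('title').text)  # AB
```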

Worked like a charm! Thanks for the explanation!