Processing large XML files with Python's lxml.etree


I want to parse a huge XML file (>200 MB) with lxml.etree in Python. I tried loading the file with etree.parse, but that fails, apparently because of the file's size:

etree.parse('file.xml')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 2706, in lxml.etree.parse (src/lxml/lxml.etree.c:49958)
  File "parser.pxi", line 1500, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:71797)
  File "parser.pxi", line 1529, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:72080)
  File "parser.pxi", line 1429, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:71175)
  File "parser.pxi", line 975, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:68173)
  File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:64257)
  File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:65178)
  File "parser.pxi", line 565, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64521)
lxml.etree.XMLSyntaxError: Excessive depth in document: 256 use XML_PARSE_HUGE option, line 1276, column 7
Since I want to use XPath expressions, the file has to be parsed first. So how can I parse this XML file? How do I use XML_PARSE_HUGE when working through lxml.etree?


Thanks

Try creating a custom XMLParser instance:

from lxml.etree import XMLParser, parse

# huge_tree=True disables libxml2's safety limits on tree depth and text size
p = XMLParser(huge_tree=True)
tree = parse('file.xml', parser=p)
This solution also works if you run into the following error: "XMLSyntaxError: internal error: Huge input lookup".
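
Since the original goal was to run XPath expressions on the parsed document, here is a minimal sketch of the whole round trip. It assumes lxml is installed. The helper name parse_deep, the element names node and leaf, and the 300-level test document are all made up for illustration, and the document is built in memory so the example does not depend on file.xml. Note that huge_tree=True turns off protections against maliciously large input, so it is probably best reserved for files you trust.

```python
from lxml import etree


def parse_deep(xml_bytes):
    """Parse XML with libxml2's depth/size safety limits relaxed.

    parse_deep is a hypothetical helper name used only in this sketch.
    """
    parser = etree.XMLParser(huge_tree=True)
    return etree.fromstring(xml_bytes, parser=parser)


# Hypothetical test document: 300 nested <node> elements, which is deeper
# than the 256-level limit reported in the traceback above.
depth = 300
xml_bytes = (b"<node>" * depth) + b"<leaf>value</leaf>" + (b"</node>" * depth)

root = parse_deep(xml_bytes)

# XPath works on the resulting tree as usual.
leaves = root.xpath("//leaf")
leaf_text = leaves[0].text
ancestor_count = len(leaves[0].xpath("ancestor::node"))

print(leaf_text, ancestor_count)
```

For a file on disk, the same parser object can be passed to etree.parse('file.xml', parser=parser), as in the answer above, and xpath() can then be called on the returned tree.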