Python: when parsing an XML file, is there a way to use lxml.etree to skip the first entries or start iterating at a specific child?

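I am currently using the .iter method from Python's lxml.etree package to parse an XML file. Is there a way to skip the first entries, or to use something like XPath to start iterating at a specific child?

I have looked at the itertext and iterparse methods, but going by their definitions I am not sure they would do any more than what I already do by narrowing iter down to a specific tag.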

import lxml.etree as et

parsedXML = et.parse(file_path)

for child in parsedXML.iter('{http://www.witsml.org/schemas/131}data'):
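The code parses the XML file successfully, but I would like to reduce the run time by skipping rows that are empty or that lack a sufficient number of characters (every row is comma separated).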

<logData>
<data>63653079886,,,,,,,,,,,,,,,,,,,,,,,</data>
<data>63653079887,,,,,,,,,,,,,,,,,,,,,,,</data>
<data>63653079888,,,,,,,,,,,,,,,,,,,,,,,</data>
<data>63653079889,,,,,,,,,,,,,,,,,,,,,,,</data>
<data>63653079890,,29.3,155.8,12.25,0.0,0,0,95.31,-86.11,1729654,1202864,1319105,1.00,1.00,-511.4,1.95,74,0,0,264.1,3.4,,356.9</data>
<data>63653079891,,29.3,155.7,12.25,0.0,0,0,95.31,-86.11,1729654,1202864,1319105,1.00,1.00,-511.4,1.95,74,0,0,264.1,3.4,,356.9</data>
<data>63653079892,,29.3,155.8,12.25,0.0,0,0,93.76,-87.65,1729654,1202864,1319105,1.00,1.00,-511.4,1.95,74,0,0,264.1,3.4,,356.9</data>
</logData>


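Apart from the 11-digit value at the start of each row, the first several rows are empty. I would like to skip those and start the iteration at the first row that has the 12.25 value (row 5 in this example).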

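Since a data element that holds only the 11-digit value and commas (without any whitespace) is 34 characters long, you can test for that in a predicate:

data[string-length(translate(., ' ', '')) > 34]

I would strip all whitespace before checking the string length, which is what translate() does here.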

Example

XML input (input.xml)

<logData>
<data>63653079886,,,,,,,,,,,,,,,,,,,,,,,</data>
<data>63653079887,,,,,,,,,,,,,,,,,,,,,,,</data>
<data>63653079888,,,,,,,,,,,,,,,,,,,,,,,</data>
<data>63653079889,,,,,,,,,,,,,,,,,,,,,,,</data>
<data>63653079889, , , , , , , , , , , ,</data>
<data>63653079890,,29.3,155.8,12.25,0.0,0,0,95.31,-86.11,1729654,1202864,1319105,1.00,1.00,-511.4,1.95,74,0,0,264.1,3.4,,356.9</data>
<data>63653079891,,29.3,155.7,12.25,0.0,0,0,95.31,-86.11,1729654,1202864,1319105,1.00,1.00,-511.4,1.95,74,0,0,264.1,3.4,,356.9</data>
<data>63653079892,,29.3,155.8,12.25,0.0,0,0,93.76,-87.65,1729654,1202864,1319105,1.00,1.00,-511.4,1.95,74,0,0,264.1,3.4,,356.9</data>
</logData>
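Python (I used a parser with remove_blank_text=True to make the printed output nicer. It is not strictly necessary.)

from lxml import etree

# Drop whitespace-only text between elements so each printed element has no trailing newline.
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse("input.xml", parser=parser)

# Keep only the data elements that are longer than 34 characters once spaces are removed.
for data in tree.xpath("data[string-length(translate(., ' ', '')) > 34]"):
    print(etree.tostring(data).decode())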
Output (printed to the console)

<data>63653079890,,29.3,155.8,12.25,0.0,0,0,95.31,-86.11,1729654,1202864,1319105,1.00,1.00,-511.4,1.95,74,0,0,264.1,3.4,,356.9</data>
<data>63653079891,,29.3,155.7,12.25,0.0,0,0,95.31,-86.11,1729654,1202864,1319105,1.00,1.00,-511.4,1.95,74,0,0,264.1,3.4,,356.9</data>
<data>63653079892,,29.3,155.8,12.25,0.0,0,0,93.76,-87.65,1729654,1202864,1319105,1.00,1.00,-511.4,1.95,74,0,0,264.1,3.4,,356.9</data>
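
One thing to watch for: the loop in the question matches elements in the http://www.witsml.org/schemas/131 namespace, and an unprefixed name in an XPath 1.0 expression never matches a namespaced element. The sketch below shows the same length test with a prefix mapping. It assumes the real file declares that URI as its default namespace; the prefix "w" and the shortened inline sample are choices made only for this illustration.

```python
from lxml import etree

# Assumption: the real file uses this URI as its default namespace.
# The prefix "w" is arbitrary; only the URI has to match the document.
NS = {"w": "http://www.witsml.org/schemas/131"}

xml = b"""<logData xmlns="http://www.witsml.org/schemas/131">
<data>63653079886,,,,,,,,,,,,,,,,,,,,,,,</data>
<data>63653079889, , , , , , , , , , , ,</data>
<data>63653079890,,29.3,155.8,12.25,0.0,0,0,95.31,-86.11,1729654,1202864,1319105,1.00,1.00,-511.4,1.95,74,0,0,264.1,3.4,,356.9</data>
</logData>"""

root = etree.fromstring(xml)

# Same predicate as above, with the element name qualified by the prefix.
rows = root.xpath(
    "w:data[string-length(translate(., ' ', '')) > 34]",
    namespaces=NS,
)

for data in rows:
    print(data.text)
```

Only the row starting with 63653079890 passes the test here: the first row is exactly 34 characters and the second is shorter once its spaces are removed.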
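If you really want to test for the 12.25 value, it gets a bit messy in an XPath 1.0 predicate when the string lengths of the preceding values are unknown. You can nest a series of substring-after() calls inside a substring-before(). It is not pretty, though: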

xpath("data[substring-before(substring-after(substring-after(substring-after(substring-after(translate(.,' ',''),','),','),','),','),',') = '12.25']")
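
Note that both predicates filter rows; they do not "start at" a row. If the goal is to skip only the leading empty rows and then keep everything that follows, one option is to do the skipping in Python with itertools.dropwhile. This is a minimal sketch, not part of the original answer: the helper name is_blank_row and the last sample row (63653079893) are made up for the illustration, and the sample has no namespace.

```python
from itertools import dropwhile

try:
    from lxml import etree
except ImportError:  # fromstring() and iter() also exist in the standard library
    import xml.etree.ElementTree as etree

# Shortened sample; the final row is hypothetical and only shows that rows
# after the starting point are kept even when they are empty.
xml = """<logData>
<data>63653079886,,,,,,,,,,,,,,,,,,,,,,,</data>
<data>63653079887,,,,,,,,,,,,,,,,,,,,,,,</data>
<data>63653079890,,29.3,155.8,12.25,0.0,0,0,95.31,-86.11,1729654,1202864,1319105,1.00,1.00,-511.4,1.95,74,0,0,264.1,3.4,,356.9</data>
<data>63653079893,,,,,,,,,,,,,,,,,,,,,,,</data>
</logData>"""


def is_blank_row(elem):
    """Return True when every field after the leading value is empty."""
    fields = (elem.text or "").split(",")
    return not any(field.strip() for field in fields[1:])


root = etree.fromstring(xml)

# dropwhile discards elements only until the predicate first returns False,
# then yields that element and all later ones without testing them again.
kept = [data.text for data in dropwhile(is_blank_row, root.iter("data"))]

for row in kept:
    print(row)
```

This still walks the leading rows one by one, so it saves the per-row processing rather than the parsing itself.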

Yes, you should be able to. Can you add a sample XML and note which elements you want to select?

Did my answer help, or are you still having problems?