Python: extracting subsets of a sequence with XPath
I am looking for an XPath expression that extracts the "sets" as separate sequences. It has to be interpreted by Python's lxml (which is a wrapper around libxml2).
For example, given the following:
<root>
<sub1>
<sub2>
<Container>
<item>1 - My laptop has exploded again</item>
<item>2 - This is an issue which needs to be fixed.</item>
</Container>
</sub2>
<sub2>
<Container>
<item>3 - It's still not working</item>
<item>4 - do we have a working IT department or what?</item>
</Container>
</sub2>
<sub2>
<Container>
<item>5 - Never mind - I got my 8 year old niece to fix it</item>
</Container>
</sub2>
</sub1>
</root>
I would like to extract:
First sequence:
1 - My laptop has exploded again
2 - This is an issue which needs to be fixed.
Second sequence:
3 - It's still not working
4 - do we have a working IT department or what?
Third sequence:
5 - Never mind - I got my 8 year old niece to fix it
where "sequence" would translate into pseudocode/Python as:
seq1 = ['1 - My laptop has exploded again', '2 - This is an issue which needs to be fixed.']
seq2 = ["3 - It's still not working", '4 - do we have a working IT department or what?']
seq3 = ['5 - Never mind - I got my 8 year old niece to fix it']
From some preliminary research this does not seem to be possible, but I am wondering whether some black magic would make it feasible.
This is the result:
[['1 - My laptop has exploded again', '2 - This is an issue which needs to be fixed.'], ["3 - It's still not working", '4 - do we have a working IT department or what?'], ['5 - Never mind - I got my 8 year old niece to fix it']]
Note: I assume the data is in a file named "data.xml", located in the same directory as the script that runs the code.
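The code this answer refers to is not reproduced here. As a minimal sketch of one way to arrive at the same nested list, the grouping can be done in Python with one inner list per Container element. The sketch assumes the sample document from the question and uses the standard library's xml.etree.ElementTree rather than lxml so that it runs without extra packages (lxml elements also provide iter() and findall()). The names XML and result are illustrative.

```python
import xml.etree.ElementTree as ET

# Sample document from the question, inlined so the sketch runs on its own.
XML = """<root><sub1>
<sub2><Container>
<item>1 - My laptop has exploded again</item>
<item>2 - This is an issue which needs to be fixed.</item>
</Container></sub2>
<sub2><Container>
<item>3 - It's still not working</item>
<item>4 - do we have a working IT department or what?</item>
</Container></sub2>
<sub2><Container>
<item>5 - Never mind - I got my 8 year old niece to fix it</item>
</Container></sub2>
</sub1></root>"""

root = ET.fromstring(XML)

# One inner list per <Container>. The nesting is built in Python,
# because a single XPath 1.0 result is a flat node list.
result = [
    [item.text for item in container.findall("item")]
    for container in root.iter("Container")
]
print(result)
```

To read the file instead of the inline string, ET.parse("data.xml").getroot() would take the place of ET.fromstring(XML).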
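First, evaluate this XPath expression:
count(/*/*/*)
This gives the number of /*/*/* elements (an equivalent, more readable but longer expression is: count(/*/sub1/sub2)).
Then, for each $n from 1 to count(/*/*/*), evaluate the following XPath expression:
/*/*/*[$n]/*/item/text()
or, equivalently:
/*/sub1/sub2[$n]/Container/item/text()
Before evaluating the expression, replace $n with its actual value (for example, using the format() method for strings).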
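The two steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses the standard library's xml.etree.ElementTree, whose limited XPath subset has no count() or text(), so len() and the .text attribute stand in for them, and the paths are written relative to the root element. With lxml, the original expressions could be passed to the xpath() method instead; there count() comes back as a float and would need int() before being used as an index. The names XML, template, n_groups and sequences are illustrative.

```python
import xml.etree.ElementTree as ET

# Sample document from the question, inlined so the sketch runs on its own.
XML = """<root><sub1>
<sub2><Container>
<item>1 - My laptop has exploded again</item>
<item>2 - This is an issue which needs to be fixed.</item>
</Container></sub2>
<sub2><Container>
<item>3 - It's still not working</item>
<item>4 - do we have a working IT department or what?</item>
</Container></sub2>
<sub2><Container>
<item>5 - Never mind - I got my 8 year old niece to fix it</item>
</Container></sub2>
</sub1></root>"""

root = ET.fromstring(XML)

# Step 1: count the sub2 groups (the role of count(/*/sub1/sub2)).
n_groups = len(root.findall("./sub1/sub2"))

# Step 2: substitute each 1-based index into the path with str.format()
# and evaluate one expression per group.
template = "./sub1/sub2[{}]/Container/item"
sequences = [
    [item.text for item in root.findall(template.format(n))]
    for n in range(1, n_groups + 1)
]

for n, seq in enumerate(sequences, start=1):
    print(n, seq)
```

The document is queried once for the count and once per group, which matches the "iterate twice" approach: XPath selects the items, Python supplies the nesting.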
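For the provided XML document the count is 3, so the XPath expressions actually evaluated are:
/*/*/*[1]/*/item/text()
/*/*/*[2]/*/item/text()
/*/*/*[3]/*/item/text()
They produce, respectively, the following collections (language-dependent: array, sequence, collection, IEnumerable, etc.):
"1 - My laptop has exploded again", "2 - This is an issue which needs to be fixed."
"3 - It's still not working", "4 - do we have a working IT department or what?"
"5 - Never mind - I got my 8 year old niece to fix it"
What exactly do you mean by sequence?
@WilliamKinaan I added a clarification.
Right, I added an answer.
Yes, I think I will have to resort to something like that. I was hoping for a pure XPath expression.
@lorenzog XPath will give you a list of texts, but you want a list of lists.
Thanks, this looks interesting. So, to recap: given an unbounded number of what you call "n", I have to combine the output of "count" with the array selector so that I can select the right list.
@lorenzog, I used Python some time ago and don't remember the details. I can give you C# code or pseudocode: for (var i = 1 to n) { var expression = string.Format("/*/*/*[{0}]/*/item/text()", i); var sequence = evaluate(expression); }
What I meant was: can you iterate over the expression in XPath itself? The rest of the language is not a problem. In fact I tried it with XPath and it works, so now I have to iterate twice in Python: once for the "count" XPath, and then once per element.
@lorenzog, you cannot do this with a single XPath 1.0 or XPath 2.0/3.0 expression. XPath 3.1 has a standard array data type whose members can be sequences, but I doubt that lxml supports even XPath 2.0. So your current solution is the best you can achieve without XPath 3.1 support.
You can "iterate over the expression" using XPath 2.0/3.0, but the result would be a single flat sequence, because sequences cannot be nested.
Yep, that seems to be the case. It says: "Sequences are never nested; for example, combining the values 1, (2, 3), and ( ) into a single sequence results in the sequence (1, 2, 3)." That settles it.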