Python: extracting subsets of a sequence with XPath

python xml xpath


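I am looking for an XPath expression that extracts "sets" as separate sequences. It must be interpreted by Python's lxml (which is a wrapper around libxml2).

For example, given the following: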

<root>
    <sub1>
        <sub2>
            <Container>
                <item>1 - My laptop has exploded again</item>
                <item>2 - This is an issue which needs to be fixed.</item>
            </Container>
        </sub2>
        <sub2>
            <Container>
                <item>3 - It's still not working</item>
                <item>4 - do we have a working IT department or what?</item>
            </Container>
        </sub2>
        <sub2>
            <Container>
                <item>5 - Never mind - I got my 8 year old niece to fix it</item>
            </Container>
        </sub2>
    </sub1>
</root>

I would like to get three sequences.

First sequence:

1 - My laptop has exploded again
2 - This is an issue which needs to be fixed.

Second sequence:

3 - It's still not working
4 - do we have a working IT department or what?

Third sequence:

5 - Never mind - I got my 8 year old niece to fix it

where a "sequence" would translate, in pseudocode/Python, to:

seq1 = ['1 - My laptop has exploded again', '2 - This is an issue which needs to be fixed.']
seq2 = ["3 - It's still not working", '4 - do we have a working IT department or what?']
seq3 = ['5 - Never mind - I got my 8 year old niece to fix it']

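From some preliminary research this does not seem to be possible, but I would like to know whether some black magic could make it work.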

This is the result:

[['1 - My laptop has exploded again', '2 - This is an issue which needs to be fixed.'], ["3 - It's still not working", '4 - do we have a working IT department or what?'], ['5 - Never mind - I got my 8 year old niece to fix it']]

Note: I assume the data is in a file named "data.xml", located in the same directory as the script.
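A minimal sketch of code that yields a result of this shape, assuming lxml is installed. It is not necessarily the code this answer used: the helper name extract_sequences is hypothetical, and the XML is inlined here so that the block is self-contained (to read the file instead, etree.parse("data.xml").getroot() would provide the root element). The idea is to select every Container with one XPath call, then evaluate a relative expression on each one to collect its item texts.

```python
from lxml import etree

# Sample document from the question, inlined so the sketch is self-contained.
XML = b"""<root>
    <sub1>
        <sub2>
            <Container>
                <item>1 - My laptop has exploded again</item>
                <item>2 - This is an issue which needs to be fixed.</item>
            </Container>
        </sub2>
        <sub2>
            <Container>
                <item>3 - It's still not working</item>
                <item>4 - do we have a working IT department or what?</item>
            </Container>
        </sub2>
        <sub2>
            <Container>
                <item>5 - Never mind - I got my 8 year old niece to fix it</item>
            </Container>
        </sub2>
    </sub1>
</root>"""


def extract_sequences(root):
    """Return one list of item texts per Container element (hypothetical helper)."""
    return [
        # Relative XPath, evaluated with each Container as the context node.
        [str(text) for text in container.xpath("item/text()")]
        for container in root.xpath("//Container")
    ]


root = etree.fromstring(XML)
result = extract_sequences(root)
print(result)
```

The grouping is done in Python rather than in XPath: each inner list comes from a separate xpath() call on one Container.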

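Evaluate this XPath expression:

count(/*/*/*)

This finds the number of sub2 elements (an equivalent, more readable but longer expression is:

count(/*/sub1/sub2))

For each $n from 1 to count(/*/*/*), evaluate the following XPath expression:

/*/*/*[$n]/*/item/text()

Again, this is equivalent to the longer, more readable:

/*/sub1/sub2[$n]/Container/item/text()

Before evaluating the above expression, replace $n with its actual value (for example, by using the format() method for strings).

For the provided XML document $n ranges from 1 to 3, so the XPath expressions actually evaluated are:

/*/*/*[1]/*/item/text()

/*/*/*[2]/*/item/text()

/*/*/*[3]/*/item/text()

They produce, respectively, the following result collections (language-dependent: arrays, sequences, sets, IEnumerable, and so on):

"1 - My laptop has exploded again", "2 - This is an issue which needs to be fixed."

"3 - It's still not working", "4 - do we have a working IT department or what?"

"5 - Never mind - I got my 8 year old niece to fix it"

Comments:

What exactly do you mean by a sequence?

@WilliamKinaan I added a clarification.

Right, I added an answer.

Yes, I think I will have to resort to something like that. I was hoping for a pure XPath expression.

@lorenzog XPath will give you a list of texts, but you want a list of lists.

Thanks, this looks interesting. So, to recap: given an unbounded number of what you call "n", I have to combine the output of "count" with an array selector so that I can select the right list.

@lorenzog, I used Python in the past and I don't remember the details. I can give you C# code or pseudocode: for (var i = 1 to n) { var expression = string.Format("/*/*/*[{0}]/*/item/text()", i); var sequence = evaluate(expression); }

I mean, can you iterate the expression with XPath itself? The rest of the language is not a problem. In fact I tried it with XPath and it works, so now I have to iterate twice in Python: once for the "count" XPath, and then once for each element.

@lorenzog, you can't do this with a single XPath 1.0 or XPath 2.0/3.0 expression. XPath 3.1 has a standard array data type whose items can be sequences, but I doubt that lxml supports even XPath 2.0. So your current solution is the best you can achieve without XPath 3.1 support. You can "iterate the expression" with XPath 2.0/3.0, but the result will be a single sequence, because sequences cannot be nested.

Yep, that looks right. It says: "Sequences are never nested, for example combining the values 1, (2, 3), and ( ) into a single sequence results in the sequence (1, 2, 3)." That's it.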