Python: Extracting subsets of a sequence with XPath

I am looking for an XPath expression that extracts "sets" as separate sequences. It has to be interpreted by Python's lxml (which is a wrapper around libxml2).

For example, given the following:

<root>
    <sub1>
        <sub2>
            <Container>
                <item>1 - My laptop has exploded again</item>
                <item>2 - This is an issue which needs to be fixed.</item>
            </Container>
        </sub2>
        <sub2>
            <Container>
                <item>3 - It's still not working</item>
                <item>4 - do we have a working IT department or what?</item>
            </Container>
        </sub2>
        <sub2>
            <Container>
                <item>5 - Never mind - I got my 8 year old niece to fix it</item>
            </Container>
        </sub2>
    </sub1>
</root>

The desired result is three separate sequences.

First sequence:

1 - My laptop has exploded again
2 - This is an issue which needs to be fixed.

Second sequence:

3 - It's still not working
4 - do we have a working IT department or what?

Third sequence:

5 - Never mind - I got my 8 year old niece to fix it

where "sequence" would translate into pseudocode/Python as:

seq1 = ['1 - My laptop has exploded again', '2 - This is an issue which needs to be fixed.']
seq2 = ["3 - It's still not working", '4 - do we have a working IT department or what?']
seq3 = ['5 - Never mind - I got my 8 year old niece to fix it']

From some preliminary research this does not seem to be possible, but I wonder whether some black magic could make it work.

This is the result:

[['1 - My laptop has exploded again', '2 - This is an issue which needs to be fixed.'], ["3 - It's still not working", '4 - do we have a working IT department or what?'], ['5 - Never mind - I got my 8 year old niece to fix it']]

Note: I assume the data is in a file named "data.xml", located in the same directory as the script containing the code.
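
The code this answer refers to is not present in this copy of the page. A minimal sketch that produces a result of the same shape, assuming lxml is installed, might look like the following. The names `XML_DATA` and `extract_groups` are hypothetical, and the sample document is parsed from an inline string instead of data.xml so that the snippet is self-contained.

```python
from lxml import etree

# Inline copy of the sample document (assumption: the answer reads it from data.xml).
XML_DATA = b"""<root>
    <sub1>
        <sub2>
            <Container>
                <item>1 - My laptop has exploded again</item>
                <item>2 - This is an issue which needs to be fixed.</item>
            </Container>
        </sub2>
        <sub2>
            <Container>
                <item>3 - It's still not working</item>
                <item>4 - do we have a working IT department or what?</item>
            </Container>
        </sub2>
        <sub2>
            <Container>
                <item>5 - Never mind - I got my 8 year old niece to fix it</item>
            </Container>
        </sub2>
    </sub1>
</root>"""


def extract_groups(root):
    """Return one list of item texts per Container element (hypothetical helper)."""
    groups = []
    # The outer XPath selects every Container; the nesting is rebuilt in Python.
    for container in root.xpath("//Container"):
        # A relative XPath, evaluated against this Container only.
        groups.append([str(text) for text in container.xpath("item/text()")])
    return groups


root = etree.fromstring(XML_DATA)
result = extract_groups(root)
print(result)
```

To read the document from a file instead, `etree.parse("data.xml").getroot()` could replace the `etree.fromstring` call. The nesting comes from the Python loop, not from XPath itself: each XPath call returns a flat list.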

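  • Evaluate this XPath expression:

    count(/*/*/*)

  • This finds the number of sub2 elements (an equivalent, more readable but longer expression is:

    count(/*/sub1/sub2))

  • For each $n from 1 to count(/*/*/*), evaluate the following XPath expression:

    /*/*/*[$n]/*/item/text()

  • Again, this is equivalent to the longer and more readable:

    /*/sub1/sub2[$n]/Container/item/text()

    Before evaluating the above expression, replace $n with its actual value (for example, by using the format() method on the string).

    For the provided XML document the count is 3, so the XPath expressions actually evaluated are:

    /*/*/*[1]/*/item/text()

    /*/*/*[2]/*/item/text()

    /*/*/*[3]/*/item/text()

    They produce, respectively, the following collections (language dependent: array, sequence, collection, IEnumerable, etc.):

    "1 - My laptop has exploded again", "2 - This is an issue which needs to be fixed."

    "3 - It's still not working", "4 - do we have a working IT department or what?"

    "5 - Never mind - I got my 8 year old niece to fix it"

Comments:

    What exactly do you mean by "sequence"?

    @WilliamKinaan I added a clarification.

    Right, I added an answer.

    Yes, I thought I would have to resort to something like that. I was hoping for a pure XPath expression.

    @lorenzog XPath will give you a list of texts, but you want a list of lists.

    Thanks, this looks interesting. So, to recap: given an unbounded number of what you call "n", I have to combine the output of "count" with the array selector so that I can select the right list.

    @lorenzog, I used Python in the past and do not remember the details. I can give you C# code or pseudocode: `for (var i = 1 to n) { var expression = string.Format("/*/*/*[{0}]/*/item/text()", i); var sequence = evaluate(expression); }`

    I mean: can you iterate over an expression with XPath? The rest of the language is not a problem. In fact I tried it with XPath and it works, so now I have to iterate twice in Python: once for the "count" XPath, and then once for each element.

    @lorenzog, you cannot do this with a single XPath 1.0 or XPath 2.0/3.0 expression. XPath 3.1 has a standard array data type whose members can be sequences, but I doubt that lxml supports even XPath 2.0. So your current solution is the best you can achieve without XPath 3.1 support. You can "iterate over an expression" with XPath 2.0/3.0, but the result will be a single flat sequence, because sequences cannot be nested.

    Yep, that looks to be the case. It says "Sequences are never nested; for example, combining the values 1, (2, 3), and ( ) into a single sequence results in the sequence (1, 2, 3)." That's it.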
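
The count-then-iterate procedure from the second answer can be sketched in Python roughly as follows. This is an illustration under assumptions, not the answerer's own code: it assumes lxml is installed, the names `XML_DATA`, `n_groups` and `sequences` are hypothetical, and the sample document is inlined so that the snippet runs on its own.

```python
from lxml import etree

# Inline copy of the sample document from the question.
XML_DATA = b"""<root>
    <sub1>
        <sub2>
            <Container>
                <item>1 - My laptop has exploded again</item>
                <item>2 - This is an issue which needs to be fixed.</item>
            </Container>
        </sub2>
        <sub2>
            <Container>
                <item>3 - It's still not working</item>
                <item>4 - do we have a working IT department or what?</item>
            </Container>
        </sub2>
        <sub2>
            <Container>
                <item>5 - Never mind - I got my 8 year old niece to fix it</item>
            </Container>
        </sub2>
    </sub1>
</root>"""

root = etree.fromstring(XML_DATA)

# Step 1: count the sub2 elements. XPath numbers come back from lxml as
# Python floats, so the value is converted to int before use in range().
n_groups = int(root.xpath("count(/*/*/*)"))

# Step 2: substitute each position into the expression with str.format()
# and evaluate it; every evaluation yields one flat list of text nodes.
sequences = []
for n in range(1, n_groups + 1):
    expression = "/*/*/*[{}]/*/item/text()".format(n)
    sequences.append([str(text) for text in root.xpath(expression)])

print(sequences)
```

XPath positions start at 1, which is why the loop runs from 1 to `n_groups` inclusive. As the comments point out, the list of lists is assembled on the Python side, because a single XPath 1.0 expression can only return a flat result.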