Python minidom XML parsing


I'm trying to find a way to get an element's index number while parsing an XML file with minidom. The XML looks like this:

<stuff>
    <morestuff>
        <sometag>catagory1</sometag>
        <path pathversion="1">/path Im looking to for</path> <!-- info I'm after -->
        <path pathversion="2">/path I don't need</path>
        <path pathversion="3">/path I don't need</path>
    </morestuff>
    <morestuff>
        <sometag>catagory2</sometag>
        <path pathversion="1">/other path I'm looking for</path> <!-- info I'm after -->
        <path pathversion="2">/path I don't need</path>
        <path pathversion="3">/path I don't need</path>
    </morestuff>
</stuff>

I'd like to do something like this:

for element in node.getElementsByTagName('sometag'):
    if element.firstChild.data == 'catagory1':
        elementid = element.indexnumber  # <-- how do I store the [0] or [1] in a variable so I can use it to describe the position in the next line?
        var1 = node.getElementsByTagName('path')[elementid].firstChild.data
    if element.firstChild.data == 'catagory2':
        elementid = element.indexnumber
        var2 = node.getElementsByTagName('path')[elementid].firstChild.data


This builds a dictionary containing the information you need:

import xml.dom.minidom

# `test` is assumed to hold the XML document from the question as a string
doc = xml.dom.minidom.parseString(test)

paths = {}

for element in doc.getElementsByTagName('morestuff'):
    # get the text value of the sometag tag
    category = element.getElementsByTagName('sometag')[0].firstChild.nodeValue

    # get all the path elements inside this morestuff element
    for path in element.getElementsByTagName('path'):
        if path.getAttribute('pathversion') == '1':
            pathstr = path.firstChild.nodeValue
            paths[category] = pathstr

print(paths)

The result I get is:

{'catagory1': '/path Im looking to for', 'catagory2': "/other path I'm looking for"}
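
If the numeric position itself is still wanted, as in the question's `element.indexnumber` pseudocode, one possible sketch is to let `enumerate()` supply it while looping over the `morestuff` blocks. Note that the index of a `sometag` cannot be used directly against the flat list of every `path` in the document, because each block contributes three `path` elements; searching inside each block avoids that mismatch. The names `SAMPLE_XML` and `indexed_first_paths` are made up for this illustration, and the sample assumes the question's XML with the inline annotations removed.

```python
import xml.dom.minidom

# Sample document from the question, with the inline annotations removed.
SAMPLE_XML = """<stuff>
    <morestuff>
        <sometag>catagory1</sometag>
        <path pathversion="1">/path Im looking to for</path>
        <path pathversion="2">/path I don't need</path>
        <path pathversion="3">/path I don't need</path>
    </morestuff>
    <morestuff>
        <sometag>catagory2</sometag>
        <path pathversion="1">/other path I'm looking for</path>
        <path pathversion="2">/path I don't need</path>
        <path pathversion="3">/path I don't need</path>
    </morestuff>
</stuff>"""


def indexed_first_paths(xml_text):
    """Return (index, category, first path text) for each <morestuff> block."""
    doc = xml.dom.minidom.parseString(xml_text)
    rows = []
    # enumerate() supplies the position the question calls "indexnumber"
    for index, block in enumerate(doc.getElementsByTagName('morestuff')):
        category = block.getElementsByTagName('sometag')[0].firstChild.data
        # Search inside this block only, so [0] is this block's first <path>
        first_path = block.getElementsByTagName('path')[0].firstChild.data
        rows.append((index, category, first_path))
    return rows


for row in indexed_first_paths(SAMPLE_XML):
    print(row)
```

This relies on the `pathversion="1"` element being the first `path` in each block, which holds for the sample but is an assumption; filtering on the attribute, as in the answer above, is safer if the order can vary.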

How about using etree, as Keith suggested? This gives:

['/path Im looking to for', "/other path I'm looking for"]

using this code:

import xml.etree.ElementTree as ET
tree = ET.fromstring('''<stuff>
    <morestuff>
        <sometag>catagory1</sometag>
        <path pathversion="1">/path Im looking to for</path>
        <path pathversion="2">/path I don't need</path>
        <path pathversion="3">/path I don't need</path>
    </morestuff>
    <morestuff>
        <sometag>catagory2</sometag>
        <path pathversion="1">/other path I'm looking for</path>
        <path pathversion="2">/path I don't need</path>
        <path pathversion="3">/path I don't need</path>
    </morestuff>
</stuff>
''')
print([e.text for e in tree.findall('.//morestuff/path[@pathversion="1"]')])
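
The list above drops the category each path came from. Since the asker needs to know the category and the blocks may appear in any order, a rough variation is to key the result by the `sometag` text instead. This is only a sketch: `REVERSED_XML` and `paths_by_category` are hypothetical names, and the sample deliberately lists `catagory2` first to show that lookup does not depend on document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: same shape as the question's XML, categories reversed.
REVERSED_XML = """<stuff>
    <morestuff>
        <sometag>catagory2</sometag>
        <path pathversion="1">/other path I'm looking for</path>
        <path pathversion="2">/path I don't need</path>
    </morestuff>
    <morestuff>
        <sometag>catagory1</sometag>
        <path pathversion="1">/path Im looking to for</path>
        <path pathversion="2">/path I don't need</path>
    </morestuff>
</stuff>"""


def paths_by_category(xml_text, version="1"):
    """Map each <sometag> text to the <path> with the given pathversion."""
    root = ET.fromstring(xml_text)
    result = {}
    # findall("morestuff") returns the direct <morestuff> children of the root
    for block in root.findall("morestuff"):
        category = block.findtext("sometag")
        path = block.find("path[@pathversion='%s']" % version)
        # Skip blocks that lack a category or a matching path
        if category is not None and path is not None:
            result[category] = path.text
    return result


paths = paths_by_category(REVERSED_XML)
print(paths["catagory1"])
print(paths["catagory2"])
```

Looking values up by category name sidesteps the index question entirely, which may or may not suit the original use case.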

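I'd suggest using ElementTree or lxml instead of minidom.

It's not clear which index you're trying to get. Can you give a sample of the desired output? Do you want the index of the element among its siblings?

I'm trying to get the index of the two items under category 1 and category 2. The tricky part for me is that I need to know which category they came from, and the categories may not appear in that order.

@Keith Thanks for the ElementTree suggestion. It looks like I can grab the index with it, as in an example I found. I'll see whether I can redo this with ElementTree.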