How to find a specific tag in an XML file and then access its parent tag using Python and minidom

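I'm trying to write some code that searches an XML file of articles for a specific DOI contained within a <DOI> tag. When it finds the right DOI, I'd like it to access the <title> and <abstract> text of the article associated with that DOI.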

My XML file is in the following format:

<root>
 <article>
  <number>
   0 
  </number>
  <DOI>
   10.1016/B978-0-12-381015-1.00004-6 
  </DOI>
  <title>
   The patagonian toothfish biology, ecology and fishery. 
  </title>
  <abstract>
   lots of abstract text
  </abstract>
 </article>
 <article>
  ...All the article tags as shown above...
 </article>
</root>

But I'm not entirely sure what I'm doing.

Thanks for any help.

Is minidom a requirement? Parsing this with lxml and XPath would be very easy:

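from lxml import etree

# Read the file as bytes so lxml can honor any encoding declaration in it.
with open('/Users/philgw/Dropbox/PW-Honours-Project/Code/processed.xml', 'rb') as f:
    datasource = f.read()
tree = etree.fromstring(datasource)
# normalize-space() trims the whitespace around the DOI text before comparing.
path = tree.xpath("//article[normalize-space(DOI)='10.1016/B978-0-12-381015-1.00004-6']")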

This gives you a list of the <article> elements with the specified DOI.

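Also, there appears to be whitespace between the tags and their text. I don't know whether that is just Stack Overflow formatting, but it may be why you couldn't match the DOI with minidom.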
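To illustrate the whitespace point, here is a minimal sketch using only the standard library. The `SAMPLE` string and the variable names are made up for illustration and only mirror the layout shown in the question; the real file may differ. It shows that the text node inside <DOI> carries the surrounding newlines and indentation, so a direct comparison fails until the text is stripped.

```python
from xml.dom import minidom

# Hypothetical sample mirroring the question's layout, including the
# whitespace around the DOI text.
SAMPLE = """<root>
 <article>
  <DOI>
   10.1016/B978-0-12-381015-1.00004-6
  </DOI>
  <title>
   The patagonian toothfish biology, ecology and fishery.
  </title>
 </article>
</root>"""

TARGET = "10.1016/B978-0-12-381015-1.00004-6"

doc = minidom.parseString(SAMPLE)
doi_node = doc.getElementsByTagName("DOI")[0]

raw_text = doi_node.firstChild.data   # text node, newlines and indentation included
clean_text = raw_text.strip()         # whitespace removed from both ends

print(repr(raw_text))
print(raw_text == TARGET)    # False: the surrounding whitespace is part of the text
print(clean_text == TARGET)  # True
```

If the real file has no whitespace around the DOI text, stripping is harmless, so it is a reasonable default either way.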

IMHO, just look it up in the Python docs! Try the following minidom approach (untested):

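OK, now I see: you need to find the matching node first and then look at its parent node. I'll update it... OK, it's updated now.

Thanks Jiri, this looks promising, but at the moment it doesn't return any output when I try to test it. I've added the datasource line to your example, but nothing is printed.

Note that the problem was the whitespace confusing the script. This works perfectly now, thanks.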

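from xml.dom import minidom

# minidom.parse() takes a filename or an open file object, not the file's contents.
datasource = '/Users/philgw/Dropbox/PW-Honours-Project/Code/processed.xml'
xmldoc = minidom.parse(datasource)

def get_xmltext(parent, subnode_name):
    # Content of the first matching descendant, stripped of the surrounding
    # whitespace so it can be compared against a bare DOI string.
    node = parent.getElementsByTagName(subnode_name)[0]
    return "".join(ch.toxml() for ch in node.childNodes).strip()

# Keep only the <article> elements whose <DOI> matches.
matchingNodes = [node for node in xmldoc.getElementsByTagName("article")
                 if get_xmltext(node, "DOI") == '10.1016/B978-0-12-381015-1.00004-6']

for node in matchingNodes:
    print("title:", get_xmltext(node, "title"))
    print("abstract:", get_xmltext(node, "abstract"))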
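The title asks about going from a matched tag to its parent, which the snippet above sidesteps by filtering the <article> elements directly. As an alternative sketch (not the answerer's code), you can search the <DOI> elements and follow minidom's `parentNode` attribute up to the enclosing <article>. The `SAMPLE` data, its second article, and the helper names `node_text`, `find_article_by_doi`, and `child_text` are all hypothetical and only there to keep the example self-contained.

```python
from xml.dom import minidom

# Hypothetical sample data; the second article is invented for illustration.
SAMPLE = """<root>
 <article>
  <number> 0 </number>
  <DOI> 10.1016/B978-0-12-381015-1.00004-6 </DOI>
  <title> The patagonian toothfish biology, ecology and fishery. </title>
  <abstract> lots of abstract text </abstract>
 </article>
 <article>
  <number> 1 </number>
  <DOI> 10.0000/hypothetical-doi </DOI>
  <title> A made-up second article </title>
  <abstract> more abstract text </abstract>
 </article>
</root>"""


def node_text(node):
    """Join the direct text children of a node and strip outer whitespace."""
    return "".join(
        child.data for child in node.childNodes
        if child.nodeType == child.TEXT_NODE
    ).strip()


def find_article_by_doi(doc, doi):
    """Return the parent element of the first matching <DOI>, or None."""
    for doi_node in doc.getElementsByTagName("DOI"):
        if node_text(doi_node) == doi:
            return doi_node.parentNode  # the enclosing <article>
    return None


def child_text(parent, tag_name):
    """Stripped text of the first descendant with the given tag name."""
    return node_text(parent.getElementsByTagName(tag_name)[0])


doc = minidom.parseString(SAMPLE)
article = find_article_by_doi(doc, "10.1016/B978-0-12-381015-1.00004-6")
if article is not None:
    print("title:", child_text(article, "title"))
    print("abstract:", child_text(article, "abstract"))
```

Going through `parentNode` is handy when the tag you search for can sit at different depths. For a flat layout like the one in the question, filtering the <article> elements directly works just as well.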