Accessing specific repeated XML elements with Python and lxml


I have an XML file that looks like this:

  <_gmd_citation>
    <_gmd_CI_Citation>
      <_gmd_title xmlns:gml="http://www.opengis.net/gml" xmlns:msxsl="urn:schemas-microsoft-com:xslt">
          <_gco_CharacterString>Conservation Areas</_gco_CharacterString>
        </_gmd_title>
      <_gmd_alternateTitle _gco_nilReason="missing" />
      <_gmd_date>
        <_gmd_CI_Date>
          <_gmd_date xmlns:gml="http://www.opengis.net/gml" xmlns:msxsl="urn:schemas-microsoft-com:xslt">
              <_gco_Date>2018-07-24</_gco_Date>
            </_gmd_date>
          <_gmd_dateType>
            <_gmd_CI_DateTypeCode codeListValue="publication" codeList="http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/resources/Codelist/gmxCodelists.xml#CI_DateTypeCode" />
          </_gmd_dateType>
        </_gmd_CI_Date>
      </_gmd_date>
      <_gmd_date>
        <_gmd_CI_Date>
          <_gmd_date xmlns:gml="http://www.opengis.net/gml" xmlns:msxsl="urn:schemas-microsoft-com:xslt">
              <_gco_Date>2013-11-15</_gco_Date>
            </_gmd_date>
          <_gmd_dateType>
            <_gmd_CI_DateTypeCode codeListValue="creation" codeList="http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/resources/Codelist/gmxCodelists.xml#CI_DateTypeCode" />
          </_gmd_dateType>
        </_gmd_CI_Date>
      </_gmd_date>
      <_gmd_date>
        <_gmd_CI_Date>
          <_gmd_date xmlns:gml="http://www.opengis.net/gml" xmlns:msxsl="urn:schemas-microsoft-com:xslt">
              <_gco_Date>2016-11-11</_gco_Date>
            </_gmd_date>
          <_gmd_dateType>
            <_gmd_CI_DateTypeCode codeListValue="revision" codeList="http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/resources/Codelist/gmxCodelists.xml#CI_DateTypeCode" />
          </_gmd_dateType>
        </_gmd_CI_Date>
      </_gmd_date>
      <_gmd_identifier>
        <_gmd_RS_Identifier>
          <_gmd_authority _gco_nilReason="missing" />
          <_gmd_code>
            <_gco_CharacterString>0000</_gco_CharacterString>
          </_gmd_code>
          <_gmd_codeSpace xmlns:gml="http://www.opengis.net/gml" xmlns:msxsl="urn:schemas-microsoft-com:xslt">
              <_gco_CharacterString>abc</_gco_CharacterString>
            </_gmd_codeSpace>
        </_gmd_RS_Identifier>
      </_gmd_identifier>
    </_gmd_CI_Citation>
  </_gmd_citation>

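I want to change the revision date (2016-11-11, the entry whose codeListValue is "revision"), but I can't work out how to target that particular tag. What's the best way to do this?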

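The "problem" you're facing is not that there are multiple `_gco_Date` elements. The real issue is that your task is not "find one element and do something to that same element." You haven't said so explicitly, and it isn't clear what structural variations the document might have, but I would phrase your goal like this:

Find and modify the element tagged `_gco_Date` whose parent element (presumably tagged `_gmd_date`) has a sibling, which in turn has a child tagged `_gmd_CI_DateTypeCode`. Do this if and only if that child has the attribute `codeListValue` equal to `revision`.

If this (or something similar) is what you need, you have to work with the document structure. Simply iterating over every element, regardless of its position, won't do it. lxml's ElementTree objects give you everything you need for this: you can get an element's parent, its list of children, its siblings, and so on.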
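Here is a minimal sketch of using the document structure. It assumes the question's element names and uses a trimmed copy of its XML, inlined as a string. The search starts from `_gmd_CI_DateTypeCode`, which is where the question's loop already succeeds. From there it climbs to the enclosing `_gmd_CI_Date` with `getparent()`, then descends to that entry's `_gco_Date` with `find()`. The helper name `set_revision_date` and the value `2020-01-01` are illustrative only.

```python
from lxml import etree

# Trimmed copy of the question's XML (same element names, no real namespaces).
XML = b"""<_gmd_CI_Citation>
  <_gmd_date>
    <_gmd_CI_Date>
      <_gmd_date><_gco_Date>2018-07-24</_gco_Date></_gmd_date>
      <_gmd_dateType><_gmd_CI_DateTypeCode codeListValue="publication"/></_gmd_dateType>
    </_gmd_CI_Date>
  </_gmd_date>
  <_gmd_date>
    <_gmd_CI_Date>
      <_gmd_date><_gco_Date>2013-11-15</_gco_Date></_gmd_date>
      <_gmd_dateType><_gmd_CI_DateTypeCode codeListValue="creation"/></_gmd_dateType>
    </_gmd_CI_Date>
  </_gmd_date>
  <_gmd_date>
    <_gmd_CI_Date>
      <_gmd_date><_gco_Date>2016-11-11</_gco_Date></_gmd_date>
      <_gmd_dateType><_gmd_CI_DateTypeCode codeListValue="revision"/></_gmd_dateType>
    </_gmd_CI_Date>
  </_gmd_date>
</_gmd_CI_Citation>"""


def set_revision_date(root, new_value):
    """Hypothetical helper: set the _gco_Date belonging to each 'revision'
    type code and return how many dates were changed."""
    changed = 0
    # Start where the question's loop already works: at the type code.
    for code in root.iter("_gmd_CI_DateTypeCode"):
        if code.get("codeListValue") != "revision":
            continue
        # Climb: _gmd_CI_DateTypeCode -> _gmd_dateType -> _gmd_CI_Date
        ci_date = code.getparent().getparent()
        # Descend into the sibling branch that holds the actual date.
        date = ci_date.find("_gmd_date/_gco_Date")
        if date is not None:
            date.text = new_value
            changed += 1
    return changed


root = etree.fromstring(XML)
print(set_revision_date(root, "2020-01-01"))
print([d.text for d in root.iter("_gco_Date")])
```

With the trimmed XML above, this prints `1` and then `['2018-07-24', '2013-11-15', '2020-01-01']`. Only the revision entry changes.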

Here is a raw example you can use as a starting point (not the best code in the world, just a prototype!):


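Instead of iterating over the elements and testing tag names and attribute values, try using XPath (lxml's `xpath()` method).

By using a predicate (`[]`), you can select exactly what you need without any iteration.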
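Here is a sketch of the predicate idea applied to the question's own element names, which carry no namespace. It runs on a trimmed, inlined copy of the question's XML, and the replacement value is made up. Because the names are plain, `xpath()` needs no `namespaces` mapping here.

```python
from lxml import etree

# Trimmed copy of the question's XML (original _gmd_/_gco_ element names).
XML = b"""<_gmd_CI_Citation>
  <_gmd_date>
    <_gmd_CI_Date>
      <_gmd_date><_gco_Date>2013-11-15</_gco_Date></_gmd_date>
      <_gmd_dateType><_gmd_CI_DateTypeCode codeListValue="creation"/></_gmd_dateType>
    </_gmd_CI_Date>
  </_gmd_date>
  <_gmd_date>
    <_gmd_CI_Date>
      <_gmd_date><_gco_Date>2016-11-11</_gco_Date></_gmd_date>
      <_gmd_dateType><_gmd_CI_DateTypeCode codeListValue="revision"/></_gmd_dateType>
    </_gmd_CI_Date>
  </_gmd_date>
</_gmd_CI_Citation>"""

# The predicate [...] keeps only the _gmd_CI_Date whose type code is
# "revision"; the rest of the path steps down to that entry's date.
REVISION_DATE = ("//_gmd_CI_Date"
                 "[_gmd_dateType/_gmd_CI_DateTypeCode/@codeListValue='revision']"
                 "/_gmd_date/_gco_Date")

root = etree.fromstring(XML)
matches = root.xpath(REVISION_DATE)  # no namespaces: the names are plain
for date in matches:
    date.text = "2020-01-01"

print(etree.tostring(root, encoding="unicode"))
```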

Example:

**Updated the namespaces based on the discussion in the comments.**

XML input (input.xml)


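(The question's XML rewritten with real gmd:/gco: namespace prefixes; its text values are Conservation Areas, 2018-07-24, 2013-11-15, 2016-11-11, 0000, abc.)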

Python

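from lxml import etree

tree = etree.parse("input.xml")

# Prefix -> namespace URI mapping used by the XPath expression
ns = {"gmd": "http://www.isotc211.org/2005/gmd", "gco": "http://www.isotc211.org/2005/gco"}

try:
    # Select the gco:Date of the CI_Date whose type code is "revision"
    elem = tree.xpath("//gmd:CI_Date[gmd:dateType/gmd:CI_DateTypeCode/"
                      "@codeListValue='revision']/gmd:date/gco:Date", namespaces=ns)[0]
    elem.text = "new value"
except IndexError:
    # No matching element: leave the document unchanged
    pass

print(etree.tostring(tree, pretty_print=True, encoding="unicode"))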

Output


(The same document with the revision date replaced; text values: Conservation Areas, 2018-07-24, 2013-11-15, new value, 0000, abc.)
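Taking `[0]` inside `try/except IndexError` changes only the first match. If more than one entry could carry `codeListValue="revision"`, a variant that updates every match and reports the count may be handier. The sketch below assumes a hypothetical namespaced document shaped like the question's XML, and the function name `set_dates` is illustrative. The type code is passed as an XPath variable (`$code`, supplied as a keyword argument to `xpath()`) rather than pasted into the expression string.

```python
from lxml import etree

# Hypothetical namespaced version of the question's XML: real gmd:/gco:
# prefixes instead of the _gmd_/_gco_ names, trimmed to two date entries.
XML = b"""<gmd:CI_Citation xmlns:gmd="http://www.isotc211.org/2005/gmd"
                  xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:date>
    <gmd:CI_Date>
      <gmd:date><gco:Date>2013-11-15</gco:Date></gmd:date>
      <gmd:dateType><gmd:CI_DateTypeCode codeListValue="creation"/></gmd:dateType>
    </gmd:CI_Date>
  </gmd:date>
  <gmd:date>
    <gmd:CI_Date>
      <gmd:date><gco:Date>2016-11-11</gco:Date></gmd:date>
      <gmd:dateType><gmd:CI_DateTypeCode codeListValue="revision"/></gmd:dateType>
    </gmd:CI_Date>
  </gmd:date>
</gmd:CI_Citation>"""

NS = {"gmd": "http://www.isotc211.org/2005/gmd",
      "gco": "http://www.isotc211.org/2005/gco"}

# $code is an XPath variable, filled from the keyword argument of xpath().
PATH = ("//gmd:CI_Date[gmd:dateType/gmd:CI_DateTypeCode/@codeListValue=$code]"
        "/gmd:date/gco:Date")


def set_dates(root, code, new_value):
    """Set every gco:Date whose CI_Date carries the given codeListValue.
    Returns the number changed; 0 means nothing matched (no IndexError)."""
    matches = root.xpath(PATH, namespaces=NS, code=code)
    for date in matches:
        date.text = new_value
    return len(matches)


root = etree.fromstring(XML)
print(set_dates(root, "revision", "new value"))
print(set_dates(root, "publication", "unused"))  # not present in this document
print(etree.tostring(root, pretty_print=True, encoding="unicode"))
```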

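Important: make sure the two namespace URIs (http://www.isotc211.org/2005/gmd and http://www.isotc211.org/2005/gco) exactly match the ones in your XML. The URIs in the comments were auto-formatted, so the "http://" part was not shown there.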
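The small check below shows why the exact URI matters; the document and the prefixes `g`/`c` are illustrative. lxml resolves each prefix through the mapping you pass and then matches elements on the resulting URI, not on the prefix text. As a result, a mapping whose URIs lack `http://` does not raise an error. It simply matches nothing.

```python
from lxml import etree

# Tiny hypothetical document using the two ISO namespaces.
XML = b"""<gmd:CI_Date xmlns:gmd="http://www.isotc211.org/2005/gmd"
             xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:date><gco:Date>2016-11-11</gco:Date></gmd:date>
</gmd:CI_Date>"""

root = etree.fromstring(XML)
query = "//g:date/c:Date/text()"

# Prefixes in the mapping are arbitrary; only the URIs have to match the file.
good = {"g": "http://www.isotc211.org/2005/gmd",
        "c": "http://www.isotc211.org/2005/gco"}
# The same URIs with "http://" missing, as they appeared in the comments.
bad = {"g": "www.isotc211.org/2005/gmd",
       "c": "www.isotc211.org/2005/gco"}

found = root.xpath(query, namespaces=good)
missed = root.xpath(query, namespaces=bad)
print(found, missed)
```

`found` holds the date text, while `missed` is an empty list. That is exactly the symptom you get when a URI is slightly off.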

Also, see the lxml documentation for more on using XPath with namespaces in lxml.
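Two things from that part of lxml combine well here. First, you can build the prefix mapping from `root.nsmap`, so the URIs always match the file. Second, you can compile the expression once with `etree.XPath`. This sketch assumes both prefixes are declared on the root element, because `nsmap` only reports declarations in scope at the element you ask. A default (unprefixed) namespace appears under the key `None`, which XPath cannot use, so the sketch filters it out.

```python
from lxml import etree

# Hypothetical namespaced document; both prefixes are declared on the root.
XML = b"""<gmd:CI_Citation xmlns:gmd="http://www.isotc211.org/2005/gmd"
                  xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:date>
    <gmd:CI_Date>
      <gmd:date><gco:Date>2013-11-15</gco:Date></gmd:date>
      <gmd:dateType><gmd:CI_DateTypeCode codeListValue="creation"/></gmd:dateType>
    </gmd:CI_Date>
  </gmd:date>
  <gmd:date>
    <gmd:CI_Date>
      <gmd:date><gco:Date>2016-11-11</gco:Date></gmd:date>
      <gmd:dateType><gmd:CI_DateTypeCode codeListValue="revision"/></gmd:dateType>
    </gmd:CI_Date>
  </gmd:date>
</gmd:CI_Citation>"""

root = etree.fromstring(XML)

# Reuse the document's own prefix -> URI declarations; drop the None key
# that a default namespace would produce, since XPath cannot use it.
ns = {prefix: uri for prefix, uri in root.nsmap.items() if prefix}

# Compile once; the resulting object can be called on any element or tree.
find_revision_dates = etree.XPath(
    "//gmd:CI_Date[gmd:dateType/gmd:CI_DateTypeCode/@codeListValue='revision']"
    "/gmd:date/gco:Date",
    namespaces=ns,
)

for date in find_revision_dates(root):
    date.text = "2020-01-01"

print(etree.tostring(root, encoding="unicode"))
```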

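# Loop from the question: it reaches the "revision" type code, but has no
# way back to the _gco_Date that belongs to it.
import lxml.etree

treed = lxml.etree.parse("test.xml")
for elem in treed.iter():
    print(elem.tag)
    if elem.tag == '_gmd_CI_DateTypeCode':
        if elem.attrib['codeListValue'] == 'revision':
            aa = elem.attrib['codeListValue']
            print(aa)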

# Prototype: find every _gco_Date whose parent element has a following
# sibling containing a _gmd_CI_DateTypeCode with codeListValue="revision".
import lxml.etree

d = lxml.etree.parse("test.xml", lxml.etree.ETCompatXMLParser()).getroot()

def dt_rev(e):
    """Return True if 'e' has a _gmd_CI_DateTypeCode child with codeListValue == 'revision'."""
    for c in e.iterchildren():
        if c.tag == "_gmd_CI_DateTypeCode" and c.get('codeListValue') == 'revision':
            return True
    return False

for e in d.iter("_gco_Date"):
    parent = e.getparent()  # the inner _gmd_date
    # itersiblings() yields the following siblings, e.g. _gmd_dateType
    for s in parent.itersiblings():
        if dt_rev(s):
            print("found it!", e.text)
            # add code here to modify the element "e" as needed
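The prototype stops at the "add code here" comment. One way to finish it, assuming you want the result written back out, is to set `e.text` and save the tree with `ElementTree.write()`. In this sketch, `io.BytesIO` objects stand in for the input and output files so the example runs on its own; in practice you would pass file paths instead. The helper name `update_revision` is hypothetical.

```python
import io

from lxml import etree

# Trimmed copy of the question's XML, two date entries.
XML = b"""<_gmd_CI_Citation>
  <_gmd_date>
    <_gmd_CI_Date>
      <_gmd_date><_gco_Date>2013-11-15</_gco_Date></_gmd_date>
      <_gmd_dateType><_gmd_CI_DateTypeCode codeListValue="creation"/></_gmd_dateType>
    </_gmd_CI_Date>
  </_gmd_date>
  <_gmd_date>
    <_gmd_CI_Date>
      <_gmd_date><_gco_Date>2016-11-11</_gco_Date></_gmd_date>
      <_gmd_dateType><_gmd_CI_DateTypeCode codeListValue="revision"/></_gmd_dateType>
    </_gmd_CI_Date>
  </_gmd_date>
</_gmd_CI_Citation>"""


def dt_rev(e):
    """Same test as the prototype: does 'e' have a revision type-code child?"""
    return any(c.tag == "_gmd_CI_DateTypeCode" and c.get("codeListValue") == "revision"
               for c in e.iterchildren())


def update_revision(source, target, new_value):
    """Hypothetical helper: parse 'source', set the revision _gco_Date to
    'new_value', and write the tree to 'target' (a path or a file object)."""
    tree = etree.parse(source)
    for e in tree.iter("_gco_Date"):
        if any(dt_rev(s) for s in e.getparent().itersiblings()):
            e.text = new_value  # the step the prototype leaves as a comment
    tree.write(target, xml_declaration=True, encoding="UTF-8")


out = io.BytesIO()
update_revision(io.BytesIO(XML), out, "2020-01-01")
print(out.getvalue().decode("utf-8"))
```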