Retrieving attributes of elements in Python using libxml2


I am writing my first Python script that uses libxml2 to retrieve data from an XML file. The file looks like this:

<myGroups1>
<myGrpContents name="ABC" help="abc_help">
     <myGrpKeyword name="abc1" help="help1"/>
     <myGrpKeyword name="abc2" help="help2"/>
     <myGrpKeyword name="abc3" help="help3"/>
</myGrpContents>
</myGroups1>

How can I iterate deeper into the elements and get their attributes? Any help with this would be greatly appreciated.

I have not used libxml2 myself, but after digging into this case I found the following.

Try this:

# `child` is a libxml2 node; prop() returns None if the attribute is absent.
if child.type == "element":
    if child.name == "myGrpKeyword":
        print(child.prop('name'))
        print(child.prop('help'))

Or refer to this.

Update:

Try a recursive function:

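import libxml2

def explore(child):
    # Walk the sibling chain that starts at `child`, recursing into each element.
    while child is not None:
        if not child.isBlankNode():
            if child.type == "element":
                # prop() returns None when the attribute is missing.
                print(child.prop('name'))
                print(child.prop('help'))
                explore(child.children)
        child = child.next

doc = libxml2.parseFile(cmmfilename)  # cmmfilename: path to the XML file
root2 = doc.children
child = root2.children
explore(child)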
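
For comparison, the same recursive descent can be sketched without libxml2. This is only an illustration that swaps in the standard library's `xml.etree.ElementTree`, so that it runs without the binding installed; it is not the libxml2 API. The helper name `collect_attributes` is hypothetical. It returns the depth, tag, and attribute dictionary of every element.

```python
import xml.etree.ElementTree as ET

# The sample document from the question.
XML = """<myGroups1>
<myGrpContents name="ABC" help="abc_help">
     <myGrpKeyword name="abc1" help="help1"/>
     <myGrpKeyword name="abc2" help="help2"/>
     <myGrpKeyword name="abc3" help="help3"/>
</myGrpContents>
</myGroups1>"""


def collect_attributes(element, depth=0):
    """Return (depth, tag, attributes) for `element` and all its descendants."""
    found = [(depth, element.tag, dict(element.attrib))]
    for child in element:  # iterating an Element yields only its child elements
        found.extend(collect_attributes(child, depth + 1))
    return found


root = ET.fromstring(XML)
for depth, tag, attrs in collect_attributes(root):
    print("  " * depth + tag, attrs)
```

Unlike the libxml2 tree, ElementTree does not expose whitespace as separate text nodes, so no blank-node check is needed in this version.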

This may be the answer you are looking for.

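When I run into a problem like this and for some reason do not feel like reading the documentation, exploring the library interactively can be very helpful. I suggest trying this in an interactive Python REPL (I like bpython). Here is my session, in which I arrived at a solution: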

>>> import libxml2
>>> xml = """<myGroups1>
... <myGrpContents name="ABC" help="abc_help">
...      <myGrpKeyword name="abc1" help="help1"/>
...      <myGrpKeyword name="abc2" help="help2"/>
...      <myGrpKeyword name="abc3" help="help3"/>
... </myGrpContents>
... </myGroups1>"""
>>> tree = libxml2.parseMemory(xml, len(xml)) # I found this method by looking through `dir(libxml2)`
>>> tree.children
<xmlNode (myGroups1) object at 0x10aba33b0>
>>> a = tree.children
>>> a
<xmlNode (myGroups1) object at 0x10a919ea8>
>>> a.children
<xmlNode (text) object at 0x10ab24368>
>>> a.properties
>>> b = a.children
>>> b.children
>>> b.properties
>>> b.next
<xmlNode (myGrpContents) object at 0x10a921290>
>>> b.next.content
'\n     \n     \n     \n'
>>> b.next.next.content
'\n'
>>> b.next.next.next.content
Traceback (most recent call last):
  File "<input>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'content'
>>> b.next.next.next
>>> b.next.properties
<xmlAttr (name) object at 0x10aba32d8>
>>> b.next.properties.children
<xmlNode (text) object at 0x10ab40f38>
>>> b.next.properties.children.content
'ABC'
>>> b.next.properties.children.name
'text'
>>> b.next.properties.next
<xmlAttr (help) object at 0x10ab40fc8>
>>> b.next.properties.next.name
'help'
>>> b.next.properties.next.content
'abc_help'
>>> list(tree)
[<xmlDoc (None) object at 0x10a921248>, <xmlNode (myGroups1) object at 0x10aba32d8>, <xmlNode (text) object at 0x10aba3878>, <xmlNode (myGrpContents) object at 0x10aba3d88>, <xmlNode (text) object at 0x10aba3950>, <xmlNode (myGrpKeyword) object at 0x10aba3758>, <xmlNode (text) object at 0x10aba3320>, <xmlNode (myGrpKeyword) object at 0x10aba3f38>, <xmlNode (text) object at 0x10aba3560>, <xmlNode (myGrpKeyword) object at 0x10aba3998>, <xmlNode (text) object at 0x10aba33f8>, <xmlNode (text) object at 0x10aba38c0>]
>>> good = list(tree)[5]
>>> good.properties
<xmlAttr (name) object at 0x10aba35f0>
>>> good.prop('name')
'abc1'
>>> good.prop('help')
'help1'
>>> good.prop('whoops')
>>> good.hasProp('whoops')
>>> good.hasProp('name')
<xmlAttr (name) object at 0x10ab40ef0>
>>> good.hasProp('name').content
'abc1'
>>> for thing in tree:
...     if thing.hasProp('name') and thing.hasProp('help'):
...         print thing.prop('name'), thing.prop('help')
...         
...     
... 
ABC abc_help
abc1 help1
abc2 help2
abc3 help3


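Because this is bpython, I cheated a little: it has a rewind key, so I made more typing mistakes than you see here, but otherwise this is very close to what I did.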
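
The `for thing in tree` loop in the session visits every node in document order and keeps those that have both attributes. A rough standard-library analogue, assuming `xml.etree.ElementTree` in place of libxml2, is `Element.iter()`, which yields an element followed by all of its descendants. The variable names below are illustrative only.

```python
import xml.etree.ElementTree as ET

# The sample document from the question.
XML = """<myGroups1>
<myGrpContents name="ABC" help="abc_help">
     <myGrpKeyword name="abc1" help="help1"/>
     <myGrpKeyword name="abc2" help="help2"/>
     <myGrpKeyword name="abc3" help="help3"/>
</myGrpContents>
</myGroups1>"""

root = ET.fromstring(XML)

# Keep only elements that carry both attributes, mirroring the hasProp() checks.
pairs = [
    (el.get("name"), el.get("help"))
    for el in root.iter()
    if "name" in el.attrib and "help" in el.attrib
]

for name, help_text in pairs:
    print(name, help_text)
```

This approach needs no recursion in user code, because the library performs the depth-first traversal itself.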

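It looks like you have to ask for the children of myGroups1 somehow. Do you know that you can put loops inside other loops? Let me know if you would like an example.

I am looking for something like a loop that keeps going deeper until there are no more elements. It would be very helpful if you could provide some examples. For the children of myGroups1, I can only extract the first-level child, i.e. myGrpContents in the example above. Also, I do not know which method to use to extract their attributes.

To get the job done, you may want to read up on XPath. But are you familiar with recursion? It is the usual way to do loops over several levels. If not, can you imagine a while loop that keeps asking a node for its children; if the node has no children, it moves on to the next sibling; and if there is no next sibling, it asks the parents until one of them has a sibling?

The problem is that when I try to extract attributes from the children of the root element (by iterating over child.properties in the code above), it throws the error "TypeError: iteration over non-sequence". Should I use some other method to extract attributes from the children of an XML node? Currently ".properties" works only for the root node. Meanwhile, could you share the example you mentioned in your first comment? The code is as follows:

doc = libxml2.parseFile(filename)
root2 = doc.children
child = root2.children
while child is not None:
    if not child.isBlankNode():
        if child.type == "element":
            print "\t Element", child.name, "with", child.lsCountNode(), "child(ren)"
            for property in child.properties:
                if property.type == "attribute":
                    print property.name, "=", property.content

This is what actually throws the error I mentioned in the comment above.

Thanks for your reply. However, I do not want to compare against element names such as "myGrpKeyword" to get the attributes; I need to go through all the children of the root element in the whole file. Also, when I try to iterate over child.properties, it shows the error "TypeError: iteration over non-sequence".

I have updated the answer with a recursive function; just check whether it fits your problem.

Yes, it does. Thank you very much. I printed all the attributes using the following code:

def printalAttributes(node):
    print "Node Name=", node.name
    if node.properties != None:
        for attribute in node.properties:
            if attribute.name != "text":
                print attribute.name, ":", attribute.content
    nodeList.append(node.name);
    print "\n"

Thanks a lot :) Meanwhile, after studying how XPath is used, I found another solution:

result = doc.xpathEval('/*')
for node in result:
    if node.type == "element":
        if node.prop("name") != None:
            print node.prop("name")
        if node.prop("help") != None:
            print node.prop("help")

The other one is better, because it prints all attributes without naming them explicitly in the code:

print "Node Name=", node.name
if node.properties != None:
    for attribute in node.properties:
        print "Attr:", attribute.name, ", value:", attribute.content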
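
The comments mention two alternatives to recursion: a plain loop that tracks where to go next, and an XPath query. Both can be sketched with the standard library's `xml.etree.ElementTree`, used here as a stand-in for libxml2. Since ElementTree elements have no parent or next-sibling pointers, the loop below uses an explicit stack instead of climbing back through parents, and the query uses ElementTree's limited XPath subset rather than libxml2's `xpathEval`. The function name `walk_without_recursion` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The sample document from the question.
XML = """<myGroups1>
<myGrpContents name="ABC" help="abc_help">
     <myGrpKeyword name="abc1" help="help1"/>
     <myGrpKeyword name="abc2" help="help2"/>
     <myGrpKeyword name="abc3" help="help3"/>
</myGrpContents>
</myGroups1>"""


def walk_without_recursion(root):
    """Yield every element in document order using an explicit stack."""
    stack = [root]
    while stack:
        element = stack.pop()
        yield element
        # Push children reversed so the first child is popped (visited) first.
        stack.extend(reversed(list(element)))


root = ET.fromstring(XML)
tags = [el.tag for el in walk_without_recursion(root)]

# Path query: every descendant of the root that has a "name" attribute.
named = [el.get("name") for el in root.findall(".//*[@name]")]

print(tags)
print(named)
```

The stack holds the elements still waiting to be visited, which is the bookkeeping that recursion would otherwise keep on the call stack.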
