Python: What is an ElementTree object, and how do I get data out of it?

I'm trying to teach myself how to parse XML. I've read the lxml tutorials, but they're hard to follow. So far, I can do this:

>>> from lxml import etree
>>> xml=etree.parse('ham.xml')
>>> xml
<lxml.etree._ElementTree object at 0x118de60>

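But how do I get data out of this object? It can't be indexed like xml[0], and it can't be iterated over.

More specifically, I'm working with ham.xml and trying to extract, say, everything between the <l> tags that are enclosed by <sp> tags carrying the Barnardo attribute.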

It is an ElementTree object.

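You can also look at the lxml API documentation, which has a page for this class. That page tells you about every attribute and method of the class you might want to know.

I'd still start by reading the lxml tutorial, though.

However, if an element cannot be indexed, it is an empty tag and has no child nodes to retrieve.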
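To make the indexing point concrete, here is a minimal sketch. The object returned by etree.parse() wraps the whole document; its getroot() method returns the root element, and elements are what support len(), indexing, and iteration. The SAMPLE string is a made-up stand-in for ham.xml, not the real file, and the import falls back to the standard library's xml.etree.ElementTree when lxml is not installed, since the calls used here exist in both.

```python
from io import BytesIO

try:
    from lxml import etree
except ImportError:  # the calls below also exist in the standard library
    import xml.etree.ElementTree as etree

# Hypothetical two-speech stand-in for ham.xml (not the real file).
SAMPLE = b"""<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <sp who="Barnardo"><l>Who's there?</l></sp>
  <sp who="Francisco"><l>Nay, answer me.</l></sp>
</TEI>"""

tree = etree.parse(BytesIO(SAMPLE))  # ElementTree: wraps the whole document
root = tree.getroot()                # Element: the <TEI> root node

print(root.tag)             # namespace-qualified tag, in {url}name form
print(len(root))            # number of direct children
print(root[0].get('who'))   # index into the children, then read an attribute

# Iterating over an element yields its direct children.
for child in root:
    print(child.get('who'), [line.text for line in child])
```

Note that the tag names come back with the namespace URL attached, which is why a plain search for sp or l finds nothing in this document.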

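To find all lines spoken by Barnardo, you need an XPath expression with a namespace map. It doesn't matter which prefix you use, as long as it is a non-empty string; lxml maps it to the correct namespace URL: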

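from lxml import etree

tree = etree.parse('ham.xml')
# Any non-empty prefix works; it only has to map to the document's namespace URL.
nsmap = {'s': 'http://www.tei-c.org/ns/1.0'}

for line in tree.xpath('.//s:sp[@who="Barnardo"]/s:l/text()', namespaces=nsmap):
    print(line.strip())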

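This extracts all the text in <l> elements contained in <sp> tags whose who attribute is Barnardo. Note the s: prefix on the tag names; the nsmap dictionary tells lxml which namespace that prefix stands for. I strip each line before printing, so the output has no surrounding whitespace.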
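As a small sketch of why the prefix itself does not matter: the same query is run twice with two different prefixes and returns the same lines. This uses findall() rather than xpath() so that it also runs with the standard library's xml.etree.ElementTree when lxml is not installed; findall() returns elements, so the text is read from .text instead of text(). The SAMPLE document and the barnardo_lines helper are made up for illustration.

```python
try:
    from lxml import etree
except ImportError:  # findall() with namespaces= exists in both libraries
    import xml.etree.ElementTree as etree

TEI_NS = 'http://www.tei-c.org/ns/1.0'

# Hypothetical stand-in for ham.xml (not the real file).
SAMPLE = b"""<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <sp who="Barnardo"><l>Who's there?</l><l>Long live the king!</l></sp>
  <sp who="Francisco"><l>Nay, answer me.</l></sp>
</TEI>"""

root = etree.fromstring(SAMPLE)


def barnardo_lines(prefix):
    """Query with an arbitrary, caller-chosen prefix bound to the TEI namespace."""
    path = './/{p}:sp[@who="Barnardo"]/{p}:l'.format(p=prefix)
    found = root.findall(path, namespaces={prefix: TEI_NS})
    return [line.text.strip() for line in found]


with_s = barnardo_lines('s')
with_tei = barnardo_lines('tei')
print(with_s)
print(with_s == with_tei)  # same result whichever prefix is chosen

# Without a prefix map, the namespace can be spelled out as {url}tag.
all_lines = [line.text for line in root.iter('{%s}l' % TEI_NS)]
print(len(all_lines))
```

The prefix only exists inside the query; what has to match the document is the namespace URL it maps to.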

For your sample document, this gives:

>>> for line in tree.xpath('.//s:sp[@who="Barnardo"]/s:l/text()', namespaces=nsmap):
...     print(line.strip())
... 
Who's there?
Long live the king!
He.
'Tis now struck twelve; get thee to bed, Francisco.
Have you had quiet guard?
Well, good night.
If you do meet Horatio and Marcellus,
The rivals of my watch, bid them make haste.
Say,
What, is Horatio there?
Welcome, Horatio: welcome, good Marcellus.
I have seen nothing.
Sit down awhile;
And let us once again assail your ears,
That are so fortified against our story
What we have two nights seen.
Last night of all,
When yond same star that's westward from the pole
Had made his course to illume that part of heaven
Where now it burns, Marcellus and myself,
The bell then beating one,

In the same figure, like the king that's dead.
Looks 'a not like the king? mark it, Horatio.
It would be spoke to.
See, it stalks away!
How now, Horatio! you tremble and look pale:
Is not this something more than fantasy?
What think you on't?
I think it be no other but e'en so:
Well may it sort that this portentous figure
Comes armed through our watch; so like the king
That was and is the question of these wars.
'Tis here!
It was about to speak, when the cock crew.

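One way to parse XML is with XPath. You can call the xpath() member function on the ElementTree, which in your example is xml.

For example, you can use it to print the XML of every <l> element (the lines of the play).

The lxml documentation describes the xpath functionality in detail.

As pointed out below, this does not work unless a namespace is specified. Unfortunately, lxml's XPath does not accept an empty namespace prefix, but you can change the root node so that the namespace is bound to a prefix named prefix, which is the name this answer uses:

<TEI xmlns:prefix="http://www.tei-c.org/ns/1.0" xml:id="sha-ham">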
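For anyone unsure where to find the namespace URL, here is a hedged sketch that reads it from the root element's tag and then queries with a made-up prefix, leaving the file itself unchanged. The namespace_of helper, the x prefix, and the SAMPLE document are all illustrative assumptions, and findall() is used instead of xpath() so the block also runs with the standard library when lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:  # the calls below also exist in the standard library
    import xml.etree.ElementTree as etree

# Hypothetical stand-in for ham.xml, with the default namespace left as-is.
SAMPLE = b"""<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="sha-ham">
  <sp who="Barnardo"><l>Who's there?</l></sp>
</TEI>"""

root = etree.fromstring(SAMPLE)


def namespace_of(element):
    """Return the namespace URL from a '{url}name' tag, or None if there is none."""
    tag = element.tag
    return tag[1:].split('}', 1)[0] if tag.startswith('{') else None


# With lxml, root.nsmap also lists the declared namespaces (default under None).
ns = namespace_of(root)
print(ns)

# Bind the discovered URL to any non-empty prefix and serialize each match.
matches = root.findall('.//x:l', namespaces={'x': ns})
for line in matches:
    print(etree.tostring(line, encoding='unicode').strip())
```

The serialized output differs slightly between the two libraries in how the namespace declaration is written, so it is printed here rather than compared.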

Try: etree.tostring(xml)

Great, that works, but how do I get data from a specific tag?

@Jono: It would be easier to help you if you showed the contents of ham.xml, or at least a sample if it is very large.

@MartijnPieters: OK. I've edited the question to include ham.xml.

Except it tells me subtree is empty. I think this is a namespace problem, but I don't know where to find my namespace or how to tell lxml what it is.

@MartijnPieters, you're right, my mistake. Unfortunately, the XML file uses an empty namespace, which lxml does not support.

@kgraney: Nonsense. What lxml does not support is querying with an empty prefix. When searching the tree, you can pick any prefix you like. See my answer, which has working code.

Awesome, thanks. So I guess I just need to define an arbitrary namespace ID and then refer to it.