Python: How to get namespaces from XML?

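We want to extract some data from a large number of XML files that use namespaces. The problem is that the namespace URIs can differ from file to file, and every file declares several namespaces. Sample.xml looks like this: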

<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="http://some.domain/path1/styl.xsl"?>
<a:root xmlns:a="http://some.domain/path1/" xmlns:b="http://some.domain/path2/" xmlns:c="http://some.domain/path3/" xmlns:d="http://some.domain/path4/" xmlns:e="http://some.domain/path5/">
    <a:task>Get data from element your_data_sir in xml files</a:task>
    <a:files_to_process>More than 2k</a:files_to_process>
    <a:how>Using Python</a:how>
    <a:obstacle>
        <b:name>Namespaces</b:name>
        <c:description>Each xml file contains the same xmlns:prefixes but the URI of each prefix may differ!</c:description>
    </a:obstacle>
    <a:look_here>
        <d:your_data_sir>Glass of Whisky</d:your_data_sir>
        <d:your_data_sir>Cigar</d:your_data_sir>
        <d:your_data_sir>Python problem to solve</d:your_data_sir>
    </a:look_here>
    <e:other_things_to_know>
        <c:thing>Element look_here is always a child of root element.</c:thing>
        <c:thing>look_here and your_data_sir preserve their prefixes in all xml files but URI can be different.</c:thing>
        <c:thing>Some xml files have different elements before and after look_here element.</c:thing>
        <c:thing>Number of siblings of look_here, before and after, may differ.</c:thing>
    </e:other_things_to_know>
</a:root>

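I am building a script that will extract this data from every XML file in a folder and its subfolders. The problem is that in some XML files the URI bound to xmlns:a (or to any other prefix) may be different, and in that case a script with fixed namespace URIs will not find the elements. I don't know, and could not find, a way to read all the namespace declarations from the XML file being processed and build the namespace dictionary from them. Or maybe there is a different way to solve my problem.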

Please help. I am new to Python, so please explain your solution.

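Thanks to @mzjn, I managed to solve my problem. It looks simple now. In lxml, every element has an nsmap attribute, a dictionary that maps the namespace prefixes in scope on that element to their URIs. You have to install lxml first, because it does not ship with Python by default. My script now looks like this: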

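from lxml import etree

# Parse the file and take the root element.
dom = etree.parse('/path/to/Sample.xml').getroot()
# nsmap maps every prefix in scope on the root to its namespace URI.
ns = dom.nsmap
# Resolve the 'a' prefix through that mapping to locate <a:look_here>.
test = dom.find('a:look_here', ns)

# Print the text of every child of look_here.
for x in test:
    print(x.text)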

In my case, dom.nsmap returns

{'a': 'http://some.domain/path1/', 'b': 'http://some.domain/path2/', 'c': 'http://some.domain/path3/', 'd': 'http://some.domain/path4/', 'e': 'http://some.domain/path5/'}
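To illustrate why reading the mapping from each document helps, here is a minimal sketch. The two in-memory documents, the other.domain URIs, and the helper name read_items are made up for illustration and are not part of the original post. Both documents use the same prefixes but bind them to different URIs, and the same prefixed path works on each because the prefixes are resolved through that document's own nsmap.

```python
from lxml import etree

# Two hypothetical documents: same prefixes, different namespace URIs.
DOC_ONE = b"""<a:root xmlns:a="http://some.domain/path1/" xmlns:d="http://some.domain/path4/">
  <a:look_here><d:your_data_sir>Cigar</d:your_data_sir></a:look_here>
</a:root>"""
DOC_TWO = b"""<a:root xmlns:a="http://other.domain/x/" xmlns:d="http://other.domain/y/">
  <a:look_here><d:your_data_sir>Glass of Whisky</d:your_data_sir></a:look_here>
</a:root>"""


def read_items(xml_bytes):
    """Return the text of every d:your_data_sir under a:look_here."""
    root = etree.fromstring(xml_bytes)
    ns = root.nsmap  # prefix -> URI, taken from this document only
    return [el.text for el in root.findall('a:look_here/d:your_data_sir', ns)]


print(read_items(DOC_ONE))
print(read_items(DOC_TWO))
```

Note that nsmap only contains the declarations in scope on the element it is read from, so this sketch assumes, as in Sample.xml, that all prefixes are declared on the root element.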

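This is exactly what I had been fighting for over the past two days. Now I can feed it thousands of files and get the data out of them without the risk of missing any.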
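Processing a whole folder tree could look roughly like the sketch below. The function name collect_data and the decision to skip unparsable files silently are assumptions made for illustration; the original post does not show this part of the script.

```python
from pathlib import Path

from lxml import etree


def collect_data(folder):
    """Map each XML file under folder to the texts of the children of a:look_here."""
    results = {}
    for path in sorted(Path(folder).rglob('*.xml')):
        try:
            root = etree.parse(str(path)).getroot()
        except etree.XMLSyntaxError:
            continue  # assumption: files that are not well-formed are skipped
        ns = root.nsmap  # rebuilt for every file, so differing URIs do not matter
        if 'a' not in ns:
            continue  # the expected prefix is not declared on this root
        section = root.find('a:look_here', ns)
        if section is not None:
            results[str(path)] = [child.text for child in section]
    return results
```

Calling collect_data('/path/to/folder') would then return a dictionary keyed by file path. Checking for the 'a' prefix first avoids the error lxml raises when a path uses a prefix that is missing from the mapping.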

Can you use lxml instead of ElementTree? lxml provides a handy nsmap attribute on elements.

Thanks to @mzjn for the comment; nsmap solved my problem.
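If installing lxml is not an option, a similar prefix dictionary can be collected with the standard library, since xml.etree.ElementTree.iterparse reports namespace declarations as 'start-ns' events. This is only a sketch of that alternative, not something proposed in the thread; the helper name parse_with_namespaces and the shortened sample are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def parse_with_namespaces(source):
    """Parse XML with the standard library and return (root, prefix -> URI dict)."""
    ns = {}
    root = None
    for event, payload in ET.iterparse(source, events=('start', 'start-ns')):
        if event == 'start-ns':
            prefix, uri = payload
            ns[prefix] = uri  # a later declaration of the same prefix overwrites this
        elif root is None:
            root = payload  # the first 'start' event belongs to the root element
    return root, ns


# Shortened, hypothetical version of Sample.xml.
SAMPLE = b"""<a:root xmlns:a="http://some.domain/path1/" xmlns:d="http://some.domain/path4/">
  <a:look_here><d:your_data_sir>Cigar</d:your_data_sir></a:look_here>
</a:root>"""

root, ns = parse_with_namespaces(io.BytesIO(SAMPLE))
print(ns)
print([el.text for el in root.find('a:look_here', ns)])
```

Unlike nsmap, this collects declarations from the whole document rather than from one element, so a prefix that is redeclared deeper in the tree ends up mapped to its last URI.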