Python: reading tricky XML with XPath


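I'm a beginner with Python and XPath, and I need to use XPath to read XML whose nodes are not uniform (like the ones below). The output has to be written to a file in the format shown below. The code uses the lxml library.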

Please help me build the correct XPath.

Source XML

<Classes>
    <German>
        <Student>
            <Span><a href="">John</a></Span>
        </Student>
        <Student>
            <Span>Adam</Span>
        </Student>
    </German>
    <English>
        <Student>
            <Span>Mary</Span>
        </Student>
    </English>
    <French>
        <Student>
            <Span><a href="">Anil</a></Span>
        </Student>
        <Student>
            <Span><a href="">Jack</a></Span>
        </Student>
    </French>
    <Spanish>
        <Student>
            <Span>Mary</Span>
        </Student>
        <Student>
            <Span>Jack</Span>
        </Student>
    </Spanish>
</Classes>
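The tricky part is that some <Span> elements hold the name directly while others wrap it in an <a>. A minimal sketch, assuming the XML is parsed with lxml.etree (so tag names stay case-sensitive) and using a trimmed copy of the source XML, compares three ways of reading a <Span>: Span/text() only returns direct text children, Span//text() returns descendant text as well, and string() gives exactly one value per <Span>.

```python
from lxml import etree

# A trimmed copy of the source XML: one <Span> wraps the name in <a>, one does not.
xml_content = b"""<Classes>
    <German>
        <Student>
            <Span><a href="">John</a></Span>
        </Student>
        <Student>
            <Span>Adam</Span>
        </Student>
    </German>
</Classes>"""

root = etree.fromstring(xml_content)

# text() only matches text nodes that are direct children of <Span>,
# so "John" (nested inside <a>) is missed.
direct_only = root.xpath('//Student/Span/text()')

# //text() matches every descendant text node, covering both layouts.
all_names = root.xpath('//Student/Span//text()')

# string() concatenates all descendant text of one node, giving one value per <Span>.
per_span = [span.xpath('string()') for span in root.xpath('//Student/Span')]

print(direct_only)  # ['Adam']
print(all_names)    # ['John', 'Adam']
print(per_span)     # ['John', 'Adam']
```

The string() form is the safer choice if a name could ever be split across several text nodes, because it still yields a single string per student.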
Thanks,
Nikhil

This code should help:

from lxml import html

xml_content = """<Classes>
    <German>
        <Student>
            <Span><a href="">John</a></Span>
        </Student>
        <Student>
            <Span>Adam</Span>
        </Student>
    </German>
    <English>
        <Student>
            <Span>Mary</Span>
        </Student>
    </English>
    <French>
        <Student>
            <Span><a href="">Anil</a></Span>
        </Student>
        <Student>
            <Span><a href="">Jack</a></Span>
        </Student>
    </French>
    <Spanish>
        <Student>
            <Span>Mary</Span>
        </Student>
        <Student>
            <Span>Jack</Span>
        </Student>
    </Spanish>
</Classes>"""

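# lxml.html uses an HTML parser, which lowercases every tag name;
# that is why the XPath expressions below say "classes", "student" and "span".
tree = html.fromstring(xml_content)
classes = tree.xpath('//classes/*')  # every language element under <Classes>
for language_class in classes:
    print(language_class.tag.capitalize())
    # //text() picks up the name whether or not it is wrapped in <a>
    for student in language_class.xpath('.//student/span//text()'):
        print("    {}".format(student))

Output: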
German
    John
    Adam
English
    Mary
French
    Anil
    Jack
Spanish
    Mary
    Jack
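Because the input is XML rather than HTML, it can also be parsed with lxml.etree, where tag names keep their original case and the XPath has to match it exactly. The sketch below is a variant under that assumption, not the answerer's code: it builds the same report and also writes it to a file, as the question asks. The file name "output.txt" and the helper format_classes are hypothetical, and the XML is trimmed for brevity.

```python
from lxml import etree

# Trimmed source XML, assumed to have the same shape as the question's.
xml_content = b"""<Classes>
    <German>
        <Student><Span><a href="">John</a></Span></Student>
        <Student><Span>Adam</Span></Student>
    </German>
    <English>
        <Student><Span>Mary</Span></Student>
    </English>
</Classes>"""


def format_classes(root):
    """Return the report lines: each class name, then its students indented."""
    lines = []
    # With the XML parser, tag names keep their case, so "Classes" must match exactly.
    for language_class in root.xpath('/Classes/*'):
        lines.append(language_class.tag)
        # string() yields one name per <Span>, with or without an <a> wrapper.
        for span in language_class.xpath('./Student/Span'):
            lines.append("    " + span.xpath('string()').strip())
    return lines


root = etree.fromstring(xml_content)
lines = format_classes(root)

# "output.txt" is a hypothetical file name; the question does not give one.
with open("output.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(lines) + "\n")

print("\n".join(lines))
```

Run against the full XML from the question, this should print the same German/English/French/Spanish listing as the answer above, without relying on the HTML parser's lowercasing.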

What have you tried so far, and what exactly is the problem? How does the output you are getting now differ from what you expect?
@Nikhil, my pleasure. I'm glad my answer helped.