Printing nested elements of an XML file using Python etree


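I'm trying to build a script that reads an XML file. This is my first time parsing XML, and I'm using Python with xml.etree.ElementTree. The part of the file I want to process looks like this: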

    <component>
        <section>
            <id root="42CB916B-BB58-44A0-B8D2-89B4B27F04DF" />
            <code code="34089-3" codeSystem="2.16.840.1.113883.6.1" codeSystemName="LOINC" displayName="DESCRIPTION SECTION" />
            <title mediaType="text/x-hl7-title+xml">DESCRIPTION</title>
            <text>
                <paragraph>Renese<sup>®</sup> is designated generically as polythiazide, and chemically as 2<content styleCode="italics">H</content>-1,2,4-Benzothiadiazine-7-sulfonamide, 6-chloro-3,4-dihydro-2-methyl-3-[[(2,2,2-trifluoroethyl)thio]methyl]-, 1,1-dioxide. It is a white crystalline substance, insoluble in water but readily soluble in alkaline solution.</paragraph>
                <paragraph>Inert Ingredients: dibasic calcium phosphate; lactose; magnesium stearate; polyethylene glycol; sodium lauryl sulfate; starch; vanillin. The 2 mg tablets also contain: Yellow 6; Yellow 10.</paragraph>
            </text>
            <effectiveTime value="20051214" />
        </section>
    </component>
    <component>
        <section>
            <id root="CF5D392D-F637-417C-810A-7F0B3773264F" />
            <code code="42229-5" codeSystem="2.16.840.1.113883.6.1" codeSystemName="LOINC" displayName="SPL UNCLASSIFIED SECTION" />
            <title mediaType="text/x-hl7-title+xml">ACTION</title>
            <text>
                <paragraph>The mechanism of action results in an interference with the renal tubular mechanism of electrolyte reabsorption. At maximal therapeutic dosage all thiazides are approximately equal in their diuretic potency. The mechanism whereby thiazides function in the control of hypertension is unknown.</paragraph>
            </text>
            <effectiveTime value="20051214" />
        </section>
    </component>

So far I can print the titles, but I also want to print the corresponding text captured in the paragraph tags.

I tried this:

    for title in tree.iter('title'):
        print(title.text)
        for paragraph in title.iter('paragraph'):
            print(paragraph.text)

But I get no output for the paragraph text.

Doing this instead:

    for title in tree.iter('title'):
        print(title.text)
        for paragraph in tree.iter('paragraph'):
            print(paragraph.text)

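This prints the paragraph text, but (obviously) every paragraph in the XML structure gets printed under each title.

I'd like to find a way to 1) identify the title and 2) print the corresponding paragraphs.

How can I do that?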

If you're willing to use lxml, here is a solution using XPath:

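    import re
    from lxml.etree import fromstring

    with open("ABD6ECF0-DC8E-41DE-89F2-1E36ED9D6535.xml") as f:
        xmlstring = f.read()

    # strip the default namespace declaration so plain tag names work in XPath
    xmlstring = re.sub(r'\sxmlns="[^"]+"', '', xmlstring, count=1)

    # encode to bytes: lxml rejects str input that carries an XML encoding declaration
    doc = fromstring(xmlstring.encode())

    for title in doc.xpath('//title'):  # for all title nodes
        title_text = title.xpath('./text()')  # get the text value of the node
        # get all text values of the paragraph nodes that appear lower (//paragraph)
        # in the hierarchy than the parent (..) of the title
        paragraphs_for_title = title.xpath('..//paragraph/text()')
        print(title_text[0] if title_text else '')
        for paragraph in paragraphs_for_title:
            print(paragraph)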
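
For comparison, here is a minimal sketch that stays with xml.etree.ElementTree, as in the question. The first attempt printed nothing because `title.iter('paragraph')` only searches inside the title element, while the paragraphs live under the sibling text element; walking each section and looking up its title and paragraphs from there keeps them paired. Note the assumptions: the `<document>` wrapper and the shortened paragraph text are made up for the example, the helper name `titles_with_paragraphs` is hypothetical, and the sample has no default namespace (the real file apparently declares one, so the same `xmlns` stripping step shown above, or namespace-qualified tag names, would be needed first).

```python
import xml.etree.ElementTree as ET

# Sample trimmed from the question. The <document> wrapper is an assumption,
# added only so the fragment has a single root element.
SAMPLE = """<document>
  <component>
    <section>
      <title mediaType="text/x-hl7-title+xml">DESCRIPTION</title>
      <text>
        <paragraph>Renese<sup>®</sup> is designated generically as polythiazide.</paragraph>
        <paragraph>Inert Ingredients: dibasic calcium phosphate; lactose.</paragraph>
      </text>
    </section>
  </component>
  <component>
    <section>
      <title mediaType="text/x-hl7-title+xml">ACTION</title>
      <text>
        <paragraph>The mechanism whereby thiazides function in the control of hypertension is unknown.</paragraph>
      </text>
    </section>
  </component>
</document>"""


def titles_with_paragraphs(root):
    """Return a list of (title_text, [paragraph_text, ...]), one per <section>."""
    results = []
    for section in root.iter("section"):
        title = section.find("title")  # direct child of this section
        if title is None:
            continue  # skip sections without a title
        # itertext() also yields text inside nested tags such as <sup>,
        # which paragraph.text alone would cut off at the first child element.
        paragraphs = [
            "".join(p.itertext()).strip() for p in section.iter("paragraph")
        ]
        results.append((title.text, paragraphs))
    return results


root = ET.fromstring(SAMPLE)
for title, paragraphs in titles_with_paragraphs(root):
    print(title)
    for paragraph in paragraphs:
        print(paragraph)
```

One caveat with this sketch: `section.iter("paragraph")` descends into nested sections as well, so if sections are nested in the real file, paragraphs of an inner section would also be listed under the outer title.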
