Parsing XML with cElementTree in Python


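I've been rewriting some old XML parsing code in Python, and I stumbled upon the joy of cElementTree. I love it because I can get so much done in so few lines.

My experience with xpath isn't that extensive, and this question is more about drilling further down into the structure.

I have this in test.xml: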

<?xml version="1.0"?>
   <ownershipDocument>
     <issue>
         <ic>0000030305</ic>
         <iname>DUCOMM</iname>
         <its>DCP</its>
     </issue>
     <ndt>
         <ndtran>
             <tc>
                 <tft>4</tft>
                 <tc>P</tc>
                 <esi>0</esi>
             </tc>
         </ndtran>
         <ndtran>
             <tc>
                 <tft>4</tft>
                 <tc>P</tc>
                 <esi>0</esi>
             </tc>
          </ndtran>
     </ndt>
 </ownershipDocument>

My parsing code prints:

ownershipDocument
{}
('issue', {})
('ndt', {})
('0000030305', 'DUCOMM')

That successfully gets me the information I need from the "issue" node.
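The script that produced this output is not shown. As a rough sketch only, something like the following would print the same lines. It assumes Python 3 and `xml.etree.ElementTree` (the separate `cElementTree` module was removed in Python 3.9), and it embeds a trimmed copy of the document as a string instead of reading test.xml; the names `SAMPLE` and `issue_rows` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of test.xml, embedded so the sketch runs on its own.
SAMPLE = """<ownershipDocument>
  <issue>
    <ic>0000030305</ic>
    <iname>DUCOMM</iname>
    <its>DCP</its>
  </issue>
  <ndt/>
</ownershipDocument>"""

# With the real file this would be: root = ET.parse("test.xml").getroot()
root = ET.fromstring(SAMPLE)

print(root.tag)
print(root.attrib)

# Iterating an element yields its direct children only.
for child in root:
    print((child.tag, child.attrib))

# Pull the two values of interest out of every <issue> element.
issue_rows = [(issue.find("ic").text, issue.find("iname").text)
              for issue in root.findall("issue")]
for row in issue_rows:
    print(row)
```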

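The problem is that I need to access multiple "ndtran" nodes (inside the "ndt" node). When parsing, I can pull out the "tft", "tc", and "esi" values as groups, but what I need is to iterate over each "tc" node, extract its "tft", "tc", and "esi" values, insert them into a database, and then move on to the next "tc" node and do it again.

I tried iterating over each one with the following: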

for tc in root.findall("./ndt/ndtran/tc"):
    tft = tc.find('tft').text
    tc = tc.find('tc').text
    esi = tc.find('esi').text
    print(tft,tc,esi)

This almost gets me there (I think), but it gives me an error:

esi = tc.find('esi').text
AttributeError: 'int' object has no attribute 'text'

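I hope this makes sense. I believe what I'm after is a DOM-style parsing approach, which is fine because these documents aren't that large.

Thanks a lot for any advice or pointers in the right direction.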

On the previous line you replaced the value of the tc variable with a string:

for tc in root.findall("./ndt/ndtran/tc"):
    tft = tc.find('tft').text
    tc = tc.find('tc').text
   #^^ use different variable name here
    esi = tc.find('esi').text
         #^^ at this point, `tc` is no longer referencing the outer <tc> elements


Funny coincidence: a string also has a find() method, which returns an int (-1) when the substring isn't found, hence the 'int' object has no attribute 'text' error.
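To make the mechanism concrete, here is a small sketch that reproduces the error outside the loop. It assumes Python 3 with `xml.etree.ElementTree`; the one-line XML fragment and the names `result` and `message` are only for illustration.

```python
import xml.etree.ElementTree as ET

# One outer <tc> element, shaped like those in test.xml.
tc = ET.fromstring("<tc><tft>4</tft><tc>P</tc><esi>0</esi></tc>")

# Rebinding the name: tc is now the str "P", no longer an Element.
tc = tc.find("tc").text

# This call is str.find, a substring search. "esi" does not occur
# in "P", so it returns -1.
result = tc.find("esi")
print(type(tc).__name__, result)

# Accessing .text on that int raises the error from the question.
try:
    result.text
except AttributeError as exc:
    message = str(exc)
    print(message)
```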

Try renaming the variable that holds the child tc value, giving it any name that differs from the tc loop variable iterating over root.findall("./ndt/ndtran/tc"):

for tc in root.findall("./ndt/ndtran/tc"):
    tft = tc.find('tft').text
    tc_code = tc.find('tc').text  # renamed, so the loop variable `tc` stays an Element
    esi = tc.find('esi').text     # `tc` still references the outer <tc> element here
    print(tft, tc_code, esi)
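Putting the rename together into something self-contained, a sketch could look like the following. It assumes Python 3 with `xml.etree.ElementTree`, embeds the "ndt" part of test.xml as a string, and uses the hypothetical names `SAMPLE`, `extract_rows`, and `tc_node`. It also uses `findtext`, which returns the child's text directly (or None if the child is missing).

```python
import xml.etree.ElementTree as ET

# The <ndt> part of test.xml, embedded so the sketch runs on its own.
SAMPLE = """<ownershipDocument>
  <ndt>
    <ndtran>
      <tc>
        <tft>4</tft>
        <tc>P</tc>
        <esi>0</esi>
      </tc>
    </ndtran>
    <ndtran>
      <tc>
        <tft>4</tft>
        <tc>P</tc>
        <esi>0</esi>
      </tc>
    </ndtran>
  </ndt>
</ownershipDocument>"""


def extract_rows(root):
    """Return one (tft, tc, esi) tuple per outer <tc> element."""
    rows = []
    # The path only matches <tc> elements that are direct children of
    # <ndtran>, so the nested inner <tc> is not visited as a loop item.
    for tc_node in root.findall("./ndt/ndtran/tc"):
        rows.append((
            tc_node.findtext("tft"),
            tc_node.findtext("tc"),
            tc_node.findtext("esi"),
        ))
    return rows


rows = extract_rows(ET.fromstring(SAMPLE))
for row in rows:
    print(row)
```

Each tuple in rows corresponds to one "tc" node, so the list could be handed to a database driver one row at a time or in a batch.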