Python XML: parsing attributes

I have a newspaper in XML format and I'm trying to parse specific parts of it.

My XML looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<articles>
   <text>
      <text.cr>
         <pg pgref="1" clipref="1" pos="0,0,2275,3149"/>
         <p type="none">
            <wd pos="0,0,0,0"/>
         </p>
      </text.cr>
      <text.cr>
         <pg pgref="1" clipref="2" pos="0,0,2275,3149"/>
         <p type="none">
            <wd pos="0,0,0,0"/>
         </p>
      </text.cr>
      <text.cr>
         <pg pgref="1" clipref="3" pos="4,32,1078,454"/>
         <p type="none">
            <wd pos="4,32,1078,324">The</wd>
            <wd pos="12,234,1078,450">Newspaper</wd>
         </p>
      </text.cr>

I managed to parse the root and its attributes, but I can't figure out how to handle the 'wd' elements.

Thanks for your help.

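Change your loop to:

import xml.etree.ElementTree as ET

root = ET.parse('newspaper.xml').getroot()  # hypothetical filename; use your own file
for x in root:
    # './/wd' finds <wd> elements at any depth below x
    for t in x.findall('.//wd'):
        # empty placeholders such as <wd pos="0,0,0,0"/> have no text
        if t.text is not None:
            print(t.text)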

Output:

The
Newspaper
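If you need each word's pos attribute alongside its text, here is a minimal sketch. It walks the tree with Element.iter('wd') and reads the attribute with Element.get, which returns None instead of raising KeyError when the attribute is missing. The SAMPLE string is a trimmed excerpt of the question's XML with closing tags added so it parses, and words_with_pos is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the question's XML, with closing tags added so it parses.
SAMPLE = """<articles><text>
  <text.cr>
    <pg pgref="1" clipref="2" pos="0,0,2275,3149"/>
    <p type="none"><wd pos="0,0,0,0"/></p>
  </text.cr>
  <text.cr>
    <pg pgref="1" clipref="3" pos="4,32,1078,454"/>
    <p type="none">
      <wd pos="4,32,1078,324">The</wd>
      <wd pos="12,234,1078,450">Newspaper</wd>
    </p>
  </text.cr>
</text></articles>"""


def words_with_pos(root):
    """Return (text, pos) pairs for every <wd> element that has text (hypothetical helper)."""
    pairs = []
    # iter('wd') visits the whole subtree, so the nesting depth does not matter
    for wd in root.iter('wd'):
        if wd.text is not None:
            # .get() returns None instead of raising KeyError if 'pos' is absent
            pairs.append((wd.text, wd.get('pos')))
    return pairs


root = ET.fromstring(SAMPLE)
for text, pos in words_with_pos(root):
    print(text, pos)
```

Using iter() instead of looping over the root's children first avoids the nested loop, since it already searches every level.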

The following finds the 'wd' elements whose text is "The" or "Newspaper" and prints their pos attributes:

import xml.etree.ElementTree as ET

xml = '''<?xml version="1.0" encoding="UTF-8"?>
<articles>
   <text>
      <text.cr>
         <pg pgref="1" clipref="1" pos="0,0,2275,3149"/>
         <p type="none">
            <wd pos="0,0,0,0"/>
         </p>
      </text.cr>
      <text.cr>
         <pg pgref="1" clipref="2" pos="0,0,2275,3149"/>
         <p type="none">
            <wd pos="0,0,0,0"/>
         </p>
      </text.cr>
      <text.cr>
         <pg pgref="1" clipref="3" pos="4,32,1078,454"/>
         <p type="none">
            <wd pos="4,32,1078,324">The</wd>
            <wd pos="12,234,1078,450">Newspaper</wd>
         </p>
      </text.cr></text></articles>'''

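# text values whose <wd> positions we want
values = ['The', 'Newspaper']
root = ET.fromstring(xml)
# './/wd' matches <wd> elements at any depth below the root
wds = [wd for wd in root.findall('.//wd') if wd.text in values]
for wd in wds:
    # an element's attributes are exposed as a dict on .attrib
    print(wd.attrib['pos'])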

Output:

4,32,1078,324
12,234,1078,450
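The pos value comes back as one string. If you need numbers, you can split it on commas. This sketch assumes pos always holds exactly four comma-separated integers. The question does not say what they denote (they look like box coordinates, but that is a guess), and parse_pos is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def parse_pos(value):
    """Split a pos string such as '4,32,1078,324' into a tuple of ints.

    Assumes exactly four comma-separated integers; their meaning is not given in the question.
    """
    parts = tuple(int(p) for p in value.split(','))
    if len(parts) != 4:
        raise ValueError(f"expected 4 values in pos, got {value!r}")
    return parts


wd = ET.fromstring('<wd pos="12,234,1078,450">Newspaper</wd>')
print(wd.text, parse_pos(wd.attrib['pos']))
```

Raising ValueError on an unexpected count makes malformed pos values fail loudly instead of silently producing short tuples.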