Python XML parsing using minidom


I have just started learning how to parse XML using minidom. I tried to get the authors' names with the following code (the XML data is below):

from xml.dom import minidom

xmldoc = minidom.parse("cora.xml")

author = xmldoc.getElementsByTagName ('author')

for author in author:
    authorID=author.getElementsByTagName('author id')
    print authorID

All I get is empty brackets ([]). Can anyone help me out? I also need the title and venue. Thanks in advance. See the XML data below:

<?xml version="1.0" encoding="UTF-8"?>
<coraRADD>
   <publication id="ahlskog1994a">
      <author id="199">M. Ahlskog</author>
      <author id="74"> J. Paloheimo</author>
      <author id="64"> H. Stubb</author>
      <author id="103"> P. Dyreklev</author>
      <author id="54"> M. Fahlman</author>
      <title>Inganas</title>
      <title>and</title>
      <title>M.R.</title>
      <venue>
         <venue pubid="ahlskog1994a" id="1">
                  <name>Andersson</name>
                  <name> J Appl. Phys.</name>
                  <vol>76</vol>
                  <date> (1994). </date>
            </venue>



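You can only find tags with getElementsByTagName(), not attributes. Attributes have to be read from each element instead, for example with minidom's getAttribute('id').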
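As a minimal sketch of that approach, the snippet below reads the id attribute with getAttribute() and the author name from the element's text nodes. The inline SAMPLE string is an assumption: a shortened, well-formed stand-in for cora.xml. The helper names text_of and authors are hypothetical and not part of minidom.

```python
from xml.dom import minidom

# Assumption: a shortened, well-formed sample shaped like the question's data.
SAMPLE = """<coraRADD>
   <publication id="ahlskog1994a">
      <author id="199">M. Ahlskog</author>
      <author id="74"> J. Paloheimo</author>
      <title>Inganas</title>
      <venue>
         <venue pubid="ahlskog1994a" id="1">
            <name> J Appl. Phys.</name>
         </venue>
      </venue>
   </publication>
</coraRADD>"""


def text_of(node):
    """Concatenate the text nodes directly under a DOM element (hypothetical helper)."""
    return "".join(
        child.data for child in node.childNodes
        if child.nodeType == child.TEXT_NODE
    ).strip()


def authors(doc):
    """Return (id, name) pairs for every <author> element (hypothetical helper)."""
    return [
        (el.getAttribute("id"), text_of(el))
        for el in doc.getElementsByTagName("author")
    ]


doc = minidom.parseString(SAMPLE)
author_list = authors(doc)
titles = [text_of(el) for el in doc.getElementsByTagName("title")]
# Only the inner <venue> carries an id attribute; the outer wrapper is skipped.
venue_names = [
    text_of(name)
    for venue in doc.getElementsByTagName("venue")
    if venue.hasAttribute("id")
    for name in venue.getElementsByTagName("name")
]

print(author_list)
print(titles)
print(venue_names)
```

Note that getElementsByTagName() takes a tag name only, so 'author id' in the question matches nothing; the tag is 'author' and id is one of its attributes.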

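If you are still learning to parse XML, you really want to stay away from the DOM. The DOM API is overly verbose because it was designed to be usable from many different programming languages.

The ElementTree API is easier to use: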

import xml.etree.ElementTree as ET

tree = ET.parse('cora.xml')
root = tree.getroot()

# loop over all publications
for pub in root.findall('publication'):
    # a publication's title is split across several <title> tags; join them
    print(' '.join([t.text for t in pub.findall('title')]))
    for author in pub.findall('author'):
        print('Author id: {}'.format(author.attrib['id']))
        print('Author name: {}'.format(author.text))
    for venue in pub.findall('.//venue[@id]'):  # all venue tags with id attribute
        print(', '.join([name.text for name in venue.findall('name')]))


Is this the correct XML data? There is an extra opening tag, and two of the tags are not closed.

Hi Paul, this is the correct XML data. I copied it directly from the XML file.

Are you married to the minidom library? The ElementTree API, for example, is easier to use.

I have only just started with parsing, so I don't know much about the other APIs. If ElementTree really is that easy to use, I'll give it a try. Thanks.

When I save the XML to my computer and try to parse it with minidom (xmldoc = minidom.parse("cora.xml")), I get an xml.parsers.expat.ExpatError. Perhaps I should have asked "Is this the complete XML data?"

Hi Pieters, it works now. Thanks a lot, but I am more interested in the authors' names and the venue. Any ideas?

@user2274879: Loop over the publications first (for pub in root.findall('publication'):), then find the authors from there (for author in pub.findall('author')) and the venues (for venue in pub.findall('.//venue[@id]'), perhaps restricted to those that have an id attribute). The author name is the text content of the tag, so author.text gives you that.

I get the following error when I try to use author.text: TypeError: 'str' object is not callable

@user2274879: Drop the (); .text is an attribute (yes, I made that mistake myself at first and corrected it afterwards).

Hi Pieters, it works like magic. I was really stuck before posting the question.
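The two errors from the comments can be reproduced in a small sketch. It assumes a made-up, cut-off fragment (TRUNCATED) rather than the real cora.xml, and is_well_formed is a hypothetical helper name. It shows that minidom raises ExpatError on XML with unclosed tags, and that calling .text as if it were a method raises TypeError.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Assumption: a cut-off fragment, similar in spirit to the data pasted in the question.
TRUNCATED = '<coraRADD><publication id="p1"><author id="199">M. Ahlskog</author>'
COMPLETE = TRUNCATED + "</publication></coraRADD>"


def is_well_formed(xml_text):
    """Return True if minidom can parse xml_text, False on an ExpatError."""
    try:
        minidom.parseString(xml_text)
    except ExpatError:
        return False
    return True


truncated_ok = is_well_formed(TRUNCATED)   # unclosed tags make the parse fail
complete_ok = is_well_formed(COMPLETE)

author = ET.fromstring(COMPLETE).find("publication/author")
name = author.text        # attribute access gives a plain str

try:
    author.text()         # calling that str raises TypeError
    call_failed = False
except TypeError:
    call_failed = True

print(truncated_ok, complete_ok, name, call_failed)
```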