Parsing XML in Python with BeautifulSoup or minidom

I have XML like this:

#filename sample.xml
<tag>
<tag1>
<tag2 property="something"/>
<tag2 property="something1"/>
<tag2 property="something2">value</tag2>
<tag2 property="something3">
<tag3>
<tag4 data="data1"/>
<tag4 data="data2"/>
</tag3>
</tag2>
</tag1>
</tag>

But my code throws an error:

AttributeError: 'NoneType' object has no attribute 'tag4'

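The print fails because there are multiple tag2 elements. Attribute-style access such as .tag2 returns only the first match, and the first tag2 has no tag3 child, so the next lookup in the chain is made on None. The solution is to use .findAll('tag2') (spelled find_all in current bs4; findAll is the older alias) to retrieve all of the tags.

Here is a working example: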

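#!/usr/bin/env python3

from bs4 import BeautifulSoup

# Read the whole document; the "with" block closes the file afterwards.
with open('sample.xml') as f:
    fdata = f.read()

# Name the parser explicitly. 'html.parser' ships with Python;
# 'xml' can be passed instead if lxml is installed.
xmldata = BeautifulSoup(fdata, 'html.parser')

# findAll returns every matching tag2, not just the first one.
alltags2 = xmldata.tag.tag1.findAll('tag2')

for tag2 in alltags2:
    alltags3 = tag2.findAll('tag3')
    for tag3 in alltags3:
        alltags4 = tag3.findAll('tag4')
        for tag4 in alltags4:
            print('The data I got was :"%s"' % tag4["data"])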
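For context, here is a minimal sketch of why the original lookup fails. The asker's code is not shown in this thread, so the chained access in the try block is an assumption inferred from the error message. The inline SAMPLE_XML string and the choice of html.parser are also only for illustration.

```python
from bs4 import BeautifulSoup

# Inline copy of sample.xml so the sketch runs on its own (illustrative only).
SAMPLE_XML = """<tag>
<tag1>
<tag2 property="something"/>
<tag2 property="something1"/>
<tag2 property="something2">value</tag2>
<tag2 property="something3">
<tag3>
<tag4 data="data1"/>
<tag4 data="data2"/>
</tag3>
</tag2>
</tag1>
</tag>"""

soup = BeautifulSoup(SAMPLE_XML, "html.parser")

# Attribute access returns only the FIRST matching child.
first_tag2 = soup.tag.tag1.tag2
print(first_tag2["property"])

# The first tag2 is empty, so looking up tag3 on it yields None.
print(first_tag2.tag3)

# Assumed shape of the failing code: chaining one step further
# asks None for an attribute and raises AttributeError.
try:
    first_tag2.tag3.tag4
except AttributeError as exc:
    print("AttributeError:", exc)

# find_all searches every tag2, so the nested tag4 elements are found.
data_values = [
    tag4["data"]
    for tag2 in soup.tag.tag1.find_all("tag2")
    for tag4 in tag2.find_all("tag4")
]
print(data_values)
```

The list comprehension does the same job as the nested loops in the answer, just collecting the values instead of printing them one by one.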

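Another possible approach is the select() method, which takes a CSS selector as its argument. For example, if you really want to select only the <tag4> elements that have exactly this ancestor hierarchy: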

.....
xmldata = BeautifulSoup(fdata, 'html.parser')
# ">" is the child combinator, so every level must match exactly.
for tag4 in xmldata.select("tag > tag1 > tag2 > tag3 > tag4"):
    print(tag4["data"])

The above prints:

data1
data2

Or, if you just need all <tag4> elements in the XML, you can simply use xmldata.select("tag4").

BeautifulStoneSoup belongs to the obsolete BeautifulSoup 3; you should use BeautifulSoup 4 (bs4). BS3 does not parse XML correctly, whereas BS4 does.

@AnttiHaapala I am using BS4. Also, how can I achieve the same thing with xml.dom.minidom?

What should I do if I want to get the parent node of tag4? I tried

x = xmldata.select('tag4')
for node in x:
    print(node.parentNode)

You can try node.parent instead of node.parentNode.
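To make that suggestion concrete, here is a small sketch of reading the parent of each tag4 with bs4. The inline XML fragment and the parent_names variable are made up for the example. As far as I can tell, bs4 treats an unknown attribute such as parentNode as a search for a child tag of that name, which would explain why the attempt above printed None rather than raising an error.

```python
from bs4 import BeautifulSoup

# A trimmed-down fragment of the sample document (illustrative only).
fragment = '<tag3><tag4 data="data1"/><tag4 data="data2"/></tag3>'
soup = BeautifulSoup(fragment, "html.parser")

# In bs4 the enclosing element is exposed as .parent.
parent_names = [node.parent.name for node in soup.select("tag4")]
print(parent_names)

# The parent is an ordinary Tag, so its children can be read back from it.
first_parent = soup.select("tag4")[0].parent
print([child["data"] for child in first_parent.find_all("tag4")])
```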

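The xml.dom.minidom question from the comments was never answered in this thread, so here is one possible sketch using only the standard library. The inline SAMPLE_XML string is a copy of the sample for illustration. Note that parentNode is the DOM spelling that minidom uses, which may be where the earlier attempt came from.

```python
from xml.dom import minidom

# Inline copy of sample.xml so the sketch runs on its own (illustrative only).
SAMPLE_XML = """<tag>
<tag1>
<tag2 property="something"/>
<tag2 property="something1"/>
<tag2 property="something2">value</tag2>
<tag2 property="something3">
<tag3>
<tag4 data="data1"/>
<tag4 data="data2"/>
</tag3>
</tag2>
</tag1>
</tag>"""

doc = minidom.parseString(SAMPLE_XML)

# getElementsByTagName searches the whole tree, at any depth.
tag4_nodes = doc.getElementsByTagName("tag4")

# Attributes are read with getAttribute rather than dict-style indexing.
data_values = [node.getAttribute("data") for node in tag4_nodes]
print(data_values)

# minidom follows the DOM API, so the parent is node.parentNode.
parent_names = [node.parentNode.tagName for node in tag4_nodes]
print(parent_names)
```

For a file on disk, minidom.parse('sample.xml') can be used in place of parseString.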