Python BeautifulSoup XML parsing not working
Tags: python, xml, beautifulsoup, urllib2, lxml

I'm trying to parse an XML page with BeautifulSoup, but for some reason it can't find the XML parser. I don't think it's a path issue, since I've used lxml to parse pages in the past, just not XML. The code is below:
from bs4 import *
import urllib2
import lxml
from lxml import *
BASE_URL = "http://auctionresults.fcc.gov/Auction_66/Results/xml/round/66_115_database_round.xml"
proxy = urllib2.ProxyHandler({'http': 'http://myProxy.com'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
page = urllib2.urlopen(BASE_URL)
soup = BeautifulSoup(page,"xml")
print soup
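I'm probably missing something simple, but all the BeautifulSoup XML parsing questions I found here are about bs3, and I'm using bs4, which uses a different method to parse XML. Thanks.

If you have lxml installed, just pass it to BeautifulSoup as the parser, as shown below.

Code: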
from bs4 import BeautifulSoup as bsoup
import requests as rq
url = "http://auctionresults.fcc.gov/Auction_66/Results/xml/round/66_115_database_round.xml"
r = rq.get(url)
soup = bsoup(r.content, "lxml")
print soup
Result:
<html><body><dataroot xmlns:od="urn:schemas-microsoft-com:officedata" xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" xsi:nonamespaceschemalocation="66_database.xsd"><all_bids>
<auction_id>66</auction_id>
<auction_description>Advanced Wireless Services</auction_description>
... really long list follows...
[Finished in 34.9s]
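A side note on the output: the `<html><body>` wrapper and the lowercased attribute name suggest that `"lxml"` selects lxml's HTML parser, whereas `"xml"` (what the question asked for) selects lxml's XML parser. The sketch below compares the two on a tiny made-up document. `SAMPLE` is an assumption for illustration, not the real FCC file, and the snippet assumes both bs4 and lxml are installed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the auction XML; not the real FCC data.
SAMPLE = "<dataroot><all_bids><auction_id>66</auction_id></all_bids></dataroot>"

# "lxml" asks for lxml's HTML parser, which wraps the fragment in <html><body>.
html_soup = BeautifulSoup(SAMPLE, "lxml")

# "xml" asks for lxml's XML parser, which keeps the document as XML.
xml_soup = BeautifulSoup(SAMPLE, "xml")

print(html_soup.html is not None)   # HTML parser added an <html> element
print(xml_soup.html is None)        # XML parser did not
print(xml_soup.find("auction_id").get_text())
```

For simple tag lookups both parsers find the same elements, which is probably why the `"lxml"` version works as a workaround here.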
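Let us know if this helps.

It must be a path issue after all, because it works in IPython but not in Eclipse PyDev. I'll sort it out. Thanks for the tip.

That's a bit odd. Still, glad to know you found the root of the problem. In any case, the answer above can serve as a workaround.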
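Since the code works in IPython but not in Eclipse PyDev, one way to narrow down a path problem like this is to run the same short diagnostic in both environments and compare the output. This is only a sketch: `parser_report` is a hypothetical helper name, and it assumes the two environments may be using different interpreters or different `sys.path` entries.

```python
import importlib
import sys


def parser_report(module_names=("bs4", "lxml")):
    """Map each module name to its file path, or None if it cannot be imported."""
    report = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Not importable from this interpreter's sys.path.
            report[name] = None
        else:
            report[name] = getattr(module, "__file__", None) or "<no file path>"
    return report


if __name__ == "__main__":
    # Which interpreter is actually running this script?
    print(sys.executable)
    for name, location in sorted(parser_report().items()):
        print("%s: %s" % (name, location or "NOT IMPORTABLE"))
```

If the interpreter path or the lxml location differs between the two runs, pointing PyDev at the interpreter that IPython uses would be the first thing to try.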