Python BeautifulSoup cannot find a tag


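I am trying to scrape the page below, and other pages like it. I have been using BeautifulSoup (I also tried lxml, but ran into installation problems). I am using the following code: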

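import urllib2
from bs4 import BeautifulSoup

value = "http://www.presidency.ucsb.edu/ws/index.php?pid=99556"
desiredTag = "span"
r = urllib2.urlopen(value)
# Parse the downloaded page with the html5lib parser
data = BeautifulSoup(r.read(), 'html5lib')
# Collect every tag named desiredTag
displayText = data.find_all(desiredTag)
print displayText
# Join the matched tags; joining str(displayText) would put a space between every character
displayText = " ".join(str(tag) for tag in displayText)
displayText = BeautifulSoup(displayText, 'html5lib')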

For some reason, this does not return the tag I am after, and I have also tried a different value for desiredTag. Am I missing something?

You are definitely running into the differences between the parsers that BeautifulSoup can use. Both html.parser and lxml worked for me:

data = BeautifulSoup(urllib2.urlopen(value), 'html.parser')

Demonstration:

>>> import urllib2
>>> from bs4 import BeautifulSoup
>>> 
>>> url = "http://www.presidency.ucsb.edu/ws/index.php?pid=99556"
>>> 
>>> data = BeautifulSoup(urllib2.urlopen(url), 'html.parser')
>>> data.find("span", class_="displaytext").text
u'PARTICIPANTS:Former Speaker of the House Newt Gingrich (GA);
...
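
To make the parser difference concrete, here is a minimal sketch that feeds the same invalid fragment to each parser and returns the resulting tree as a string. The fragment "<a></p>" and the helper name compare_parsers are illustrative assumptions, not taken from the page being scraped. Parsers that are not installed are reported as None rather than raising an error.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def compare_parsers(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Parse the same markup with each parser and return {parser: tree as text}.

    compare_parsers is a hypothetical helper written for this illustration.
    A parser that is not installed maps to None.
    """
    results = {}
    for parser in parsers:
        try:
            results[parser] = str(BeautifulSoup(markup, parser))
        except FeatureNotFound:
            # Raised by Beautiful Soup when the requested parser library is missing
            results[parser] = None
    return results


if __name__ == "__main__":
    # "<a></p>" is a deliberately invalid fragment used only for illustration
    for name, tree in sorted(compare_parsers("<a></p>").items()):
        print("%s: %s" % (name, tree))
```

If the trees differ, a find() or find_all() call that succeeds with one parser can come back empty with another, which is consistent with the behaviour described in the question.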

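This is a good, very thorough answer. I came across this while looking around, but I get "HTMLParser.HTMLParseError: malformed start tag, at line 1183, column 15" with exactly this code on that site. Could I have installed something incorrectly?

@InquisitiveIdiot OK, quick check: which Python version are you using?

2.7.2, courtesy of Active Python. It is a Python install.

@InquisitiveIdiot If it is not a big deal, could you try upgrading to at least 2.7.6 (2.7.9 would be better)?

For future reference, I am pretty sure the Python upgrade was what mattered.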
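
The version advice in the comments can be turned into a quick check. The sketch below assumes the thresholds given in the Beautiful Soup documentation, which describes the built-in html.parser as not very good before Python 2.7.3 and 3.2.2. The helper name html_parser_is_lenient is hypothetical, and the comments above recommend going further, to 2.7.6 or 2.7.9.

```python
import sys

# Assumed thresholds, taken from the Beautiful Soup documentation's parser notes
MIN_LENIENT = {2: (2, 7, 3), 3: (3, 2, 2)}


def html_parser_is_lenient(version_info=None):
    """Return True if this Python's built-in html.parser should tolerate messy markup.

    html_parser_is_lenient is a hypothetical helper written for this illustration.
    """
    version = tuple((version_info or sys.version_info)[:3])
    minimum = MIN_LENIENT.get(version[0])
    if minimum is None:
        # Assumption: major versions not listed here are newer than the known-bad ones
        return True
    return version >= minimum


if __name__ == "__main__":
    if not html_parser_is_lenient():
        print("Consider upgrading Python, or install lxml or html5lib")
```

On the 2.7.2 install mentioned above this check returns False, which matches the HTMLParseError reported in the comments.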