Cannot extract text from an HTML page in Python


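I am new to web scraping. I read about BeautifulSoup and tried to use it, but I cannot extract the text of the element with the class name "company-desc-and-sort-container". I cannot even extract the title from the HTML page. This is the code I tried: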

from BeautifulSoup import BeautifulSoup
import requests

url= 'http://fortune.com/best-companies/'    
r = requests.get(url)

soup = BeautifulSoup(r.text)

#print soup.prettify()[0:1000]
print soup.find_all("title")

letters = soup.find_all("div", class_="company-desc-and-sort-container")

I get the following error:

 print soup.find_all("title")
TypeError: 'NoneType' object is not callable

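You are using BeautifulSoup version 3, which is no longer maintained and has no find_all() method. Because dot notation on a soup object is a shortcut for find(), BeautifulSoup looks for an element with the tag name "find_all" and gets None. It then evaluates None("title"), which raises:

TypeError: 'NoneType' object is not callable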
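The mechanism described above can be reproduced without BeautifulSoup at all. The sketch below is a simplified stand-in, not the library's actual code: the class name TinySoup and its dictionary of tags are made up for illustration, and it only assumes that unknown attributes are forwarded to find(), as the answer describes. It is written for Python 3.

```python
class TinySoup:
    """Hypothetical stand-in that mimics attribute access as a find() shortcut."""

    def __init__(self, tags):
        # Maps a tag name to its markup, e.g. {"title": "<title>...</title>"}.
        self._tags = tags

    def find(self, name):
        # Returns the stored markup, or None when no such tag exists.
        return self._tags.get(name)

    def __getattr__(self, name):
        # Called only for attributes that do not exist on the object,
        # so soup.anything becomes soup.find("anything").
        return self.find(name)


soup = TinySoup({"title": "<title>Best Companies</title>"})

print(soup.title)     # the stored title markup
print(soup.find_all)  # None: there is no tag named "find_all"

try:
    # Equivalent to None("title"), because soup.find_all is None.
    soup.find_all("title")
except TypeError as exc:
    message = str(exc)
    print(message)
```

Since the stand-in defines no find_all method, the attribute lookup falls through to the tag search, and the call fails on the None it returns rather than on a missing method.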

Upgrade to BeautifulSoup version 4. Replace:

from BeautifulSoup import BeautifulSoup

with:

from bs4 import BeautifulSoup

Make sure the beautifulsoup4 package is installed:

pip install --upgrade beautifulsoup4
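As a rough illustration of the upgraded setup, here is a minimal Python 3 sketch using the bs4 package. To keep it runnable offline, it parses an inline HTML string instead of downloading the page; that markup, including the title and the div text, is invented for the example and is not the real content of the Fortune page. The "html.parser" argument selects Python's built-in parser and is a choice made here, not something the question specifies.

```python
from bs4 import BeautifulSoup

# Inline HTML standing in for the downloaded page (hypothetical content).
html = """
<html>
  <head><title>Best Companies</title></head>
  <body>
    <div class="company-desc-and-sort-container">Example description</div>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag, or None if there is no match.
title_tag = soup.find("title")
title_text = title_tag.get_text() if title_tag is not None else None
print(title_text)

# find_all() returns a list-like ResultSet of every matching tag.
containers = soup.find_all("div", class_="company-desc-and-sort-container")
texts = [div.get_text(strip=True) for div in containers]
print(texts)
```

With a live page, the html string would be replaced by r.text from the requests call in the question. Whether the server's response actually contains that div is something to check separately, for example by inspecting the output of soup.prettify().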

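The title lookup returned None. Also, find_all returns a list of matches rather than a single tag, so even when it finds something, printing it shows the whole list. Use the find method instead; it returns the first title tag.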

So does the HTML page even have a title tag? Search for it, and print the result only if it is not None.

Which version of BeautifulSoup are you using?
