Python: How to find the number of text objects in a BeautifulSoup object

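I'm scraping a Wikipedia page with BeautifulSoup in Python, and I'd like to know whether anyone knows how to get the number of text objects in an HTML object. For example, the following code returns the HTML below: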

soup.find_all(class_ = 'toctext')

<span class="toctext">Actors and actresses</span>, <span class="toctext">Archaeologists and anthropologists</span>, <span class="toctext">Architects</span>, <span class="toctext">Artists</span>, <span class="toctext">Broadcasters</span>, <span class="toctext">Businessmen</span>, <span class="toctext">Chefs</span>, <span class="toctext">Clergy</span>, <span class="toctext">Criminals</span>, <span class="toctext">Conspirators</span>, <span class="toctext">Economists</span>, <span class="toctext">Engineers</span>, <span class="toctext">Explorers</span>, <span class="toctext">Filmmakers</span>, <span class="toctext">Historians</span>, <span class="toctext">Humourists</span>, <span class="toctext">Inventors / engineers</span>, <span class="toctext">Journalists / newsreaders</span>, <span class="toctext">Military: soldiers/sailors/airmen</span>, <span class="toctext">Monarchs</span>, <span class="toctext">Musicians</span>, <span class="toctext">Philosophers</span>, <span class="toctext">Photographers</span>, <span class="toctext">Politicians</span>, <span class="toctext">Scientists</span>, <span class="toctext">Sportsmen and sportswomen</span>, <span class="toctext">Writers</span>, <span class="toctext">Other notables</span>, <span class="toctext">English expatriates</span>, <span class="toctext">References</span>, <span class="toctext">See also</span>

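My goal is to get all of the text objects and store them in a list. I'm doing this with a for loop, but I don't know how many text objects the HTML block contains, and of course if I access an index that doesn't exist I get an error. Is there another way?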
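The question doesn't show the loop itself, so the following is a minimal sketch assuming it indexes into the `find_all` result with a guessed count. The HTML snippet is a hypothetical three-entry stand-in for the Wikipedia table of contents. Because `find_all` returns a `ResultSet`, which is a subclass of `list`, `len()` gives the exact count and the index loop can be bounded by it.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the page's table of contents.
html = """
<ul>
  <li><span class="toctext">Actors and actresses</span></li>
  <li><span class="toctext">Architects</span></li>
  <li><span class="toctext">See also</span></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")
matches = soup.find_all(class_="toctext")

# ResultSet subclasses list, so len() returns the number of matched tags.
count = len(matches)

texts = []
for i in range(count):  # bounded by the real count, so no IndexError
    texts.append(matches[i].text)

print(count, texts)
```

Bounding the loop with `len()` fixes the error, but iterating over the matches directly (as in the answers) avoids indices altogether.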

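You can use a for…in loop, for example as a list comprehension, which visits every match without needing to know the count in advance: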

In [13]: [t.text for t in soup.find_all(class_ = 'toctext')]
Out[13]: 
['Actors and actresses',
 'Archaeologists and anthropologists',
 'Architects',
 'Artists',
 'Broadcasters',
 'Businessmen',
 'Chefs',
 'Clergy',
 'Criminals',
 'Conspirators',
 'Economists',
 'Engineers',
 'Explorers',
 'Filmmakers',
 'Historians',
 'Humourists',
 'Inventors / engineers',
 'Journalists / newsreaders',
 'Military: soldiers/sailors/airmen',
 'Monarchs',
 'Musicians',
 'Philosophers',
 'Photographers',
 'Politicians',
 'Scientists',
 'Sportsmen and sportswomen',
 'Writers',
 'Other notables',
 'English expatriates',
 'References',
 'See also']
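A self-contained sketch of the same list comprehension, using a small hypothetical HTML snippet instead of the live page. It also shows the two properties that answer the question: `len()` on the resulting list gives the number of text objects, and when nothing matches, `find_all` returns an empty result, so the comprehension yields `[]` rather than raising an error.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet; the real page would be fetched and parsed instead.
html = '<div><span class="toctext">Artists</span><span class="toctext">Chefs</span></div>'
soup = BeautifulSoup(html, "html.parser")

names = [t.text for t in soup.find_all(class_="toctext")]
print(len(names), names)

# No matching tags: find_all returns an empty ResultSet, so this is just [].
missing = [t.text for t in soup.find_all(class_="no-such-class")]
print(missing)
```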

Try the following code:
for txt in soup.find_all(class_ = 'toctext'):
    print(txt.text)
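Since the goal is to store the texts in a list rather than print them, here is a minimal sketch that appends each one inside the same loop. It uses `get_text(strip=True)` instead of `.text` to trim surrounding whitespace; the indented HTML below is a hypothetical example of markup where that matters.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with whitespace around one entry's text.
html = """
<span class="toctext">
    Musicians
</span>
<span class="toctext">Writers</span>
"""
soup = BeautifulSoup(html, "html.parser")

toc_entries = []
for txt in soup.find_all(class_="toctext"):
    # strip=True removes leading/trailing whitespace from the extracted text
    toc_entries.append(txt.get_text(strip=True))

print(len(toc_entries), toc_entries)
```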
