How can Python's BeautifulSoup be used to loop over and extract specific data?

The HTML code below comes from a movie review website. I want to extract the stars (the cast) from it, which are John C. Reilly, Sarah Silverman, and Gal Gadot. How can I do this?

Code:

html_doc = """
Stars:
,
,
|
&raquo;
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')

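My idea

I plan to use a for loop to go through each div class until I find the one with the text Stars, and then extract the names inside it. But I don't know how to write the code, because I'm not very familiar with either HTML syntax or the module.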
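The plan described above (loop over the divs, stop at the one labelled Stars, then read the names) can be sketched as follows. The markup in html_doc is hypothetical: the question's HTML tags are not reproduced, so this assumes the credit_summary_item layout with an h4 "Stars:" heading that the answers rely on, with placeholder hrefs and a placeholder director entry.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the credit_summary_item layout the answers assume;
# the hrefs and the director entry are placeholders.
html_doc = """
<div class="credit_summary_item">
  <h4>Director:</h4>
  <a href="/name/nm0000001/">A. Director</a>
</div>
<div class="credit_summary_item">
  <h4>Stars:</h4>
  <a href="/name/nm0000002/">John C. Reilly</a>,
  <a href="/name/nm0000003/">Sarah Silverman</a>,
  <a href="/name/nm0000004/">Gal Gadot</a>
  <span>|</span>
  <a href="fullcredits/">See full cast &amp; crew</a>&nbsp;&raquo;
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

star_names = []
# Walk the candidate divs until the one whose text contains "Stars:" turns up.
for div in soup.find_all("div", class_="credit_summary_item"):
    if "Stars:" in div.get_text():
        # Keep only person links (assumed to start with /name/), skipping the full-cast link.
        star_names = [a.get_text(strip=True)
                      for a in div.find_all("a")
                      if a.get("href", "").startswith("/name/")]
        break

print(star_names)  # ['John C. Reilly', 'Sarah Silverman', 'Gal Gadot']
```

Filtering on the href prefix is one possible choice; it avoids depending on the position of the non-name link, but it only holds if person pages really use that path.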

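I'll demonstrate how to do this; you only need to learn a little Beautiful Soup syntax.

First, we use the findAll method on the "div" tags that have a "class" attribute.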
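A sketch of this first step, assuming a hypothetical trimmed-down version of the page (the markup and placeholder hrefs are assumptions, and the class value credit_summary_item is taken from the other answer). It shows that the class filter keeps only the matching divs:

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup (placeholder hrefs); the third div has a different class.
html_doc = """
<div class="credit_summary_item"><h4>Director:</h4><a href="/name/nm0000001/">A. Director</a></div>
<div class="credit_summary_item"><h4>Stars:</h4><a href="/name/nm0000002/">John C. Reilly</a></div>
<div class="plot_summary"><h4>Stars:</h4></div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# find_all is the current name; findAll is its older camelCase alias.
# The dict restricts the search to divs whose class attribute includes credit_summary_item.
divs = soup.find_all("div", {"class": "credit_summary_item"})

print(len(divs))                      # 2
print([div.h4.text for div in divs])  # ['Director:', 'Stars:']
```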

Then we filter out every div that doesn't contain the stars:

stars = [div for div in divs if "Stars:" in div.h4.text]

If there is only one such div, you can take it out of the list:

star = stars[0]

Then use find_all again to get the text of every "a" tag inside it.
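Putting the four steps of this answer together gives roughly the following. The markup is hypothetical (placeholder hrefs), and it assumes a trailing "See full cast" link after the "|" separator, which the question's leftover "|" and "&raquo;" suggest but do not confirm:

```python
from bs4 import BeautifulSoup

# Hypothetical markup (placeholder hrefs), including a trailing full-cast link.
html_doc = """
<div class="credit_summary_item"><h4>Director:</h4><a href="/name/nm0000001/">A. Director</a></div>
<div class="credit_summary_item"><h4>Stars:</h4>
  <a href="/name/nm0000002/">John C. Reilly</a>,
  <a href="/name/nm0000003/">Sarah Silverman</a>,
  <a href="/name/nm0000004/">Gal Gadot</a>
  | <a href="fullcredits/">See full cast &amp; crew</a>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Step 1: every div with class credit_summary_item.
divs = soup.find_all("div", {"class": "credit_summary_item"})
# Step 2: keep only the div(s) whose <h4> mentions "Stars:".
stars = [div for div in divs if "Stars:" in div.h4.text]
# Step 3: there is exactly one here, so take it out of the list.
star = stars[0]
# Step 4: the text of every <a> tag inside it.
links = [a.text for a in star.find_all("a")]
print(links)  # ['John C. Reilly', 'Sarah Silverman', 'Gal Gadot', 'See full cast & crew']

# In this assumed markup the last link is not a person, so drop it.
names = links[:-1]
print(names)  # ['John C. Reilly', 'Sarah Silverman', 'Gal Gadot']
```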

As you can see, I didn't use any HTML/CSS selector syntax, only Beautiful Soup's own methods.

I hope this helps.

You can iterate over all the a tags in the credit_summary_item div:

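from bs4 import BeautifulSoup as soup
# Text of every <a> in the first credit_summary_item div; "*results, _" drops the last link.
*results, _ = [i.text for i in soup(html_doc, 'html.parser').find('div', {'class':'credit_summary_item'}).find_all('a')]
print(results)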

Output:

['John C. Reilly', 'Sarah Silverman', 'Gal Gadot']
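The "*results, _ = [...]" line relies on Python's extended iterable unpacking to discard the final a tag, which on the asker's page is presumably the link after the "|" separator rather than a name. A small pure-Python illustration with hypothetical link texts:

```python
# Hypothetical link texts: three names followed by a non-name link.
texts = ["John C. Reilly", "Sarah Silverman", "Gal Gadot", "See full cast & crew"]

# Extended iterable unpacking (PEP 3132): the starred target collects every item
# except the last one, which is bound to the throwaway name "_".
*results, _ = texts
print(results)  # ['John C. Reilly', 'Sarah Silverman', 'Gal Gadot']

# Slicing gives the same list and also tolerates an empty input.
print(texts[:-1] == results)  # True

# The unpacking form needs at least one item, so an empty list raises ValueError.
try:
    *nothing, _ = []
except ValueError as exc:
    print("ValueError:", exc)
```

Both forms silently assume the last link is never a name; if the page layout changes, a filter on the link target is safer than dropping by position.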


You can also use

There are multiple div elements with class credit_summary_item.

@Newbie101 Does the div with the actor names always appear first in the HTML? If so, this solution still works.

I'm getting the message AttributeError: ResultSet object has no attribute 'find_all'?

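@Newbie101 Which line throws this error? This code works fine for me. A ResultSet error usually appears when you try to call a method directly on the result of find_all instead of iterating over it.

@Newbie101 In my edit, I added a more robust solution: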

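# Pick the first credit_summary_item div whose text contains "Stars:".
_d = [i for i in soup(html_doc, 'html.parser').find_all('div', {'class':'credit_summary_item'}) if 'Stars:' in i.text][0]
# Text of every <a> in that div, dropping the last link.
*results, _ = [i.text for i in _d.find_all('a')]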

['John C. Reilly', 'Sarah Silverman', 'Gal Gadot']
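The AttributeError discussed in the comments can be reproduced with a small sketch. The markup is hypothetical (placeholder hrefs); it shows that find_all returns a list-like ResultSet, that calling find_all on that ResultSet fails, and that find only ever returns the first match, which is why the position of the Stars div matters for the first answer:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two matching divs (placeholder hrefs).
html_doc = """
<div class="credit_summary_item"><h4>Director:</h4><a href="/name/nm0000001/">A. Director</a></div>
<div class="credit_summary_item"><h4>Stars:</h4><a href="/name/nm0000002/">John C. Reilly</a></div>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# find_all returns a ResultSet, a list subclass holding Tag objects.
divs = soup.find_all("div", {"class": "credit_summary_item"})

# Calling a Tag method on the ResultSet itself reproduces the AttributeError.
raised = False
try:
    divs.find_all("a")
except AttributeError as exc:
    raised = True
    print("AttributeError:", exc)

# Fix 1: iterate over the ResultSet and call find_all on each Tag.
all_names = [a.text for div in divs for a in div.find_all("a")]
print(all_names)  # ['A. Director', 'John C. Reilly']

# Fix 2: find returns a single Tag (the first match) or None, so find_all works on it.
first_div = soup.find("div", {"class": "credit_summary_item"})
print([a.text for a in first_div.find_all("a")])  # ['A. Director']
```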

import re

# Every <a> whose href matches a person-page path such as /name/nm...
stars = soup.find_all('a', href=re.compile(r'/name/nm.+'))
names = [x.text for x in stars]
print(names)

# output: ['John C. Reilly', 'Sarah Silverman', 'Gal Gadot']
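The regex approach matches every person link in whatever it searches, so on a fuller page it may also pick up names outside the Stars section. A sketch of that caveat, assuming hypothetical markup in which a director link also points at a /name/nm page, and one way to scope the search to the div under the "Stars:" heading:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: a director link also points at a /name/nm... page.
html_doc = """
<div class="credit_summary_item"><h4>Director:</h4><a href="/name/nm0000001/">A. Director</a></div>
<div class="credit_summary_item"><h4>Stars:</h4>
  <a href="/name/nm0000002/">John C. Reilly</a>,
  <a href="/name/nm0000003/">Sarah Silverman</a>,
  <a href="/name/nm0000004/">Gal Gadot</a>
  | <a href="fullcredits/">See full cast &amp; crew</a>
</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")

person_link = re.compile(r"/name/nm.+")

# Searching the whole document matches every person link, director included.
everyone = [a.text for a in soup.find_all("a", href=person_link)]
print(everyone)  # ['A. Director', 'John C. Reilly', 'Sarah Silverman', 'Gal Gadot']

# Scoping the search to the div under the "Stars:" heading keeps only the cast.
stars_div = soup.find("h4", string="Stars:").find_parent("div")
cast = [a.text for a in stars_div.find_all("a", href=person_link)]
print(cast)  # ['John C. Reilly', 'Sarah Silverman', 'Gal Gadot']
```

Combining the heading lookup with the href pattern avoids both the positional "drop the last link" trick and the risk of collecting unrelated names.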