Python: How to get only the divs whose id ends with a certain value in Beautiful Soup?

python web-scraping

I have the source of a web page that contains many divs, each with its own id.

For example:

<div id="abc_answer">Some content</div>
<div id="abcd_answer">Some content</div>
<div id="ggg">Some Content</div>

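I only want to extract the information from the divs whose id contains the substring "_answer". I want to do this with BeautifulSoup.

One option is to use select() and pass in the CSS selector [id$=_answer], which selects elements whose id attribute value ends with the substring _answer: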

soup.select('div[id$=_answer]')

Output:

[<div id="abc_answer">Some content</div>, <div id="abcd_answer">Some content</div>]
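
A self-contained version of the select() approach, as a sketch: it assumes bs4 is installed and reuses the sample markup from the question (the variable names html and answers are illustrative only).

```python
from bs4 import BeautifulSoup

# Sample markup from the question
html = """
<div id="abc_answer">Some content</div>
<div id="abcd_answer">Some content</div>
<div id="ggg">Some Content</div>
"""

soup = BeautifulSoup(html, "html.parser")

# [id$=_answer] is the CSS "attribute value ends with" selector
answers = soup.select('div[id$=_answer]')

for div in answers:
    print(div["id"], div.get_text())
```

This prints one line per matching div (abc_answer and abcd_answer); the div with id="ggg" is skipped because its id does not end with _answer.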

You can pass a function to find_all that performs any kind of check:

soup.find_all(lambda tag: tag.name == 'div'
                          and tag.has_attr('id')
                          and tag['id'].endswith('_answer'))
#[<div id="abc_answer">Some content</div>, 
# <div id="abcd_answer">Some content</div>]
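
A runnable sketch of the function-based filter, written as a named function instead of a lambda. Assumptions: bs4 is installed, is_answer_div is a hypothetical helper name, and a fourth <div> without an id has been added to the question's sample to exercise the has_attr guard.

```python
from bs4 import BeautifulSoup

# Question's sample plus one div without an id (added for illustration)
html = """
<div id="abc_answer">Some content</div>
<div id="abcd_answer">Some content</div>
<div id="ggg">Some Content</div>
<div>No id here</div>
"""

soup = BeautifulSoup(html, "html.parser")

def is_answer_div(tag):
    """Hypothetical helper: True for <div> tags whose id ends with '_answer'."""
    return (
        tag.name == "div"
        and tag.has_attr("id")  # guard: tag["id"] raises KeyError if the id is missing
        and tag["id"].endswith("_answer")
    )

# find_all calls the function once per tag and keeps those that return True
answers = soup.find_all(is_answer_div)
print([div["id"] for div in answers])  # ['abc_answer', 'abcd_answer']
```

A named function is easier to test and reuse than a long lambda, at the cost of a few extra lines.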

Make sure to check that id exists before looking at its value.
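
To see why that check matters, here is a small sketch (the one-line markup is made up for illustration): indexing a tag with a missing attribute raises KeyError, whereas Tag.get returns None or a supplied default.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div>No id here</div>", "html.parser")
div = soup.div

# Indexing a missing attribute raises KeyError
try:
    div["id"]
except KeyError:
    print("KeyError: this div has no id")

# .get() returns None (or the given default) instead of raising
print(div.get("id"))                          # None
print(div.get("id", "").endswith("_answer"))  # False
```

Using tag.get('id', '') is an alternative to the has_attr check when an empty string is an acceptable fallback.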

Another solution:

from bs4 import BeautifulSoup

bsObj = BeautifulSoup(some.text, "html.parser")  # some.text holds the page's HTML
found = bsObj.find_all("div", id=lambda x: x and x.endswith('_answer'))
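
A self-contained sketch of this attribute-filter approach, using find_all (findAll is the older camelCase alias kept for backward compatibility). It assumes bs4 is installed and uses the question's sample markup plus one added <div> without an id. When a function is passed as an attribute filter, BeautifulSoup calls it with that attribute's value, which is None for tags lacking the attribute; that is why the lambda starts with x and.

```python
from bs4 import BeautifulSoup

# Question's sample plus one div without an id (added for illustration)
html = """
<div id="abc_answer">Some content</div>
<div id="abcd_answer">Some content</div>
<div id="ggg">Some Content</div>
<div>No id here</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The lambda receives the id value (None if absent); `x and` short-circuits
# so None.endswith(...) is never called
found = soup.find_all("div", id=lambda x: x and x.endswith("_answer"))
print([div["id"] for div in found])  # ['abc_answer', 'abcd_answer']
```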