Using BeautifulSoup to get all href tags and links from a list in Python
I was able to get a list of web page elements with the tag div and their contents. It contains all the links available inside a specific div.
The list looks like this:
# I formatted the list contents to look like an HTML code
classroom_links =
[<div class="main_class">
<div class="sub_class">
<a href="link1" id="id_name"></a>
<a href="link2" id="id_name"></a>
<a href="link3" id="id_name"></a>
<a href="link4" id="id_name"></a>
<a href="link5" id="id_name"></a>
</div>
</div>
]
classroomLinks = soup.find_all("div", {"class": "main_class"})
for links in classroomLinks:
print(links.find('a')['href'])
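But this prints only the first link; I can't get the remaining links to print.
You can try iterating over the a tags of each element in the list: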
for dom in classroom_links:
for aTag in dom.find_all("a"):
print(aTag)
Full example:
from bs4 import BeautifulSoup
classroom_links = [BeautifulSoup("""<div class="main_class">
<div class="sub_class">
<a href="link1" id="id_name"></a>
<a href="link2" id="id_name"></a>
<a href="link3" id="id_name"></a>
<a href="link4" id="id_name"></a>
<a href="link5" id="id_name"></a>
</div>
</div>""", "html.parser")]
for dom in classroom_links:
for aTag in dom.find_all("a"):
print(aTag)
# <a href="link1" id="id_name"></a>
# <a href="link2" id="id_name"></a>
# <a href="link3" id="id_name"></a>
# <a href="link4" id="id_name"></a>
# <a href="link5" id="id_name"></a>
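The loop above prints whole tags, while the question asks for the links themselves. A minimal sketch that collects only the href values, assuming the sample markup from the question is representative of the real page (the names html, hrefs and hrefs_css are introduced here for illustration). It also shows a CSS-selector alternative using BeautifulSoup's select():

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (assumed representative of the real page).
html = """<div class="main_class">
<div class="sub_class">
<a href="link1" id="id_name"></a>
<a href="link2" id="id_name"></a>
<a href="link3" id="id_name"></a>
<a href="link4" id="id_name"></a>
<a href="link5" id="id_name"></a>
</div>
</div>"""

soup = BeautifulSoup(html, "html.parser")
classroom_links = soup.find_all("div", class_="main_class")

# href=True keeps only <a> tags that actually carry an href attribute.
hrefs = [a["href"] for div in classroom_links for a in div.find_all("a", href=True)]
print(hrefs)  # ['link1', 'link2', 'link3', 'link4', 'link5']

# Same result with a CSS selector: every <a href> anywhere under div.main_class.
hrefs_css = [a["href"] for a in soup.select("div.main_class a[href]")]
print(hrefs_css)  # ['link1', 'link2', 'link3', 'link4', 'link5']
```

The href=True filter avoids a KeyError on anchors without an href; drop it if every anchor is guaranteed to have one.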
You need to iterate over the a tags inside the links loop.
@AlexandreB. Could you elaborate?
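A minimal sketch of what the comment suggests, keeping the asker's own loop and assuming soup is built from the sample markup in the question (the list all_hrefs is a hypothetical addition for illustration). The key change is that find('a') returns only the first matching tag, whereas find_all('a') returns every match:

```python
from bs4 import BeautifulSoup

# Sample markup from the question (assumed representative of the real page).
html = """<div class="main_class">
<div class="sub_class">
<a href="link1" id="id_name"></a>
<a href="link2" id="id_name"></a>
<a href="link3" id="id_name"></a>
<a href="link4" id="id_name"></a>
<a href="link5" id="id_name"></a>
</div>
</div>"""
soup = BeautifulSoup(html, "html.parser")

classroomLinks = soup.find_all("div", {"class": "main_class"})
all_hrefs = []  # hypothetical: collects the links instead of only printing them
for links in classroomLinks:
    # Inner loop over every <a>, not just the first one returned by find('a').
    for a in links.find_all("a"):
        href = a.get("href")  # .get() returns None instead of raising if href is missing
        print(href)
        all_hrefs.append(href)
```

With the sample markup this prints link1 through link5, one per line.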