Getting all href tags and links from a list using BeautifulSoup (Python)

I was able to get a list of web page elements with the tag div and their contents. It contains all the links available inside a particular div.

The list looks like this:

# I formatted the list contents to look like an HTML code

classroom_links = 
[<div class="main_class">
    <div class="sub_class">
        <a href="link1" id="id_name"></a>
        <a href="link2" id="id_name"></a>
        <a href="link3" id="id_name"></a>
        <a href="link4" id="id_name"></a>
        <a href="link5" id="id_name"></a>
    </div>
</div>
]

classroomLinks = soup.find_all("div", {"class": "main_class"})
for links in classroomLinks:
    print(links.find('a')['href'])

But this only prints the first link. I can't get it to print the remaining links.

You can try iterating over the a tags of each element in the list:

for dom in classroom_links:
    for aTag in dom.find_all("a"):
        print(aTag)

Full example:

from bs4 import BeautifulSoup
classroom_links = [BeautifulSoup("""<div class="main_class">
    <div class="sub_class">
        <a href="link1" id="id_name"></a>
        <a href="link2" id="id_name"></a>
        <a href="link3" id="id_name"></a>
        <a href="link4" id="id_name"></a>
        <a href="link5" id="id_name"></a>
    </div>
</div>""", "html.parser")]


for dom in classroom_links:
    for aTag in dom.find_all("a"):
        print(aTag)
# <a href="link1" id="id_name"></a>
# <a href="link2" id="id_name"></a>
# <a href="link3" id="id_name"></a>
# <a href="link4" id="id_name"></a>
# <a href="link5" id="id_name"></a>
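
If you only want the link targets rather than the whole tags, a minimal sketch along the same lines might look like this. It assumes classroom_links comes from find_all on a parsed page, as in the question. The helper name collect_hrefs and the extra a tag without an href are made up for illustration.

```python
from bs4 import BeautifulSoup

html = """<div class="main_class">
    <div class="sub_class">
        <a href="link1" id="id_name"></a>
        <a href="link2" id="id_name"></a>
        <a href="link3" id="id_name"></a>
        <a id="no_href"></a>
    </div>
</div>"""

# Assumption: the list is built the same way as in the question.
soup = BeautifulSoup(html, "html.parser")
classroom_links = soup.find_all("div", {"class": "main_class"})


def collect_hrefs(elements):
    """Return the href value of every <a> tag inside the given elements."""
    hrefs = []
    for dom in elements:
        # href=True skips <a> tags that have no href attribute,
        # so a_tag["href"] cannot raise a KeyError here.
        for a_tag in dom.find_all("a", href=True):
            hrefs.append(a_tag["href"])
    return hrefs


print(collect_hrefs(classroom_links))
# ['link1', 'link2', 'link3']
```

The href=True filter is optional. Without it, a_tag.get("href") returns None for tags that lack the attribute instead of raising.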

You need to iterate over the a tags inside your links loop.
@AlexandreB. Can you elaborate?
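
To spell out that comment: find('a') returns only the first matching tag, so the loop in the question prints one link per div. A rough sketch of the fix, assuming the page has the structure shown in the question (the page string below is a stand-in for the real document), is to add an inner loop over find_all('a'). The CSS selector version is an alternative that should give the same result here.

```python
from bs4 import BeautifulSoup

# Stand-in for the real page, shaped like the markup in the question.
page = """<div class="main_class">
    <div class="sub_class">
        <a href="link1" id="id_name"></a>
        <a href="link2" id="id_name"></a>
        <a href="link3" id="id_name"></a>
    </div>
</div>"""

soup = BeautifulSoup(page, "html.parser")

# The question's loop, with find('a') replaced by an inner find_all('a') loop.
nested = []
for links in soup.find_all("div", {"class": "main_class"}):
    for a_tag in links.find_all("a"):
        nested.append(a_tag["href"])

# Alternative: a CSS selector matching every <a href=...> under div.main_class.
selected = [a["href"] for a in soup.select("div.main_class a[href]")]

print(nested)
print(selected)
```

Both lists should hold link1 through link3 for this sample. Note that if one main_class div were nested inside another, the nested-loop version would visit the inner links twice.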