Python: selecting links from a list in BeautifulSoup


python html


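I am trying to select links from a list of more than 2,000 items. Ultimately, I want to be able to follow the links in the list to open the next page. I can get Beautiful Soup to print the li list I want, but I can't figure out how to follow the links. At the end of the code below, I tried adding the following: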

for link in RHAS:
    print(link.get('href'))

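But I got this error:

AttributeError: 'NavigableString' object has no attribute 'get'

I think this has to do with the HTML still being attached to the output (i.e., when I print the li items, the a, li, and href markup all show up). How do I get it to follow the links?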

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup


# The website I am starting at
my_url = 'https://mars.nasa.gov/msl/multimedia/raw/'

#calls the urlopen function from the request module of the urllib module
#AKA opens up the connection and grabs the page
uClient = uReq(my_url)

#imports the webpage from html format into python.  
page_html = uClient.read()

#closes the client
uClient.close()

#parses the HTML using bs4
page_soup = soup(page_html, "lxml")

#finds the categories for the types of images on the site, category 1 is 
#RHAZ
containers = page_soup.findAll("div", {"class": "image_list"})

RHAZ = containers[1]  

# prints the li list that has the links I want
for child in RHAZ:
    print(child)

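Iterating over RHAZ yields its child nodes, which include the nested div, ul, li, and a tags as well as plain text nodes (NavigableString objects), and that is why you get the error: a NavigableString has no get method.

If you want the href from every anchor tag, find all the anchor tags and extract the href from each one, as shown below: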

for link in RHAZ.findAll('a'):
    print(link['href'])
    # or, if you need both the href and the link text:
    # print(link['href'], link.text)
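To make the cause of the error concrete, here is a minimal sketch that runs without a network connection. The SAMPLE_HTML markup and its href values are made up for illustration and are not taken from the NASA page; the built-in "html.parser" is used instead of "lxml" only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one "image_list" div (an assumption,
# not the real page source).
SAMPLE_HTML = """
<div class="image_list">
  <ul>
    <li><a href="./?s=1">Sol 1</a></li>
    <li><a href="./?s=2">Sol 2</a></li>
  </ul>
</div>
"""

page_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
container = page_soup.find("div", {"class": "image_list"})

# Iterating over a tag yields its direct children. The whitespace between
# tags shows up as NavigableString nodes, which have no .get() method.
child_types = [type(child).__name__ for child in container]

# find_all("a", href=True) returns only anchor tags that carry an href.
hrefs = [a["href"] for a in container.find_all("a", href=True)]

print(child_types)
print(hrefs)
```

The first list mixes text nodes with the ul tag, while the second contains only the href strings, so searching for the anchors directly sidesteps the text nodes entirely.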

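P.S. It would help to explain the situation you are dealing with first and then show the error you are facing, rather than stating the error and explaining the situation afterwards. That would be clearer, and you would get the right response more easily.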

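Try changing "for link in RHAS" to "for link in RHAZ".

Can you post your output so I can try it?

Ah, it worked. I had it capitalized. Any idea how to follow the links?

You can build a function that requests each link's content and parses it. Depending on the content structure of the linked pages, it will be very similar to what you have so far.

You should check out [link]; it can greatly simplify and speed up your crawling.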
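The suggestion to build a function that requests and parses each link could be sketched roughly as follows. The function names extract_links, fetch_html, and follow_links are hypothetical helpers introduced here, the limit parameter is an assumption added to keep a test run small, and "html.parser" stands in for "lxml" to avoid an extra dependency. Relative hrefs are resolved against the page URL with urljoin before they are requested.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return absolute URLs for every <a> tag in html that has an href."""
    page = BeautifulSoup(html, "html.parser")
    return [urljoin(base_url, a["href"]) for a in page.find_all("a", href=True)]


def fetch_html(url):
    """Download a page and return its raw bytes."""
    with urlopen(url) as response:
        return response.read()


def follow_links(start_url, limit=5):
    """Fetch start_url, then fetch and parse up to `limit` linked pages.

    Returns a dict mapping each followed URL to its parsed BeautifulSoup.
    """
    links = extract_links(fetch_html(start_url), start_url)
    pages = {}
    for link in links[:limit]:
        pages[link] = BeautifulSoup(fetch_html(link), "html.parser")
    return pages
```

Calling follow_links(my_url) would make real network requests, so it is left uncalled here. In the question's setup you would probably narrow extract_links to the anchors inside the RHAZ container rather than the whole page.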