Python: scraping a website's links, and also scraping the links inside the already-scraped links
I am trying to scrape the links from a website. After scraping them, I also want to check whether each scraped link is just an article or contains further links, and if so, scrape those links as well. I am trying to do this with BeautifulSoup 4. This is my code so far:
import requests
from bs4 import BeautifulSoup

user_agent = 'Mozilla/5.0'  # assumed; not defined in the original snippet
url = 'https://www.lbbusinessjournal.com/'
try:
    r = requests.get(url, headers={'User-Agent': user_agent})
    soup = BeautifulSoup(r.text, 'html.parser')
    for post in soup.find_all(['h3', 'li'], class_=['entry-title td-module-title', 'menu-item']):
        link = post.find('a').get('href')
        print(link)
        r = requests.get(link, headers={'User-Agent': user_agent})
        soup1 = BeautifulSoup(r.text, 'html.parser')
        for post1 in soup1.find_all('h3', class_='entry-title td-module-title'):
            link1 = post1.find('a').get('href')
            print(link1)
except Exception as e:
    print(e)
I want the links on the page, and then also to scrape any links that may exist inside the pages those links point to. So far I am only getting links from the main page.

Try re-raising from the except clause and you will see the error

AttributeError: 'NoneType' object has no attribute 'get'

originating from the line link1 = post1.find('a').get('href'), where post1.find('a') returns None. This happens because at least one of the h3 elements you retrieve contains no a element; in fact, that link appears to be commented out in the HTML. Instead, you should split the post1.find('a').get('href') call into two steps and check whether the element returned by post1.find('a') is None before trying to get the 'href' attribute, i.e.:
for post1 in soup1.find_all('h3', class_='entry-title td-module-title'):
    element = post1.find('a')
    if element is not None:
        link1 = element.get('href')
        print(link1)
Output from running the code with this change:
https://www.lbbusinessjournal.com/
https://www.lbbusinessjournal.com/this-virus-doesnt-have-borders-port-official-warns-of-pandemics-future-economic-impact/
https://www.lbbusinessjournal.com/pharmacy-and-grocery-store-workers-call-for-increased-protections-against-covid-19/
https://www.lbbusinessjournal.com/up-close-and-personal-grooming-businesses-struggle-in-times-of-social-distancing/
https://www.lbbusinessjournal.com/light-at-the-end-of-the-tunnel-long-beach-secures-contract-for-new-major-convention/
https://www.lbbusinessjournal.com/hospitals-prepare-for-influx-of-coronavirus-patients-officials-worry-it-wont-be-enough/
https://www.lbbusinessjournal.com/portside-keeping-up-with-the-port-of-long-beach-18/
https://www.lbbusinessjournal.com/news/
...
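To see the None check working in isolation, without hitting the network, here is a self-contained sketch against a small inline HTML fragment. The markup below is hypothetical: it mimics the page's structure, with one link commented out the way the problematic link appears to be on the real site:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: one h3 with a normal link, and one h3 whose
# link is commented out, so post.find('a') returns None for it.
html = """
<h3 class="entry-title td-module-title"><a href="https://example.com/post-1">Post 1</a></h3>
<h3 class="entry-title td-module-title"><!-- <a href="https://example.com/post-2">Post 2</a> --></h3>
"""

soup = BeautifulSoup(html, 'html.parser')
links = []
for post in soup.find_all('h3', class_='entry-title td-module-title'):
    element = post.find('a')   # may be None when the <a> is missing or commented out
    if element is not None:
        links.append(element.get('href'))

print(links)  # only the uncommented link survives
```

Note that html.parser turns the commented-out anchor into a Comment node, so find('a') does not match it; without the None check, the second h3 would raise the AttributeError seen above.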
Comment: So what is the current output when you run your code? Does it not find the links on the page?
Reply: It does not give me the links inside the links, as I asked. @dspencer If I print post1 I can see the link inside it, but if I try to get the 'href' attribute from the element, it prints nothing.
Comment: FYI, it's scrape (and scraping, scraped, scraper), not scrap.