Python: how to get the href from an a tag inside a div
I'm using BeautifulSoup and I can't extract the href from the a tag; whatever I do, it returns an error. This is the function I'm using:
import requests
from bs4 import BeautifulSoup

def scrape_a(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")
    news = soup.find_all("div", attrs={"class": "news"})
    return news
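Note that `find_all` returns a `ResultSet` (a list) of `Tag` objects, so the `news` value above is still a list of divs, not links. Calling `.find("a")` on one of those divs returns `None` when the div has no matching element, and subscripting that `None` with `["href"]` is what raises a NoneType error. The sketch below parses a hypothetical `sample` string instead of calling `requests`. The second `div.news` in it is also made up, purely to show the `None` case.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for r.content: the second div.news has no <a>.
sample = '<div class="news"><a href="www.link.com">x</a></div><div class="news"></div>'

soup = BeautifulSoup(sample, "html.parser")
news = soup.find_all("div", attrs={"class": "news"})  # ResultSet (a list) of Tag objects

hrefs = []
for div in news:
    a = div.find("a")  # first <a> inside this div, or None if there is none
    # a["href"] would fail when a is None, so check first;
    # a.get("href") also returns None instead of raising KeyError if href is absent.
    hrefs.append(a.get("href") if a is not None else None)

print(hrefs)  # ['www.link.com', None]
```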
The HTML structure is:
<div class="news">
<a href="www.link.com">
<h2 class="heading">
Kenyan police foil potential bomb attack in Nairobi mall
</h2>
<div class="teaserImg">
<img alt="" border="0" height="124" src="/image">
</div>
<p> text </p>
</a>
</div>
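What I want to extract is the href and the h2 with class='heading'. Whenever I try to get them both, I get an error along the lines of 'NoneType' object has no attribute '__getitem__'.

How about something like this?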
from bs4 import BeautifulSoup

def get_news_class_hrefs(html):
    """
    Finds all urls pointed to by all links inside
    'news' class div elements
    """
    soup = BeautifulSoup(html, 'html.parser')
    links = [a['href']
             for div in soup.find_all("div", attrs={"class": "news"})
             for a in div.find_all('a')]
    return links
# example html copied from question
html="""<div class="news">
<a href="www.link.com">
<h2 class="heading">
Kenyan police foil potential bomb attack in Nairobi mall
</h2>
<div class="teaserImg">
<img alt="" border="0" height="124" src="/image">
</div>
<p> text </p>
</a>"""
get_news_class_hrefs(html)
# Output:
# [u'www.link.com']
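The function above only collects the href values, while the question also asks for the text of the h2 with class 'heading'. Below is a minimal sketch of a variant that returns both as (href, heading) pairs. The helper name `get_news_items` is hypothetical and not part of the answer. The sketch assumes each link holds at most one such heading.

It uses `find_all('a', href=True)` so that a tags without an href are skipped. Indexing `a['href']` on such a tag is one way to end up with `KeyError: 'href'`. A missing heading (`find` returning `None`) is checked explicitly rather than subscripted. The second `div.news` in the sample is made up to exercise the skip.

```python
from bs4 import BeautifulSoup


def get_news_items(html):
    """Hypothetical helper: return (href, heading text) pairs for links in div.news."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for div in soup.find_all("div", attrs={"class": "news"}):
        # href=True matches only <a> tags that actually have an href attribute,
        # so a["href"] below cannot raise KeyError.
        for a in div.find_all("a", href=True):
            h2 = a.find("h2", class_="heading")  # None if this link has no heading
            heading = h2.get_text(strip=True) if h2 is not None else None
            items.append((a["href"], heading))
    return items


html = """<div class="news">
<a href="www.link.com">
<h2 class="heading">
Kenyan police foil potential bomb attack in Nairobi mall
</h2>
<p> text </p>
</a>
</div>
<div class="news"><a name="no-href">no href here</a></div>"""

print(get_news_items(html))
# [('www.link.com', 'Kenyan police foil potential bomb attack in Nairobi mall')]
```

Only one pair comes back, because the hypothetical second div's a tag has no href and is filtered out instead of raising.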
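It returns an error: return self.attrs[key] KeyError: 'href'
Hmm... can't reproduce that. What exactly are you running?
I'll update the question to show the full code I'm running.
Oh, if you're using BeautifulSoup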