Python: how to get the href from an a tag inside a div (python, tags, beautifulsoup)

I'm using BeautifulSoup and I can't extract the href from the a tag; no matter what I do, it returns an error. This is the function I'm using:

import requests
from bs4 import BeautifulSoup

def scrape_a(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")
    news = soup.find_all("div", attrs={"class": "news"})
    return news

The HTML structure is:

<div class="news">
<a href="www.link.com">
<h2 class="heading">
Kenyan police foil potential bomb attack in Nairobi mall 
</h2>
<div class="teaserImg">
<img alt="" border="0" height="124" src="/image">
</div>
<p> text </p>
</a>
</div>

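What I want to extract is the href and the text of the h2 with class="heading", but whenever I try to get both at once I get the error 'NoneType' object has no attribute '__getitem__'.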
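That error usually means something was indexed (for example `x['href']`) after a `find()` call returned None because nothing matched. A minimal sketch of pulling out both values while checking each lookup first, assuming the markup matches the snippet above; `extract_news` is a hypothetical helper name, not part of BeautifulSoup:

```python
from bs4 import BeautifulSoup

# HTML copied from the question
HTML = """<div class="news">
<a href="www.link.com">
<h2 class="heading">
Kenyan police foil potential bomb attack in Nairobi mall
</h2>
<div class="teaserImg">
<img alt="" border="0" height="124" src="/image">
</div>
<p> text </p>
</a>
</div>"""


def extract_news(html):
    """Return (href, heading) pairs for every <div class="news"> block.

    find() returns None when nothing matches, so each result is checked
    before it is indexed; indexing None is what raises the NoneType error.
    """
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for div in soup.find_all("div", class_="news"):
        link = div.find("a", href=True)  # only <a> tags that carry an href
        if link is None:
            continue
        heading = link.find("h2", class_="heading")
        heading_text = heading.get_text(strip=True) if heading is not None else None
        results.append((link["href"], heading_text))
    return results


print(extract_news(HTML))
```

Blocks without a link are skipped, and a missing heading comes back as None rather than raising, so one malformed item does not abort the whole scrape.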

How about something like this:

from bs4 import BeautifulSoup

def get_news_class_hrefs(html):
    """
    Finds all URLs pointed to by all links inside
    'news' class div elements
    """
    soup = BeautifulSoup(html, 'html.parser')
    links = [a['href']
             for div in soup.find_all("div", attrs={"class": "news"})
             for a in div.find_all('a')]
    return links

# example html copied from question
html = """<div class="news">
<a href="www.link.com">
<h2 class="heading">
Kenyan police foil potential bomb attack in Nairobi mall 
</h2>
<div class="teaserImg">
<img alt="" border="0" height="124" src="/image">
</div>
<p> text </p>
</a>
</div>"""

print(get_news_class_hrefs(html))
# Output:
# ['www.link.com']
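The same lookup can also be written as a single CSS selector. A short sketch, assuming a recent BeautifulSoup (4.7 or later, where `select()` is backed by the soupsieve package); the `a[href]` part also skips anchors that have no href attribute:

```python
from bs4 import BeautifulSoup

html = """<div class="news">
<a href="www.link.com">
<h2 class="heading">Kenyan police foil potential bomb attack in Nairobi mall</h2>
</a>
<a>anchor without an href</a>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# "div.news a[href]" matches <a> elements that have an href attribute and sit
# anywhere inside a <div class="news">; anchors without an href are skipped.
links = [a["href"] for a in soup.select("div.news a[href]")]
print(links)
```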


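That returns an error: return self.attrs[key] KeyError: 'href'

Hmm... I can't reproduce that. What exactly are you running?

I'll update the question to show the full code I'm running.

Oh, if you're using BeautifulSoup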
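The thread does not show the page that triggered the error, but a `KeyError: 'href'` from `self.attrs[key]` is consistent with an `<a>` tag that has no href attribute (an assumption here; the sample HTML below is made up to reproduce it). Subscripting a Tag behaves like a dict lookup, while `Tag.get()` and `Tag.has_attr()` avoid the exception:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second anchor has no href attribute.
html = '<div class="news"><a href="www.link.com">one</a><a name="top">two</a></div>'
soup = BeautifulSoup(html, "html.parser")
anchors = soup.find("div", class_="news").find_all("a")

# Dict-style access raises KeyError for a missing attribute.
try:
    [a["href"] for a in anchors]
except KeyError as exc:
    print("KeyError:", exc)  # prints: KeyError: 'href'

# Tag.get() returns None (or a supplied default) instead of raising.
with_none = [a.get("href") for a in anchors]
print(with_none)  # ['www.link.com', None]

# Or keep only anchors that actually have an href.
present_only = [a["href"] for a in anchors if a.has_attr("href")]
print(present_only)  # ['www.link.com']
```

Passing `href=True` to `find_all('a', href=True)` has the same filtering effect as the `has_attr` check.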