Python: How to get the href from an a tag inside a div

I'm using BeautifulSoup and I can't extract the href from an a tag; whatever I try, it returns an error. This is the function I'm using:

def scrape_a(url):
  r = requests.get(url)
  soup = BeautifulSoup(r.content)
  news =  soup.find_all("div", attrs={"class": "news"})
  return news
The HTML structure is:

<div class="news">
<a href="www.link.com">
<h2 class="heading">
Kenyan police foil potential bomb attack in Nairobi mall 
</h2>
<div class="teaserImg">
<img alt="" border="0" height="124" src="/image">
</div>
<p> text </p>
</a>
</div>


What I want to extract is the href and the h2 with class='heading'. Whenever I try to get both, I get the error: 'NoneType' object has no attribute '__getitem__'.

How about something like this?

from bs4 import BeautifulSoup

def get_news_class_hrefs(html):
    """
    Finds all urls pointed to by all links inside
    'news' class div elements
    """
    soup = BeautifulSoup(html, 'html.parser')
    links = [a['href'] for div in soup.find_all("div", attrs={"class": "news"}) for a in div.find_all('a')]
    return links

# example html copied from question
html="""<div class="news">
<a href="www.link.com">
<h2 class="heading">
Kenyan police foil potential bomb attack in Nairobi mall 
</h2>
<div class="teaserImg">
<img alt="" border="0" height="124" src="/image">
</div>
<p> text </p>
</a>"""

get_news_class_hrefs(html)
# Output:
# [u'www.link.com']
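
The question also asks for the h2 heading text, which the function above does not return. Here is a minimal sketch of one way to get both, assuming each news div wraps its heading inside the link as in the sample HTML. The function name get_news_links_and_headings is hypothetical, and divs without a usable link are simply skipped:

```python
from bs4 import BeautifulSoup

def get_news_links_and_headings(html):
    """
    Returns a list of (href, heading text) pairs, one per
    'news' class div that contains a link with an href.
    """
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for div in soup.find_all("div", class_="news"):
        # find() returns None when nothing matches, so check before indexing
        a = div.find("a", href=True)
        if a is None:
            continue
        h2 = a.find("h2", class_="heading")
        heading = h2.get_text(strip=True) if h2 is not None else None
        results.append((a["href"], heading))
    return results

# trimmed version of the html from the question
sample_html = """<div class="news">
<a href="www.link.com">
<h2 class="heading">
Kenyan police foil potential bomb attack in Nairobi mall 
</h2>
<p> text </p>
</a>
</div>"""

pairs = get_news_links_and_headings(sample_html)
```

The explicit None checks matter here: indexing the result of find() when nothing matched is a common source of the 'NoneType' error mentioned in the question.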

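It returns an error: return self.attrs[key] KeyError: 'href'

Hmm... I can't reproduce that problem. What exactly are you running?

I'll update to show the full code I'm running.

Oh, if you're using BeautifulSoup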
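
One possible cause of the KeyError: 'href' reported above, assuming the real page contains a tags without an href attribute (the thread does not confirm this): a['href'] raises KeyError for such tags. A sketch that only collects anchors that actually carry an href, with the hypothetical name get_news_class_hrefs_safe:

```python
from bs4 import BeautifulSoup

def get_news_class_hrefs_safe(html):
    """
    Like get_news_class_hrefs, but ignores a tags that
    have no href attribute instead of raising KeyError.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [
        a["href"]
        for div in soup.find_all("div", attrs={"class": "news"})
        # href=True matches only tags that have an href attribute
        for a in div.find_all("a", href=True)
    ]

# made-up html with one anchor that has no href
html_with_bare_anchor = """<div class="news">
<a name="top">no link here</a>
<a href="www.link.com">a real link</a>
</div>"""

hrefs = get_news_class_hrefs_safe(html_with_bare_anchor)
```

An alternative is a.get('href'), which returns None for a missing attribute rather than raising.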