Python BeautifulSoup: get only the first href


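I am trying to scrape URLs from the href attributes on a web page. Below is a sample list item taken from a div I am scraping.

My question is: how can I narrow the code below so that it scrapes only the first href in the HTML?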

# import the module
import bs4 as bs
import urllib.request
import re
import PyPDF2
import pypyodbc
from time import sleep

html ='<li><span class="num">20</span><span class="tmb tmb-xs tmb-artist-xs"><a href="http://www.metrolyrics.com/doremi-maria-and-the-children-lyrics-the-sound-of-music.html"<img alt="The Sound Of Music - Do-Re-Mi lyrics" title="Do-Re-Mi" pagespeed_url_hash="552365003" src="http://img2-ak.lst.fm/i/u/174s/cf8387bbdbfc42ce82844a1cdfec9a33.png"></a></span><span class="song hasvid"><a href="http://www.metrolyrics.com/doremi-maria-and-the-children-lyrics-the-sound-of-music.html#startvideo" class="vid";"></a><a href="http://www.metrolyrics.com/doremi-maria-and-the-children-lyrics-the-sound-of-music.html" class="song-link hasvidtoplyric">Do-Re-Mi Lyrics  </a><span class="artist"><a href="http://www.metrolyrics.com/the-sound-of-music-lyrics.html" class="subtitle" title="The Sound Of Music">The Sound Of Music </a></span></span><div class="last-week up">#21</div></li>'
soup = bs.BeautifulSoup(html,'lxml')


for link in soup.findAll('a', attrs={'href': re.compile("^http://")}):
    temp = link.get('href')
    print(temp)
You can use find, which returns only the first matching tag:

    from bs4 import BeautifulSoup as soup
    html ='<li><span class="num">20</span><span class="tmb tmb-xs tmb-artist-xs"><a href="http://www.metrolyrics.com/doremi-maria-and-the-children-lyrics-the-sound-of-music.html"<img alt="The Sound Of Music - Do-Re-Mi lyrics" title="Do-Re-Mi" pagespeed_url_hash="552365003" src="http://img2-ak.lst.fm/i/u/174s/cf8387bbdbfc42ce82844a1cdfec9a33.png"></a></span><span class="song hasvid"><a href="http://www.metrolyrics.com/doremi-maria-and-the-children-lyrics-the-sound-of-music.html#startvideo" class="vid";"></a><a href="http://www.metrolyrics.com/doremi-maria-and-the-children-lyrics-the-sound-of-music.html" class="song-link hasvidtoplyric">Do-Re-Mi Lyrics  </a><span class="artist"><a href="http://www.metrolyrics.com/the-sound-of-music-lyrics.html" class="subtitle" title="The Sound Of Music">The Sound Of Music </a></span></span><div class="last-week up">#21</div></li>'
    result = soup(html, 'lxml').find('a')['href']
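
One caveat with the one-liner above: find returns None when nothing matches, so subscripting the result with ['href'] would raise a TypeError. The following is a minimal sketch of a guarded version, not the answerer's own code. It assumes a simplified, well-formed stand-in for the list item, uses the built-in html.parser instead of lxml, and the helper name first_href is hypothetical.

```python
import re
from bs4 import BeautifulSoup

# Simplified, well-formed stand-in for the list item in the question (assumption).
html = (
    '<li>'
    '<a href="http://www.metrolyrics.com/doremi-maria-and-the-children-lyrics-the-sound-of-music.html" class="song-link">Do-Re-Mi Lyrics</a>'
    '<a href="http://www.metrolyrics.com/the-sound-of-music-lyrics.html" class="subtitle">The Sound Of Music</a>'
    '</li>'
)


def first_href(markup, pattern=r"^http://"):
    """Return the href of the first <a> whose href matches pattern, or None."""
    soup = BeautifulSoup(markup, "html.parser")
    # find() stops at the first match instead of collecting every link.
    tag = soup.find("a", href=re.compile(pattern))
    return tag["href"] if tag is not None else None


print(first_href(html))
```

Returning None for the no-match case lets the caller decide how to handle a page without links, rather than crashing in the middle of a scrape.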
    
This is how you do it:

    import bs4 as bs
    import urllib.request
    import re
    import PyPDF2
    import pypyodbc
    from time import sleep
    
    html ='<li><span class="num">20</span><span class="tmb tmb-xs tmb-artist-xs"><a href="http://www.metrolyrics.com/doremi-maria-and-the-children-lyrics-the-sound-of-music.html"<img alt="The Sound Of Music - Do-Re-Mi lyrics" title="Do-Re-Mi" pagespeed_url_hash="552365003" src="http://img2-ak.lst.fm/i/u/174s/cf8387bbdbfc42ce82844a1cdfec9a33.png"></a></span><span class="song hasvid"><a href="http://www.metrolyrics.com/doremi-maria-and-the-children-lyrics-the-sound-of-music.html#startvideo" class="vid";"></a><a href="http://www.metrolyrics.com/doremi-maria-and-the-children-lyrics-the-sound-of-music.html" class="song-link hasvidtoplyric">Do-Re-Mi Lyrics  </a><span class="artist"><a href="http://www.metrolyrics.com/the-sound-of-music-lyrics.html" class="subtitle" title="The Sound Of Music">The Sound Of Music </a></span></span><div class="last-week up">#21</div></li>'
    soup = bs.BeautifulSoup(html,'lxml')
    
    print(soup.findAll('a', attrs={'href': re.compile("^http://")})[0].get('href'))
    
The output is [link]
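
The findAll(...)[0] approach collects every matching link before taking the first one, and it raises an IndexError when there are no matches. As a hedged alternative, the sketch below shows two other ways to stop at the first match: find_all with limit=1, and select_one with a CSS attribute selector. The sample HTML is a simplified stand-in for the question's list item, and html.parser is used here only to avoid the lxml dependency; both are assumptions.

```python
import re
from bs4 import BeautifulSoup

# Simplified, well-formed stand-in for the list item in the question (assumption).
html = (
    '<li>'
    '<a href="http://www.metrolyrics.com/doremi-maria-and-the-children-lyrics-the-sound-of-music.html" class="song-link">Do-Re-Mi Lyrics</a>'
    '<a href="http://www.metrolyrics.com/the-sound-of-music-lyrics.html" class="subtitle">The Sound Of Music</a>'
    '</li>'
)
soup = BeautifulSoup(html, "html.parser")

# Option 1: limit=1 stops the search after the first match; the result is
# still a list, so check that it is non-empty before indexing.
matches = soup.find_all("a", href=re.compile(r"^http://"), limit=1)
first_by_limit = matches[0]["href"] if matches else None

# Option 2: select_one returns the first element matching a CSS selector,
# or None. The selector matches <a> tags whose href starts with "http://".
tag = soup.select_one('a[href^="http://"]')
first_by_selector = tag["href"] if tag is not None else None

print(first_by_limit)
print(first_by_selector)
```

Both options print the same URL for this snippet; which one to prefer is mostly a matter of whether a regular expression or a CSS selector describes the target links more naturally.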