Python: extract only the first href link for each nth occurrence in a for loop

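I am trying to do some simple web scraping with Python, but I have a problem getting the links: each book has 2 to 3 href links in the same class btn, as shown below, and I only need to print the first link for each new occurrence in the loop.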

#!/usr/bin/python3
from bs4 import BeautifulSoup
import requests

url = "https://www.learndatasci.com/free-data-science-books/"

# Getting the webpage, creating a Response object.
response = requests.get(url)

# Extracting the source code of the page.
data = response.text

# Passing the source code to BeautifulSoup to create a BeautifulSoup object for it.
soup = BeautifulSoup(data, 'lxml')

# Extracting all the <a> tags into a list.
tags = soup.find_all('a', class_='btn')

# Extracting URLs from the attribute href in the <a> tags.
for tag in tags:
    print(tag.get('href'))
Current output:

http://www.cin.ufpe.br/~tfl2/artificial-intelligence-modern-approach.9780131038059.25368.pdf
http://www.amazon.com/gp/product/0136042597/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0136042597&linkCode=as2&tag=learnds-20&linkId=3FRORB7P56CEWSK5
http://www.iro.umontreal.ca/~bengioy/papers/ftml_book.pdf
http://amzn.to/1WePh0N
http://www.e-booksdirectory.com/details.php?ebook=9575
http://amzn.to/1FcalRp

Required output:

http://www.cin.ufpe.br/~tfl2/artificial-intelligence-modern-approach.9780131038059.25368.pdf
http://www.iro.umontreal.ca/~bengioy/papers/ftml_book.pdf
http://www.e-booksdirectory.com/details.php?ebook=9575

BeautifulSoup has excellent CSS selector support; just use it to select every odd a.btn link with :nth-of-type(odd).

Demo:

>>> for tag in soup.select('a.btn:nth-of-type(odd)'): print(tag['href'])
...
http://www.cin.ufpe.br/~tfl2/artificial-intelligence-modern-approach.9780131038059.25368.pdf
http://www.iro.umontreal.ca/~bengioy/papers/ftml_book.pdf
http://www.e-booksdirectory.com/details.php?ebook=9575
... etc

You also have a parent element for each group of links that you can use:

for tag in soup.select('.book a.btn:first-of-type'):
    print(tag['href'])

This works for any number of links per book.
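To see how the two selectors differ without hitting the network, here is a minimal offline sketch. The HTML snippet and the example.com URLs are made up for illustration; the real page's markup is only assumed to look roughly like this (a .book container wrapping its a.btn links). It also assumes a BeautifulSoup release whose select() understands these pseudo-classes (4.7 or later, which uses Soup Sieve).

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, not copied from the real page):
# one book with two links and one book with three links.
SAMPLE_HTML = """
<div class="book">
  <a class="btn" href="http://example.com/a.pdf">PDF</a>
  <a class="btn" href="http://example.com/a-store">Buy</a>
</div>
<div class="book">
  <a class="btn" href="http://example.com/b.pdf">PDF</a>
  <a class="btn" href="http://example.com/b-store">Buy</a>
  <a class="btn" href="http://example.com/b-mirror">Mirror</a>
</div>
"""

# The built-in parser is used here so the sketch needs no extra dependency.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# First <a class="btn"> among its siblings inside each .book container.
first_links = [tag["href"] for tag in soup.select(".book a.btn:first-of-type")]

# Every odd-positioned <a> among its siblings (1st, 3rd, ...).
odd_links = [tag["href"] for tag in soup.select("a.btn:nth-of-type(odd)")]

print(first_links)
print(odd_links)
```

With this sample, :first-of-type yields one link per book, while :nth-of-type(odd) also picks up the third link of the second book. That is why the parent-based selector is likely the safer choice when a book can have 2 to 3 links.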