Python: How do I extract the href attribute value from an "a" tag with Beautiful Soup?


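This is part of the HTML I extracted from the platform. It contains the piece I want to get: the value of the href attribute of the tag whose class is "bookTitle".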

Can you try this? (Sorry if it doesn't work, I'm not on a PC with Python right now.)

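BeautifulSoup handles the HTML you provided just fine. To get a tag's text, use ".text". To get the href, use ".get('href')" or, if you are sure the tag has an href attribute, "['href']".

Here is a simple example using your HTML snippet that should be easy to follow: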

from bs4 import BeautifulSoup 

html_code = '''

</div>
<div class="elementList" style="padding-top: 10px;">
<div class="left" style="width: 75%;">
<a class="leftAlignedImage" href="/book/show/2784.Ways_of_Seeing" title="Ways of Seeing"><img alt="Ways of Seeing" src="https://i.gr-assets.com/images/S/compressed.photo.goodreads.com/books/1464018308l/2784._SY75_.jpg"/></a>
<a class="bookTitle" href="/book/show/2784.Ways_of_Seeing">Ways of Seeing (Paperback)</a>
<br/>
<span class="by">by</span>
<span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
<div class="authorName__container">
<a class="authorName" href="https://www.goodreads.com/author/show/29919.John_Berger" itemprop="url"><span itemprop="name">John Berger</span></a>
</div>

'''

soup = BeautifulSoup(html_code, 'html.parser')
tag = soup.find('a', {'class':'bookTitle'})

# - Book Title -
title = tag.text 
print(title)

# - Href Link -
href = tag.get('href')
print(href)
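
The difference between ".get('href')" and "['href']" mentioned above only shows up when a tag has no href attribute. The following is a minimal sketch of that case; the two-link snippet is made up for illustration and is not part of the original page.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: the second <a> deliberately has no href attribute.
html_code = '''
<a class="bookTitle" href="/book/show/2784.Ways_of_Seeing">Ways of Seeing (Paperback)</a>
<a class="bookTitle">No link here</a>
'''

soup = BeautifulSoup(html_code, 'html.parser')
with_href, without_href = soup.find_all('a', {'class': 'bookTitle'})

# .get() returns the attribute value, or None when the attribute is absent
href_present = with_href.get('href')
href_missing = without_href.get('href')

# Indexing with ['href'] raises KeyError when the attribute is absent
try:
    without_href['href']
    raised = False
except KeyError:
    raised = True

print(href_present)
print(href_missing)
print(raised)
```

So ".get('href')" is the safer choice when some matching tags might lack the attribute.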

Now, if you have several "a" tags with the same class name, ('a', {'class': 'bookTitle'}), you can proceed like this:

from bs4 import BeautifulSoup
import requests

url = 'https://www.goodreads.com/shelf/show/art/gendersBooks.html'

html_source = requests.get(url).content

# Parse the downloaded page (the variable must match the one assigned above)
soup = BeautifulSoup(html_source, 'html.parser')

# - To get the tag that we want (the class name is case-sensitive) -
tag = soup.find('a', {'class': 'bookTitle'})

# - Extract Book Title -
title = tag.text

# - Extract href from Tag -
href = tag.get('href')

First, get all the "a" tags:

a_tags = soup.find_all('a', {'class': 'bookTitle'})

Then scrape the information from each book tag and append it to the books list:

books = []
for a in a_tags:
    title = a.text
    href = a.get('href')  # None if the tag has no href, so no try/except is needed
    books.append({'title': title, 'href': href})  # add each book dict to the books list
    print(title)
    print(href)
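
The hrefs on book title links in the sample HTML are relative paths such as "/book/show/2784.Ways_of_Seeing". As a rough sketch, the loop above could also be written so that it skips tags without an href and turns each path into a full URL. BASE_URL and the three-link snippet are assumptions made for illustration, not something taken from the original answer.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Assumption: the site the page was fetched from.
BASE_URL = 'https://www.goodreads.com'

# Hypothetical snippet modelled on the HTML in the question.
html_code = '''
<a class="bookTitle" href="/book/show/2784.Ways_of_Seeing">Ways of Seeing (Paperback)</a>
<a class="bookTitle">Untitled entry without a link</a>
<a class="authorName" href="https://www.goodreads.com/author/show/29919.John_Berger">John Berger</a>
'''

soup = BeautifulSoup(html_code, 'html.parser')

# href=True keeps only the tags that actually carry an href attribute
a_tags = soup.find_all('a', class_='bookTitle', href=True)

# Build one dict per book, resolving the relative href against BASE_URL
books = [
    {'title': a.get_text(strip=True), 'href': urljoin(BASE_URL, a['href'])}
    for a in a_tags
]
print(books)
```

Because href=True already filters out tags without the attribute, indexing with a['href'] is safe inside the comprehension.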

find_all is case-sensitive. Please try bsObj.find_all('a', {'class': 'bookTitle'}).

Does this answer your question?
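
The case-sensitivity point raised in the comment above can be checked with a small sketch. It reuses the book title link from the question's HTML; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

html_code = '<a class="bookTitle" href="/book/show/2784.Ways_of_Seeing">Ways of Seeing (Paperback)</a>'
soup = BeautifulSoup(html_code, 'html.parser')

# The class value is compared exactly, so the lowercase spelling matches nothing
lower = soup.find_all('a', {'class': 'booktitle'})
exact = soup.find_all('a', {'class': 'bookTitle'})

print(len(lower))
print(len(exact))

# find() returns None when nothing matches, so calling .get('href') on the
# result would raise AttributeError
missing = soup.find('a', {'class': 'booktitle'})
print(missing)
```

A None result from find() is therefore a hint to double-check the spelling and capitalisation of the class name.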