Python: How to use BeautifulSoup to get the specific content I need
I am scraping a website and extracting information from several places on it. The HTML looks like this:
<div class="Item-Details">
<p class="Product-title">
<a href="/link_i_need">
text here that i need to grab
more text here that i would like to grab
</a>
</p>
However, what I get back is:
<p class="product-title">
<a href="/info">line 1 description as well as line 2 description with no break</a>
</p>
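Any help would be greatly appreciated.

Once you have the div element (the loop variable in your code), you can get the href attribute of its a tag with div.find("a")['href']. So for your code, it looks like this: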
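from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')  # html holds the page source
mydivs = soup.find_all("p", {"class": "product-title"})
for div in mydivs:
    # Print the href attribute of the first <a> inside each matching <p>
    print(div.find("a")['href'])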
Note that this raises an error (a KeyError) if any of the a elements has no href attribute.
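To illustrate the note about missing href attributes, here is a minimal sketch that avoids the error by using the tag's .get() method, which returns None instead of raising. The sample markup, the extract_hrefs helper name, and the use of the built-in "html.parser" (instead of lxml) are assumptions made for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one link with href, one without, one <p> with no <a> at all
html = """
<p class="product-title"><a href="/info">has a link</a></p>
<p class="product-title"><a>no link here</a></p>
<p class="product-title">no anchor at all</p>
"""

def extract_hrefs(markup):
    """Return one entry per matching <p>: the href string, or None if unavailable."""
    soup = BeautifulSoup(markup, "html.parser")
    hrefs = []
    for p in soup.find_all("p", class_="product-title"):
        a = p.find("a")  # None when the <p> contains no <a>
        # Tag.get() returns None for a missing attribute instead of raising KeyError
        hrefs.append(a.get("href") if a is not None else None)
    return hrefs

print(extract_hrefs(html))
```

A try/except around div.find("a")['href'] would also work; .get() simply keeps the "attribute missing" case out of the exception path.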
For the inner text, you can use the .text attribute, like this:
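from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')  # html holds the page source
mydivs = soup.find_all("p", {"class": "product-title"})
for div in mydivs:
    # Print the text content of the first <a> inside each matching <p>
    print(div.find("a").text)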
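One thing worth knowing about .text is that it keeps the surrounding newlines and indentation from the markup. The sketch below compares it with get_text(strip=True), which trims them; the one-line sample markup and the "html.parser" backend are assumptions chosen to keep the example self-contained.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with whitespace around the link text
html = '<p class="product-title"><a href="/info">\n  line 1 description\n</a></p>'

soup = BeautifulSoup(html, "html.parser")
a = soup.find("p", class_="product-title").find("a")

raw = a.text                      # whitespace from the markup is preserved
clean = a.get_text(strip=True)    # leading/trailing whitespace is removed

print(repr(raw))
print(repr(clean))
```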
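First, you're missing the closing </div> tag. Then, you have a typo: the class is "Product-title", not "product-title". Finally, looping over the divs doesn't bring you any closer to the desired output.
So, assuming your HTML looks like this: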
sample = """
<div class="Item-Details">
<p class="Product-title">
<a href="/link_i_need">
text here that i need to grab
more text here that i would like to grab
</a>
</p>
</div>
"""
and you want to get this:
/link_i_need
text here that i need to grab
more text here that i would like to grab
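One way to get from that sample to the output above, offered as a minimal sketch rather than the answerer's exact code: look up the p by its exact class name (class matching in find() is case-sensitive), take the a tag inside it, and split its text into non-empty lines. The link_and_lines helper name and the "html.parser" backend are assumptions for the example.

```python
from bs4 import BeautifulSoup

sample = """
<div class="Item-Details">
<p class="Product-title">
<a href="/link_i_need">
text here that i need to grab
more text here that i would like to grab
</a>
</p>
</div>
"""

def link_and_lines(markup):
    """Return the href of the product link and its text as a list of clean lines."""
    soup = BeautifulSoup(markup, "html.parser")
    # The class name must match the markup exactly, including capitalization
    a = soup.find("p", class_="Product-title").find("a")
    # Split the anchor text on newlines and drop the blank lines
    lines = [ln.strip() for ln in a.get_text().splitlines() if ln.strip()]
    return a["href"], lines

href, lines = link_and_lines(sample)
print(href)
print("\n".join(lines))
```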
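Thank you so much. This is exactly what I needed. I was just missing a few small pieces at the end. Thanks a lot!!!
So I ran into a problem: one link has no href attribute, just as you mentioned, and now it errors out. How can I add an if-else statement so that the href is grabbed if it exists and something else happens if it doesn't?
Never mind, I figured it out with try/except (AttributeError). Thanks!
Yes, the HTML looks like what you added. I only typed it out by hand because I didn't know how to copy/paste from the Chrome inspector.
This solved my problem, thank you very much for your help!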