Python: How to use BeautifulSoup to get the specific content I need

python html python-3.x web-scraping

I am scraping a website and extracting information from several places on the page. The HTML looks like this:

<div class="Item-Details">
    <p class="Product-title">
        <a href="/link_i_need">
            text here that i need to grab
            more text here that i would like to grab
        </a>
    </p>

I need both the link in the href and the text inside the <a> tag, but what I get back is:

<p class="product-title">
<a href="/info">line 1 description as well as line 2 description with no break</a>
</p>

Any help is greatly appreciated.

Once you have the div tag, you can get the href attribute of the <a> tag with div.find("a")['href']. So, for your code, it would look like this:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
# The class value must match the page exactly; matching is case-sensitive.
mydivs = soup.find_all("p", {"class": "product-title"})
for div in mydivs:
    print(div.find("a")['href'])

Note that this will raise an error if any element has no href attribute.

For the inner text, you can use the .text attribute, like this:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
# The class value must match the page exactly; matching is case-sensitive.
mydivs = soup.find_all("p", {"class": "product-title"})
for div in mydivs:
    print(div.find("a").text)

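Because the link text in the question spans two lines, .text returns it with the source indentation and line breaks intact. A minimal sketch of one way to turn it into clean lines follows. The helper name clean_lines and the sample string are illustrative assumptions, and "html.parser" is used here only because it ships with Python (the thread itself uses "lxml").

```python
from bs4 import BeautifulSoup

# Hypothetical sample mirroring the HTML from the question.
html = """
<div class="Item-Details">
    <p class="Product-title">
        <a href="/link_i_need">
            text here that i need to grab
            more text here that i would like to grab
        </a>
    </p>
</div>
"""


def clean_lines(tag):
    """Return the tag's text as a list of non-empty, stripped lines."""
    return [line.strip() for line in tag.text.splitlines() if line.strip()]


# "html.parser" is the parser bundled with Python.
soup = BeautifulSoup(html, "html.parser")

all_lines = []
for p in soup.find_all("p", {"class": "Product-title"}):
    link = p.find("a")
    all_lines.append(clean_lines(link))

print(all_lines)
```

Joining each inner list with a space or a newline then gives a single tidy string per link.
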
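First, you are missing the closing </div> tag. Then, you have a typo: it is "Product-title", not "product-title". Finally, looping over the divs does not get you any closer to the desired output.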

So, assuming your HTML looks like this:

sample = """
<div class="Item-Details">
    <p class="Product-title">
        <a href="/link_i_need">
            text here that i need to grab
            more text here that i would like to grab
        </a>
    </p>
</div>
"""

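One way to go from that sample straight to the link and its text, without looping, is a CSS selector. This is only a sketch under assumptions: the selector "p.Product-title a" and the variable names are illustrative and not taken from the answer, and "html.parser" stands in for "lxml" so that nothing extra needs to be installed.

```python
from bs4 import BeautifulSoup

# Same sample as above, repeated so the snippet runs on its own.
sample = """
<div class="Item-Details">
    <p class="Product-title">
        <a href="/link_i_need">
            text here that i need to grab
            more text here that i would like to grab
        </a>
    </p>
</div>
"""

soup = BeautifulSoup(sample, "html.parser")

# select_one returns the first element matching the CSS selector, or None.
link = soup.select_one("p.Product-title a")

href = link["href"]
# strip=True trims whitespace at the start and end of the text.
text = link.get_text(strip=True)

print(href)
print(text)
```
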
To get this:

/link_i_need
text here that i need to grab
            more text here that i would like to grab

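Thank you so much. This is exactly what I needed. I was just missing a few small pieces at the end. Thanks a lot!!!

So I ran into a problem: one of the links has no href, just like you mentioned, and now it errors out. How can I add an if-else statement so that if the href exists it is grabbed, and if not, something else happens?

Never mind, I figured it out with try/except (AttributeError). Thanks!

Yes, the HTML looks like what you added. I just typed it out by hand because I did not know how to copy/paste from the inspector in Chrome. This solved my problem, thank you very much for the help!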
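For the if/else asked about in the comments, here is a minimal sketch of an alternative to try/except. It assumes the goal is to record None when a link is missing. The sample HTML and the results list are hypothetical. Tag.get("href") returns None when the attribute is absent, and find("a") returns None when there is no <a> at all, so both cases can be checked explicitly.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: one normal link, one <a> without href, one <p> with no <a>.
html = """
<p class="product-title"><a href="/info">first item</a></p>
<p class="product-title"><a>second item, no link</a></p>
<p class="product-title">third item, no anchor at all</p>
"""

soup = BeautifulSoup(html, "html.parser")

results = []
for p in soup.find_all("p", {"class": "product-title"}):
    link = p.find("a")  # None when the <p> contains no <a>
    if link is None:
        # No anchor: fall back to the paragraph's own text.
        results.append((None, p.get_text(strip=True)))
    elif link.get("href") is not None:
        # .get() returns None instead of raising when href is missing.
        results.append((link["href"], link.get_text(strip=True)))
    else:
        # Anchor exists but has no href attribute.
        results.append((None, link.get_text(strip=True)))

print(results)
```

The explicit checks make it clear which case was hit, whereas a broad try/except can hide unrelated errors.
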