Python: how do I use BeautifulSoup to get the specific content I need?

I'm scraping a website and extracting information from several places on it. The HTML looks like this:

```html
<div class="Item-Details">
    <p class="Product-title">
        <a href="/link_i_need">
            text here that i need to grab
            more text here that i would like to grab
        </a>
    </p>
```
But it returns:

```html
<p class="product-title">
<a href="/info">line 1 description as well as line 2 description with no break</a>
</p>
```

Thanks a lot for your help.

Once you have the `div` tag, you can get the `href` attribute of its `a` tag with `div.find("a")['href']`. So, applied to your code, it looks like this:

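```python
from bs4 import BeautifulSoup

# html holds the page source; the class must match the page exactly (case-sensitive).
soup = BeautifulSoup(html, 'lxml')
mydivs = soup.find_all("p", {"class": "Product-title"})
for div in mydivs:
    print(div.find("a")['href'])
```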
Note that this will raise an error (a `KeyError`) if any of the `a` elements has no `href` attribute.
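If some anchors may lack an `href`, `Tag.get()` returns `None` (or a default you pass as the second argument) instead of raising. A minimal sketch, assuming a made-up snippet where the second anchor has no `href`; it parses with the built-in `html.parser` instead of `lxml` only to avoid the extra dependency:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: the second anchor has no href attribute.
html = """
<p class="Product-title"><a href="/link_i_need">first</a></p>
<p class="Product-title"><a>second</a></p>
"""

soup = BeautifulSoup(html, "html.parser")
links = []
for p in soup.find_all("p", class_="Product-title"):
    a = p.find("a")
    # a["href"] would raise KeyError on the second anchor;
    # a.get("href") returns None for it instead.
    links.append(a.get("href"))

print(links)  # ['/link_i_need', None]
```

If you only want anchors that actually have a link, a CSS selector such as `soup.select("p.Product-title > a[href]")` skips the others altogether.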

For the inner text, you can use the `.text` attribute, like this:

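```python
from bs4 import BeautifulSoup

# html holds the page source; the class must match the page exactly (case-sensitive).
soup = BeautifulSoup(html, 'lxml')
mydivs = soup.find_all("p", {"class": "Product-title"})
for div in mydivs:
    print(div.find("a").text)
```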
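`.text` keeps the newlines and indentation that surround the anchor's text in the source. A minimal sketch of the usual ways to trim it, on a shortened, hypothetical anchor with the same shape as the question's HTML (parsed with `html.parser` here):

```python
from bs4 import BeautifulSoup

# Hypothetical anchor whose text spans two indented lines.
html = '<p class="Product-title"><a href="/link_i_need">\n    line one\n    line two\n</a></p>'
a = BeautifulSoup(html, "html.parser").find("a")

raw = a.text                      # '\n    line one\n    line two\n'
trimmed = a.get_text(strip=True)  # 'line one\n    line two' (ends trimmed, inner whitespace kept)
# One entry per non-blank source line, each stripped.
lines = [line.strip() for line in a.text.splitlines() if line.strip()]

print(lines)  # ['line one', 'line two']
```

Splitting on lines is only useful when the line breaks in the markup are meaningful, as with the two text lines in the question.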

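First, you're missing the closing `</div>` tag. Then, you have a typo: it's `Product-title`, not `product-title`. Finally, looping over the `div`s won't get you any closer to the output you want.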
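The typo matters because BeautifulSoup compares class values as exact strings, so the case has to match. A minimal sketch on a one-line version of the question's HTML (parsed with `html.parser`):

```python
from bs4 import BeautifulSoup

html = '<div class="Item-Details"><p class="Product-title"><a href="/link_i_need">x</a></p></div>'
soup = BeautifulSoup(html, "html.parser")

# Class matching is case-sensitive: only the exact spelling finds the <p>.
wrong = soup.find_all("p", {"class": "product-title"})
right = soup.find_all("p", {"class": "Product-title"})

print(len(wrong), len(right))  # 0 1
```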

So, assuming your HTML looks like this:

```python
sample = """
<div class="Item-Details">
    <p class="Product-title">
        <a href="/link_i_need">
            text here that i need to grab
            more text here that i would like to grab
        </a>
    </p>
</div>
"""
```
To get this:

```
/link_i_need
text here that i need to grab
            more text here that i would like to grab
```
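The code that produced this output was not preserved here. One way to get it from `sample` is sketched below; it is an assumption, not necessarily the answerer's code, and it iterates over the `p.Product-title` elements rather than the `div`s, using `html.parser` instead of `lxml`:

```python
from bs4 import BeautifulSoup

sample = """
<div class="Item-Details">
    <p class="Product-title">
        <a href="/link_i_need">
            text here that i need to grab
            more text here that i would like to grab
        </a>
    </p>
</div>
"""

soup = BeautifulSoup(sample, "html.parser")
results = []
for p in soup.find_all("p", class_="Product-title"):
    a = p.find("a")
    # strip=True trims the whitespace before the first line and after the last one.
    results.append((a["href"], a.get_text(strip=True)))

for href, text in results:
    print(href)
    print(text)
```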

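Thank you so much, this is exactly what I needed. I was just missing a few small pieces at the end. Thanks a lot!!!

So I ran into a problem: one link doesn't have an href tag, just as you mentioned, and now it throws an error. How can I add an if-else statement so that if the href tag exists it gets grabbed, and if not, it does something else?

Never mind, I figured it out with try/except (AttributeError). Thanks!

Yes, the HTML looks like what you added; I just typed it out because I didn't know how to copy/paste from my inspector in Chrome. This solved my problem, thank you very much for your help!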
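For the if-else version asked about in the comments, a minimal sketch that checks both failure cases before indexing. On which exception shows up: `a["href"]` on an anchor without that attribute raises `KeyError`, while `AttributeError` comes from calling something like `.text` on the `None` that `find()` returns when a `<p>` has no `<a>` at all. The HTML and the "record the text with no link" fallback are hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: a full entry, an anchor without href, and an entry with no anchor.
html = """
<p class="Product-title"><a href="/link_i_need">first</a></p>
<p class="Product-title"><a>second</a></p>
<p class="Product-title">third, no link</p>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for p in soup.find_all("p", class_="Product-title"):
    a = p.find("a")  # None when the <p> contains no <a>
    if a is not None and a.has_attr("href"):
        rows.append((a["href"], a.get_text(strip=True)))
    else:
        # "Do something else": here, keep the text and mark the link as missing.
        rows.append((None, p.get_text(strip=True)))

print(rows)
```

Checking up front keeps the missing-link case explicit; a `try`/`except` works too, but it should catch the exceptions that can actually occur (`KeyError`, and `AttributeError`/`TypeError` when the `<a>` is missing).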