How to get the <a href> tag from inside a <div> tag using BeautifulSoup and Python?

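Hi all. I have a quick question about BeautifulSoup in Python. I have several chunks of HTML like this (the only differences are the link and the product name), and I'm trying to get the link from the "href" attribute: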

<div id="productListing1" xmlns:dew="urn:Microsoft.Search.Response.Document">
<span id="rank" style="display:none;">94.36</span>
<div class="productPhoto">
    <img src="/assets/images/ocpimages/87684/00131cl.gif" height="82" width="82" />
</div>
<div class="productName">
    <a class="on" href="/Products/ProductInfoDisplay.aspx?SiteId=1&amp;Product=8768400131">CAPRI SUN - JUICE DRINK - COOLERS VARIETY PACK 6 OZ</a>
</div>
<div class="size">40 CT</div>

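This works (for each link on the page I get something like

/Products/ProductInfoDisplay.aspx?SiteId=1&Product=8768400131

); however, I've been trying to figure out whether there is a way to get the link in the "href" attribute without explicitly searching for class="on". I suppose my first question should be whether this is the best way to find this information (class="on" seems too generic and might break in the future, though my CSS and HTML skills aren't great). I've tried many combinations of find, findAll, findAllNext, and so on, but I can't quite get it to work. This is mostly what I have (I've rearranged and changed it many times):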

If this isn't a good approach, how can I get the <a> tag from within the <div> tag? Let me know if you need more information.

Thanks.

OK, once you have the <div> element, you can get its <a> child element by calling find() on it, for example div.find('a').
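A minimal sketch of this find() step, assuming BeautifulSoup 4 (the bs4 package) with Python's built-in "html.parser"; the markup is trimmed from the question's snippet:

```python
from bs4 import BeautifulSoup

# Markup trimmed from the question's productName <div>
html = """
<div class="productName">
    <a class="on" href="/Products/ProductInfoDisplay.aspx?SiteId=1&amp;Product=8768400131">CAPRI SUN - JUICE DRINK - COOLERS VARIETY PACK 6 OZ</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# First locate the <div>, then call find() on it to get its first <a> descendant
div = soup.find("div", class_="productName")
link = div.find("a")

# The parser decodes &amp; in attribute values, so href contains a plain '&'
href = link["href"]
print(href)  # /Products/ProductInfoDisplay.aspx?SiteId=1&Product=8768400131
```

The decoding of &amp; is why the question's output shows a plain & in the link.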

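However, since the <a> is a direct child of the <div>, you can also reach it as an attribute of the div (div.a) and read its href: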

# soup is the BeautifulSoup object for the page
productDivs = soup.find_all('div', attrs={'class': 'productName'})
for div in productDivs:
    print(div.a['href'])  # div.a is shorthand for div.find('a')

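Now, if you want to collect all the <a> elements in a list, a single find() call won't do it, because find() returns only the first element that matches its criteria. Instead, get the list of divs and take the child element from each one, for example with a list comprehension: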

productLinks = [div.a for div in
                soup.find_all('div', attrs={'class': 'productName'})]
for link in productLinks:
    print(link['href'])
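Since the question worries that matching class="on" is too generic, another option is a CSS selector anchored on the containing div, so the link's own class is never consulted. A sketch, assuming a recent bs4 release where select() is available; the second listing and its product number are hypothetical, added only to show that the result is a list:

```python
from bs4 import BeautifulSoup

# Two listings shaped like the question's HTML; productListing2 is hypothetical
html = """
<div id="productListing1">
  <div class="productName">
    <a class="on" href="/Products/ProductInfoDisplay.aspx?SiteId=1&amp;Product=8768400131">CAPRI SUN - JUICE DRINK - COOLERS VARIETY PACK 6 OZ</a>
  </div>
  <div class="size">40 CT</div>
</div>
<div id="productListing2">
  <div class="productName">
    <a class="on" href="/Products/ProductInfoDisplay.aspx?SiteId=1&amp;Product=1234567890">HYPOTHETICAL PRODUCT</a>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# "div.productName > a" matches <a> elements that are direct children of a
# <div class="productName">; the class="on" on the link plays no part
product_links = [a["href"] for a in soup.select("div.productName > a")]
for href in product_links:
    print(href)
```

Selecting on the container keeps working even if the site later drops or renames the link's class, as long as the productName wrapper stays.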

Here is a solution using bs4 (BeautifulSoup 4):

for data in soup.find_all('div', class_='productName'):
    for a in data.find_all('a'):
        print(a.get('href')) #for getting link
        print(a.text) #for getting text between the link

If you only need the first match, you can avoid these loops by indexing:

data = soup.find_all('div', class_='productName')
a_class = data[0].find_all('a')
url_ = a_class[0].get('href')
print(url_)
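Indexing with data[0] raises IndexError when the page contains no <div class="productName">. A hedged sketch of a single-result variant built on find(), which returns None instead; first_product_link is a hypothetical helper name, and the parser choice ("html.parser") is an assumption:

```python
from bs4 import BeautifulSoup


def first_product_link(html):
    """Return the href of the first <a> in the first <div class="productName">, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # find() returns None when nothing matches, whereas find_all(...)[0]
    # raises IndexError on an empty result list
    div = soup.find("div", class_="productName")
    if div is None:
        return None
    # href=True restricts the match to <a> tags that actually carry an href
    link = div.find("a", href=True)
    return link["href"] if link is not None else None


print(first_product_link('<div class="productName"><a href="/x?a=1&amp;b=2">X</a></div>'))  # /x?a=1&b=2
print(first_product_link("<p>no products</p>"))  # None
```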
