Python: collecting links from a div class
I have the following part in my code for collecting links:
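import requests
from bs4 import BeautifulSoup

def Get_Links():
    # main holds the URL of the page to scrape; it is defined elsewhere in the script
    r = requests.get(main).text
    soup = BeautifulSoup(r, 'html.parser')
    links = []
    # Collect the href of every <a class="ap-area-link"> on the page
    for item in soup.findAll("a", {'class': 'ap-area-link'}):
        links.append(item.get("href"))
    return links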
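The page source is:

<a class="ap-area-link" href="https://www.webpage.com/product/item/">Item</a>

But my list of links comes back empty. Why?

You can use the find method on each item. The find_all method returns a result set, which behaves much like a list. Each item in that result set can be treated as a standalone piece of HTML, so the regular bs4 methods work on it.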
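Try replacing:

for item in soup.findAll("div", {'class': 'large-4 medium-4 columns'}):
    links.append(item.get("href"))

with:

for item in soup.findAll("div", {'class': 'large-4 medium-4'}):
    links.append(item.find("a").get("href"))

Try using the adjacent sibling combinator to get the a that directly follows the h5 with the class shown below: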
links = [i['href'] for i in soup.select('h5.show-for-small + a')]
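To see what the adjacent sibling combinator actually selects, here is a minimal sketch run against a made-up fragment. The sample HTML and the example.com URLs are assumptions for illustration, not the asker's real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed for illustration only.
html = """
<div class="large-4 medium-4 columns">
  <h5 class="show-for-small">Product Name 1</h5>
  <a href="https://example.com/products/one/">Item</a>
  <p>spacer</p>
  <a href="https://example.com/products/other/">Not selected</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# "h5.show-for-small + a" matches an <a> only when it is the element
# immediately following an <h5 class="show-for-small"> sibling.
links = [a["href"] for a in soup.select("h5.show-for-small + a")]
print(links)  # ['https://example.com/products/one/']
```

The second a is skipped because the element right before it is a p, not the h5.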
Read up on CSS selectors and combinators.

You can try the following:
from bs4 import BeautifulSoup

html = """<div class="large-4 medium-4 columns">
<h5 class="show-for-small">Product Name 1</h5>
<a href="https://webpage.com/products/item/">Item</a>
<h5 class="show-for-small">Product Name 2</h5>
<a href="https://webpage.com/products/item/">Item</a>
</div>
"""

# Name the parser explicitly to avoid the "no parser was specified" warning
soup = BeautifulSoup(html, 'html.parser')

# Find each matching div, then every <a> inside it
for item in soup.findAll("div", {'class': 'large-4 medium-4 columns'}):
    for n in item.find_all('a'):
        print('Link : ' + n.get('href'))
Output:

Link : https://webpage.com/products/item/
Link : https://webpage.com/products/item/
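The find-inside-find_all suggestion from the first answer can also be sketched as a small function that takes the HTML as a string, which makes it easy to try without a network request. The function name collect_links, the sample markup, and the example.com URLs are hypothetical; the guard against a missing a is an addition, not part of the original answers.

```python
from bs4 import BeautifulSoup


def collect_links(html):
    """Return the href of the first <a> inside each matching <div>."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # A multi-class string matches only when the class attribute is
    # exactly this string (same classes, same order).
    for div in soup.find_all("div", {"class": "large-4 medium-4 columns"}):
        anchor = div.find("a")  # first <a> in this div, or None
        if anchor is not None and anchor.get("href"):
            links.append(anchor["href"])
    return links


# Hypothetical markup, assumed for illustration only.
sample = """
<div class="large-4 medium-4 columns">
  <h5 class="show-for-small">Product Name 1</h5>
  <a href="https://example.com/products/one/">Item</a>
</div>
<div class="large-4 medium-4 columns">
  <h5 class="show-for-small">Product Name 2</h5>
</div>
<div class="large-4 medium-4 columns">
  <a href="https://example.com/products/three/">Item</a>
</div>
"""

result = collect_links(sample)
print(result)
```

The middle div contains no a, so find returns None for it and the function skips it instead of raising an AttributeError.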