
Python: collecting links from a div class

My code has the following part, which collects links:

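import requests
from bs4 import BeautifulSoup

def Get_Links():
    # `main` (the page URL) is defined elsewhere in my code
    r = requests.get(main).text
    soup = BeautifulSoup(r, 'html.parser')
    links = []
    for item in soup.findAll("a", {'class': 'ap-area-link'}):
        links.append(item.get("href"))
    return links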
The page source is:

<a class="ap-area-link" href="https://www.webpage.com/product/item/">Item</a>

But my list of links comes back empty. Why?

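You can call the find method on each item. find_all returns a result set, which behaves somewhat like a list, so you can use the regular bs4 methods on every item in it. Think of each item in the result set as a standalone piece of HTML.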
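To make that concrete, here is a minimal sketch of calling find on each item that find_all returns. The markup, the "card" class, and the example.com URLs are made up for illustration and are not taken from the asker's page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for illustration.
html = """
<div class="card"><a href="https://example.com/one">One</a></div>
<div class="card"><a href="https://example.com/two">Two</a></div>
<div class="card">No link here</div>
"""

soup = BeautifulSoup(html, "html.parser")

links = []
for item in soup.find_all("div", {"class": "card"}):
    # Each item is a Tag, so the usual search methods work on it.
    anchor = item.find("a")  # first <a> inside this div, or None
    if anchor is not None:
        links.append(anchor.get("href"))

print(links)
```

The None check matters because find returns None when a div has no link, and calling .get on None would raise an AttributeError.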

Try replacing:

for item in soup.findAll("div", {'class': 'large-4 medium-4 columns'}):
    links.append(item.get("href"))

with:

for item in soup.findAll("div", {'class': 'large-4 medium-4'}):
    links.append(item.find("a"))
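One thing worth checking before relying on the replacement above: as far as I understand Beautiful Soup's handling of the multi-valued class attribute, a string containing several class names is compared against the whole attribute value, so 'large-4 medium-4' may not match a div whose class is 'large-4 medium-4 columns'. The sketch below compares a few ways of matching; the one-line markup is an assumption modelled on the class names in this thread.

```python
from bs4 import BeautifulSoup

# Assumed markup, modelled on the class names used in this thread.
html = (
    '<div class="large-4 medium-4 columns">'
    '<a href="https://webpage.com/products/item/">Item</a>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")

# Full class string, exactly as written in the HTML.
exact = soup.find_all("div", {"class": "large-4 medium-4 columns"})
# Only part of the class string.
partial = soup.find_all("div", {"class": "large-4 medium-4"})
# A single class name.
single = soup.find_all("div", {"class": "columns"})
# CSS selector: requires both classes, ignores any extra ones.
css = soup.select("div.large-4.medium-4")

print(len(exact), len(partial), len(single), len(css))
```

If the partial match comes back empty on your version too, matching on a single class name or using a CSS selector is probably the safer choice.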

Try using the adjacent sibling combinator to get the a element that immediately follows the h5 with that class, like this:

links = [i['href'] for i in soup.select('h5.show-for-small + a')]

Read up on CSS selectors and combinators.
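As a small illustration of what the combinator does, the sketch below contrasts the adjacent sibling combinator (+) with the general sibling combinator (~). The markup is an assumption: the extra p and second a element, and the item-1 and other URLs, are added only to show the difference.

```python
from bs4 import BeautifulSoup

# Assumed markup; the <p> and the second <a> are added for illustration.
html = """
<div class="large-4 medium-4 columns">
  <h5 class="show-for-small">Product Name 1</h5>
  <a href="https://webpage.com/products/item-1/">Item 1</a>
  <p>Some text</p>
  <a href="https://webpage.com/products/other/">Other</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# "+" selects only the <a> that is the very next sibling element of the h5.
adjacent = [a["href"] for a in soup.select("h5.show-for-small + a")]

# "~" selects every <a> sibling that comes anywhere after the h5.
general = [a["href"] for a in soup.select("h5.show-for-small ~ a")]

print(adjacent)
print(general)
```

The + form is the stricter of the two, which suits pages where each product heading is followed directly by its link.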

You can try the following:

from bs4 import BeautifulSoup

html = """<div class="large-4 medium-4 columns">
     <h5 class="show-for-small">Product Name 1</h5>
      <a href="https://webpage.com/products/item/">Item</a>
      <h5 class="show-for-small">Product Name 2</h5>
      <a href="https://webpage.com/products/item/">Item</a>
    </div>
       """
soup = BeautifulSoup(html, 'html.parser')

# Find the container div, then print every link inside it
for item in soup.findAll("div", {'class': 'large-4 medium-4 columns'}):
    for n in item.find_all('a'):
        print('Link : ' + n.get('href'))
Output:

Link : https://webpage.com/products/item/
Link : https://webpage.com/products/item/