Python BeautifulSoup: scraping multiple products from an online shop

Tags: python, beautifulsoup, web-crawler

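Suppose I want to crawl this site:
https://www.alibaba.com/consumer-electronics/action-sports-camera/p44_p201340102?spm=a2700.8293689.HomeLeftCategory.d201340102.2f9a67afhxyQdZ

Is it possible to open the first product, grab for example its title, price and image, then go back to the overview page and do the same for the next product, until all products are covered?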

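The idea is quite simple. The links on the page are only loaded as you scroll down, so you have to use selenium to scroll to the end of the page. Once you have reached the end, get the page's HTML with driver.page_source and parse it with BeautifulSoup to extract all the links. Here is how to do it: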

from bs4 import BeautifulSoup
from selenium import webdriver
import time

driver = webdriver.Chrome()
driver.get('https://www.alibaba.com/consumer-electronics/action-sports-camera/p44_p201340102?spm=a2700.8293689.HomeLeftCategory.d201340102.2f9a67afhxyQdZ')

# Scroll to the bottom and return the page height after the scroll.
scroll_script = "window.scrollTo(0, document.body.scrollHeight);var lenOfPage=document.body.scrollHeight;return lenOfPage;"
lenOfPage = driver.execute_script(scroll_script)
match = False
# Keep scrolling until the height stops growing, i.e. no more products are loaded.
while not match:
    lastCount = lenOfPage
    time.sleep(1)
    lenOfPage = driver.execute_script(scroll_script)
    if lastCount == lenOfPage:
        match = True
time.sleep(3)  # give the last batch of products time to render
html = driver.page_source
driver.close()

soup = BeautifulSoup(html, 'html5lib')

# Each product tile is a div with class "grid-col-item"; its first link is the product page.
div_tags = soup.find_all('div', class_="grid-col-item")

links = []
for div in div_tags:
    links.append(div.div.a['href'])

print(links)
Output:

['//www.alibaba.com/product-detail/2020-Full-HD-4k-1080P-go_62556989288.html', '//www.alibaba.com/product-detail/Followsun-50-in-1-Accessories-for_62065838705.html', '//www.alibaba.com/product-detail/Factory-lowest-Price-720p-action-camera_60828536337.html', '//www.alibaba.com/product-detail/New-Product-2-0-Inch-Ltps_62394746927.html', '//www.alibaba.com/product-detail/Waterproof-full-hd-1080p-720p-sport_1600084796811.html', '//www.alibaba.com/product-detail/2020-Full-HD-1080P-Go-pro_62555774741.html', '//www.alibaba.com/product-detail/A7-Action-Camera-4k-HD720P-Sports_62255736516.html', '//www.alibaba.com/product-detail/Sports-Camera-4K-Action-Camera-Ultra_62504138600.html', '//www.alibaba.com/product-detail/2016-Hot-sale-Xiaomi-Yi-Action_60434045578.html' ... '//www.alibaba.com/product-detail/Promotion-item-wide-angle-action-camera_60819668707.html']
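Note that the alibaba.com links above are protocol-relative (they start with `//`), so they cannot be passed to `requests.get` as they are. A minimal sketch of one way to normalise them, using `urljoin` from the standard library; the `BASE_URL` constant and the `to_absolute` helper are names made up for this illustration:

```python
from urllib.parse import urljoin

# Assumption: the links were scraped from a page served over https on this host.
BASE_URL = "https://www.alibaba.com/"


def to_absolute(links, base_url=BASE_URL):
    """Resolve protocol-relative or relative hrefs against base_url."""
    return [urljoin(base_url, href) for href in links]


links = [
    "//www.alibaba.com/product-detail/2020-Full-HD-4k-1080P-go_62556989288.html",
    "https://video.xortec.de/hikvision-ds-2td2137-35/py",
]
absolute_links = to_absolute(links)
print(absolute_links)
```

Links that are already absolute pass through unchanged, so the same helper can be applied to either list.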
Edit:

Here is the code for the actual website you want to scrape:

from bs4 import BeautifulSoup
import requests

r = requests.get('https://video.xortec.de/search?sSearch=hikvision&p=1&o=1&n=24').text

soup = BeautifulSoup(r,'html5lib')

a_tags = soup.find_all('a', class_ = "product--title")

links = []
for a in a_tags:
    links.append(a['href'])

print(links)
Output:

['https://video.xortec.de/hikvision-ds-2df4220-dx-w/316l', 'https://video.xortec.de/hikvision-ds-2td2137-35/py', 'https://video.xortec.de/hikvision-ds-2td2137-25/py', 'https://video.xortec.de/hikvision-ds-2td2137-15/py', 'https://video.xortec.de/hikvision-ds-2td2137-10/py', 'https://video.xortec.de/hikvision-ds-2td2137-7/py', 'https://video.xortec.de/hikvision-ds-2td2137-4/py', 'https://video.xortec.de/hikvision-ds-2td2137-4/v1', 'https://video.xortec.de/hikvision-ds-2df8c842ixs-ael-t2', 'https://video.xortec.de/hikvision-ds-2df8a442ixs-af/sp-t2', 'https://video.xortec.de/hikvision-ds-2de5432iw-ae-e', 'https://video.xortec.de/hikvision-ds-2de5425w-ae-e', 'https://video.xortec.de/hikvision-ds-2de5425iw-ae-e', 'https://video.xortec.de/hikvision-ds-2de5330w-ae-e', 'https://video.xortec.de/hikvision-ds-2de5232w-ae-e', 'https://video.xortec.de/hikvision-ds-2de5232iw-ae-e', 'https://video.xortec.de/hikvision-ds-2de5225w-ae-e', 'https://video.xortec.de/hikvision-ds-2de5225iw-ae-e', 'https://video.xortec.de/hikvision-ds-2de4425w-de-e', 'https://video.xortec.de/hikvision-ds-2de4225w-de-e', 'https://video.xortec.de/hikvision-ds-2de4215w-de-e']

Here is my code again, posted for better readability:

import requests
from bs4 import BeautifulSoup
r = requests.get("https://video.xortec.de/search?sSearch=hikvision&p=1&o=1&n=24")
soup = BeautifulSoup(r.text, "html.parser")
products = soup.find_all('div', class_ = "product--detail-btn")

links = []

for product in products:
    links.append(product.a['href'])
print(links)

How do I now go through that list to scrape the articles? My real site seems much simpler than my example site.
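One way to go through the list is to loop over `links`, request each product page and parse it with a small helper. The following is only a sketch: the class names `product--title`, `price--content` and `product--image`, as well as the helper names `parse_product` and `scrape_products`, are assumptions for illustration and must be replaced with whatever the real product pages use.

```python
import time

import requests
from bs4 import BeautifulSoup


def parse_product(html):
    """Extract title, price and image URL from one product page.

    The tag and class names here are hypothetical placeholders.
    """
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("h1", class_="product--title")
    price = soup.find("span", class_="price--content")
    image = soup.find("img", class_="product--image")
    return {
        "title": title.get_text(strip=True) if title else None,
        "price": price.get_text(strip=True) if price else None,
        "image": image.get("src") if image else None,
    }


def scrape_products(links, delay=1.0):
    """Fetch every product link in turn and collect the parsed fields."""
    products = []
    for link in links:
        response = requests.get(link, timeout=10)
        response.raise_for_status()
        products.append(parse_product(response.text))
        time.sleep(delay)  # short pause between requests
    return products


# Offline check with a made-up page, so nothing is requested here.
sample_html = """
<html><body>
  <h1 class="product--title"> Example camera </h1>
  <span class="price--content">99,00 EUR</span>
  <img class="product--image" src="https://example.com/camera.jpg">
</body></html>
"""
print(parse_product(sample_html))
```

With the list from the code above, `scrape_products(links)` would return one dictionary per product. There is no need to "go back" to the overview page, since all links were collected up front.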

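Why not? Start by collecting the link to every product on this page. Once you have them, iterate over each link (that is, over each product).

Ah, damn, maybe I gave the wrong example :D I'm actually scraping this website: With your help I was able to scrape every URL I need, but somehow I can't connect crawling one page with crawling the next. So far I have been using my code:

from bs4 import BeautifulSoup
import requests
r = requests.get("")
soup = BeautifulSoup(r.text, "html.parser")
products = soup.find_all('div', class_="product--detail-btn")
links = []
for product in products:
    links.append(product.a['href'])
print(links)

How do I now go through that list to scrape the articles? My real site seems much simpler than my example site.

OK... do you want to scrape the links from all pages, or only from the first page?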
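If the answer is "all pages", one possible approach is to generate one search URL per page and run the link extraction on each. This sketch assumes that the `p` query parameter in the search URL is the page number, which the thread does not confirm; the `page_url` helper and the page count of 3 are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def page_url(url, page):
    """Return url with its 'p' query parameter set to the given page number."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    query["p"] = [str(page)]  # assumption: 'p' selects the result page
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


search_url = "https://video.xortec.de/search?sSearch=hikvision&p=1&o=1&n=24"
# Hypothetical page count; in practice, stop when a page yields no product links.
page_urls = [page_url(search_url, page) for page in range(1, 4)]
for url in page_urls:
    print(url)
```

Each generated URL can then be passed to `requests.get` in place of the hard-coded first page.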