Python 3.x: Can't get href from div despite calling the class
I'm trying to get the links to all the products listed on this website. For example, for the Google Home Mini Chalk I should get the link to its product page. However, I can't even reach the div class that comes before the href link. I've tried different approaches, all with bs4. Here are two snippets that I was sure would work, but they don't:

First code:
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

url_products = []
url = "https://www.officeworks.com.au/shop/officeworks/c/technology/audio-speakers/voice-assistant-speakers"
req = Request(url)
html_page = urlopen(req)
soup = BeautifulSoup(html_page, "lxml")
data = soup.find_all('div', {'class': 'ProductTile__ProductImageWrapper-sc-1dlojg1-2 gRQAGx'})
for div in data:
    links = div.find_all('a')
    for a in links:
        print('https://www.officeworks.com.au/' + a['href'])
        url_products.append('https://www.officeworks.com.au/' + a['href'])

Second code:

from bs4 import BeautifulSoup
import requests

r = requests.get('https://www.officeworks.com.au/shop/officeworks/c/technology/audio-speakers/voice-assistant-speakers')
soup = BeautifulSoup(r.content, 'lxml')
links = [item['href'] for item in soup.select('.gRQAGx > a')]

I believe I'm not calling the right class, but I can't figure out what it is.
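Thanks in advance.

The page is loaded via JavaScript, which is why you can't extract the expected output: the product markup is not there until the JS has been rendered.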
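
One quick way to confirm this is to check whether the class you are selecting appears anywhere in the raw HTML that a plain HTTP request returns. The following is only a minimal sketch: it uses the standard library's `html.parser` to stay dependency-free, and `static_html` is a made-up stand-in for the server response, not the real page source.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every CSS class name that appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def classes_in_html(html):
    """Return the set of class names found in the given HTML string."""
    parser = ClassCollector()
    parser.feed(html)
    return parser.classes


# Hypothetical stand-in for the HTML returned before any JavaScript runs
static_html = (
    '<html><body><div id="root" class="app-shell"></div>'
    '<script src="/bundle.js"></script></body></html>'
)

found = classes_in_html(static_html)
print("gRQAGx" in found)  # False: the product tiles are not in the static HTML
```

If the class is missing from the raw response, as in this stand-in, no selector will match, however it is written.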
You could use Selenium for this, but I don't recommend it, since it will slow down your task.
Alternatively, use HTMLSession from requests_html to render the page dynamically.
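
A rough sketch of the rendering approach follows, assuming the third-party `requests_html` package is installed (its `render()` call downloads a headless Chromium on first use). The function name and the default selector are illustrative only; the generated class name `gRQAGx` comes from the question and may change whenever the site is rebuilt, and whether the page renders fully this way is untested.

```python
def rendered_product_links(url, selector=".gRQAGx > a"):
    """Render the page's JavaScript, then collect hrefs matching the selector."""
    # Imported lazily because requests_html is a third-party dependency
    from requests_html import HTMLSession

    session = HTMLSession()
    response = session.get(url)
    response.html.render()  # executes the page's JavaScript in headless Chromium
    anchors = response.html.find(selector)
    return [a.attrs["href"] for a in anchors if "href" in a.attrs]
```

It would be called with the category URL from the question, for example `rendered_product_links(url)`.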
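Otherwise, go straight to the source of the data that the JS renders: the site's API. You can find it by tracing the XHR requests under the Network tab of the browser developer tools (Firefox, for example).
So we can make the call here: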
import requests

# Payload copied from the XHR request in the Network tab. The index name, the
# category filter, and the long facets list are elided ("…") because they are
# garbled and cut off in this copy.
json = {"requests": [{"indexName": "…", "params": "query=&hitsPerPage=24&maxValuesPerFacet=10&page=0&highlightPreTag=%3Cais-highlight-0000000000%3E&highlightPostTag=%3C%2Fais-highlight-0000000000%3E&clickAnalytics=true&optionalFilters=%5B%5D&sumOrFiltersScores=true&filters=(…)&facets=…"}]}
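
Writing the `params` string by hand is error-prone, so as a rough sketch it could be assembled from a dictionary with `urllib.parse.urlencode`, and the product links could then be pulled out of the JSON reply. The query parameters (`query`, `hitsPerPage`, `page`, `filters`) are the ones visible in the payload above. Everything else is an assumption: the `requests`/`indexName`/`params` keys are my reading of the payload, the `results`/`hits` layout is how multi-query search APIs of this kind usually reply, and `example-index`, `exampleField`, and the `path` field are hypothetical placeholders because the real values are not visible here.

```python
from urllib.parse import urlencode, urljoin

BASE_URL = "https://www.officeworks.com.au/"


def build_payload(index_name, filters, page=0, hits_per_page=24):
    """Build a multi-query search payload with a URL-encoded params string."""
    params = urlencode({
        "query": "",
        "hitsPerPage": hits_per_page,
        "page": page,
        "filters": filters,
    })
    return {"requests": [{"indexName": index_name, "params": params}]}


def product_links(reply, path_key):
    """Join each hit's relative path onto the site root."""
    hits = reply["results"][0]["hits"]  # assumed reply layout
    return [urljoin(BASE_URL, hit[path_key]) for hit in hits if path_key in hit]


# Placeholder values only; the real index name and filter are not known here
payload = build_payload("example-index", '(exampleField:"a > b")')

# Hypothetical reply with a made-up "path" field
sample_reply = {"results": [{"hits": [{"path": "/shop/officeworks/p/example-product"}]}]}
links = product_links(sample_reply, "path")

print(payload["requests"][0]["params"])
print(links)
```

The resulting dictionary could then be sent with `requests.post(endpoint, json=payload)`, where `endpoint` is the XHR URL seen in the Network tab (not shown here). Using `urljoin` also avoids the doubled slash that plain string concatenation produces when the path already starts with `/`.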