Python 3.x: Can't get href from div despite calling the class

Tags: python-3.x, web-scraping, beautifulsoup, href

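I am trying to get the links to all the products on this website (the URL used in the code below).

For example, for the Google Home Mini Chalk I should get the link to its product page.

However, I can't even reach the div class that sits in front of the href link. I have tried different pieces of code, all with bs4. Here are two snippets that I was sure would work, but they don't: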

First code:

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

url_products = []
url = "https://www.officeworks.com.au/shop/officeworks/c/technology/audio-speakers/voice-assistant-speakers"
req = Request(url)
html_page = urlopen(req)
soup = BeautifulSoup(html_page, "lxml")
data = soup.find_all('div', {'class': 'ProductTile__ProductImageWrapper-sc-1dlojg1-2 gRQAGx'})
for div in data:
    links = div.find_all('a')
    for a in links:
        print('https://www.officeworks.com.au/' + a['href'])
        url_products.append('https://www.officeworks.com.au/' + a['href'])

Second code:

from bs4 import BeautifulSoup
import requests

r = requests.get('https://www.officeworks.com.au/shop/officeworks/c/technology/audio-speakers/voice-assistant-speakers')
soup = BeautifulSoup(r.content, 'lxml')
links = [item['href'] for item in soup.select('.gRQAGx > a')]
I believe I am not calling the right class, but I can't figure out which one it is.
Thanks in advance.

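The page is loaded via JavaScript, which is why you can't pull the expected output: the elements you are looking for do not exist until the JS has been rendered.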
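One way to confirm this is to check whether the selector matches anything in the HTML the server actually returns. The sketch below is illustrative only: `RAW_HTML` is a made-up stand-in for the unrendered response of a JavaScript-driven page (it is not the real Officeworks markup), and `count_matches` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML a JavaScript-driven page returns
# before any script has run: an empty mount point plus a script tag.
RAW_HTML = """
<html>
  <body>
    <div id="root"></div>
    <script src="/static/app.js"></script>
  </body>
</html>
"""


def count_matches(html: str, selector: str) -> int:
    """Return how many elements in the static HTML match a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.select(selector))


# The product tiles would be created by the script, so the parser finds none.
print(count_matches(RAW_HTML, ".gRQAGx > a"))
# The empty mount point is the only thing present in the static markup.
print(count_matches(RAW_HTML, "div#root"))
```

If the same check against the real response also returns zero matches for the product selector, the class name is probably not the problem; the markup simply is not there yet.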

You could therefore use Selenium, but I don't recommend it because it will slow your task down.

Or use HTMLSession from requests_html to render the page on the fly.
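A minimal sketch of that approach, assuming the requests_html package is installed. `fetch_rendered_links` is a hypothetical helper name, the 30-second timeout is an arbitrary choice, and the first call to `render()` downloads a headless Chromium build, so treat it as a starting point rather than a tested solution for this site.

```python
def fetch_rendered_links(url: str, selector: str = "a") -> list:
    """Render a page with requests_html and return the hrefs matching selector."""
    # Imported lazily so this module can be loaded without requests_html.
    from requests_html import HTMLSession

    session = HTMLSession()
    try:
        response = session.get(url)
        # render() runs the page's JavaScript in headless Chromium and
        # re-parses the resulting HTML, so script-generated elements appear.
        response.html.render(timeout=30)
        links = []
        for element in response.html.find(selector):
            href = element.attrs.get("href")
            if href:
                links.append(href)
        return links
    finally:
        session.close()
```

It could be called with the selector from the question, for example `fetch_rendered_links(url, ".gRQAGx > a")`. Generated class names such as `gRQAGx` may change between site builds, so the selector is an assumption too.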

Otherwise, let's go straight to the source of the data the JS renders, which is the site's API.

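By tracking the XHR requests under the Network tab of the browser developer tools (Firefox or any other browser), you can find the request that returns the product data.

So we can make the call here: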

import requests

# Request body captured from the XHR call. In this copy the index name and the
# filters/facets values were machine-translated and the line was cut off, so
# they are elided with "..." here; copy them verbatim from the request shown
# in the Network tab.
json = {"requests": [{"indexName": "...", "params": "query=&hitsPerPage=24&maxValuesPerFacet=10&page=0&highlightPreTag=%3Cais-highlight-0000000000%3E&highlightPostTag=%3C%2Fais-highlight-0000000000%3E&clickAnalytics=true&optionalFilters=%5B%5D&sumOrFiltersScores=true&filters=(...)&facets=%5B...%5D"}]}
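
The remaining steps (sending the body and pulling the links out of the response) might look roughly like the sketch below. Several things here are assumptions rather than facts from this thread: the endpoint is whatever URL the XHR request in the Network tab points to, the response is assumed to have the shape `{"results": [{"hits": [...]}]}`, and the hit field holding the product path is assumed to be called `url`. The helper names `extract_links` and `fetch_links` are hypothetical.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.officeworks.com.au/"


def extract_links(payload: dict, url_field: str = "url") -> list:
    """Collect absolute product links from a multi-query search response.

    Assumes the shape {"results": [{"hits": [{url_field: "/path"}, ...]}]}.
    """
    links = []
    for result in payload.get("results", []):
        for hit in result.get("hits", []):
            path = hit.get(url_field)
            if path:
                # urljoin avoids the double slash that plain concatenation
                # produces when the path already starts with "/".
                links.append(urljoin(BASE_URL, path))
    return links


def fetch_links(endpoint: str, body: dict) -> list:
    """POST the captured JSON body to the endpoint and extract the links."""
    import requests  # imported lazily; only needed for the live call

    response = requests.post(endpoint, json=body, timeout=30)
    response.raise_for_status()
    return extract_links(response.json())


# Offline demonstration with made-up data (not a real API response).
sample = {"results": [{"hits": [{"url": "/shop/officeworks/p/example-a"}, {"name": "no link"}]}]}
print(extract_links(sample))
```

Keeping the parsing in `extract_links` separate from the network call makes it easy to adjust the field name once the real response has been inspected.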