I want to scrape images from a dynamic website but can't figure out how (Python, Selenium, web scraping)


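I'm looking for advice on how to approach this problem. I'm working with the Givenchy website, and I want to collect all the images from it so that I can compile them for photo sharing. The images I want are the ones that appear initially, that is, the ones shown on the site before you hover the mouse over an image. The distinction matters because when you hover over an image, it changes to a picture of a model wearing the bag; I only want the images of the bags themselves. When I look at the page with Chrome's inspect tool, I can only see the links to the images with the model.

Is there a way to get what I need? If so, how?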

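Selenium isn't needed. The images are inside <picture> tags, so with the right CSS selector and some string manipulation you can get the image URLs.

For example: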

import requests
from bs4 import BeautifulSoup


url = 'https://www.givenchy.com/us/en-US/women/bags/all-bags/?start=0&sz=21'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

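# Each matching <source> carries a srcset; keep the URL of its last candidate.
for source in soup.select('picture.thumb-img source[media="(min-width: 1800px)"][srcset*="/images/"]'):
    pic_url = source['srcset'].split(',')[-1].split()[0]
    print(pic_url)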

Prints:

https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dwe86ac579/images/BB500CB0WY001/BB500CB0WY001-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw8c8efbee/images/BB50F2B0WY001/BB50F2B0WY001-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw72d49df0/images/BB50F2B0WD001/BB50F2B0WD001-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw16bf6873/images/BB50F0B0WD001/BB50F0B0WD001-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dwa89db782/images/BB50F0B0WD309/BB50F0B0WD309-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dwb8bb418a/images/BB50F0B0WD051/BB50F0B0WD051-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dweacfc390/images/BB50F2B0WD292/BB50F2B0WD292-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw51675237/images/BB50F2B0WD051/BB50F2B0WD051-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw47ef9b42/images/BB50F3B0WD001/BB50F3B0WD001-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw32b9df63/images/BB50F3B0WD051/BB50F3B0WD051-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw102294c8/images/BB50F3B0WD496/BB50F3B0WD496-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw09d01050/images/BB50F3B0WD662/BB50F3B0WD662-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw442b46a4/images/BB50F2B0WD542/BB50F2B0WD542-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw1e454ef3/images/BB50F2B0WD309/BB50F2B0WD309-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw3aa399b9/images/BB05117012542/BB05117012542-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw9eb8ec2d/images/BB05114012542/BB05114012542-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw7e12db48/images/BBU017B00B001/BBU017B00B001-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw924ff9f6/images/BBU017B00B058/BBU017B00B058-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw1974540d/images/BBU017B00B662/BBU017B00B662-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw28c6592d/images/BBU017B00B140/BBU017B00B140-01-01.jpg?sw=800
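
The string manipulation in the answer can be looked at in isolation. The sketch below is only an illustration: the helper name `last_srcset_url` and the `example.com` srcset value are made up, and it assumes, as the answer does, that candidates are separated by commas and that the last one is the one you want. It would misbehave on URLs that themselves contain commas.

```python
def last_srcset_url(srcset: str) -> str:
    """Return the URL of the last candidate in a srcset attribute value."""
    # Candidates are comma-separated; each looks like "<url> <descriptor>".
    last_candidate = srcset.split(',')[-1]
    # The URL is the first whitespace-separated token of that candidate.
    return last_candidate.split()[0]


# Hypothetical srcset value, shaped like the ones the selector returns.
example = (
    "https://example.com/images/bag-01-01.jpg?sw=400 1x, "
    "https://example.com/images/bag-01-01.jpg?sw=800 2x"
)
print(last_srcset_url(example))
```
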

EDIT: To get higher-quality images, change the ?sw= parameter to a higher resolution.

For example:

import requests
from bs4 import BeautifulSoup


url = 'https://www.givenchy.com/us/en-US/women/bags/all-bags/?start=0&sz=21'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

for p in soup.select('picture.thumb-img source[media="(min-width: 1800px)"][srcset*="/images/"]'):
    p = p['srcset'].split(',')[-1].split()[0].replace('?sw=800', '?sw=1920')
    print(p)
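
The `.replace('?sw=800', '?sw=1920')` call only works when the URL ends in exactly `?sw=800`. As a rough alternative, the sketch below rewrites the `sw` query parameter with the standard library's `urllib.parse`, whatever its current value. The function name `with_width` is hypothetical, and whether the server honors a given width is something this code cannot check.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_width(url, width=None):
    """Return url with its 'sw' query parameter set to width, or dropped if width is None."""
    parts = urlsplit(url)
    # Keep every query parameter except the existing 'sw'.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != 'sw']
    if width is not None:
        query.append(('sw', str(width)))
    return urlunsplit(parts._replace(query=urlencode(query)))


url = ('https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/'
       'Sites-Givenchy_master/default/dwe86ac579/images/BB500CB0WY001/'
       'BB500CB0WY001-01-01.jpg?sw=800')
print(with_width(url, 1920))  # same URL, ending in ?sw=1920
print(with_width(url))        # same URL, with the query string removed
```
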

EDIT: To get the bag name along with the URL, you can use:

url = 'https://www.givenchy.com/us/en-US/women/bags/all-bags/?start=0&sz=21'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

for p in soup.select('picture.thumb-img source[media="(min-width: 1800px)"][srcset*="/images/"]'):
    pic_url = p['srcset'].split(',')[-1].split()[0].replace('?sw=800', '?sw=1920')
    name = p.find_next(class_='product-name').get_text(strip=True)
    print(name, pic_url)
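
If the goal is to save each image under the bag's name, the name first has to be turned into something safe to use as a file name. The following is a minimal sketch of one way to do that; `filename_for`, the sample product names and the `example.com` URL are all hypothetical, and the `.jpg` fallback is an assumption.

```python
import os
import re
from urllib.parse import urlsplit


def filename_for(name, pic_url):
    """Build a file name from a product name, keeping the image's extension."""
    # Take the extension from the URL path, ignoring the query string.
    ext = os.path.splitext(urlsplit(pic_url).path)[1] or '.jpg'
    # Replace each run of characters other than letters, digits, '_' and '-' with '_'.
    stem = re.sub(r'[^\w\-]+', '_', name).strip('_')
    return stem + ext


# Hypothetical name and URL, for illustration only.
print(filename_for('Medium bag in leather',
                   'https://example.com/images/a-01-01.jpg?sw=1920'))
```

Inside the loop above, the bytes returned by `requests.get(pic_url).content` could then be written to `filename_for(name, pic_url)`. Several bags may share a name (for example, different colors), so you may need to add something unique, such as the product code from the URL, to avoid overwriting files.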

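You are probably inspecting the element after hovering over the image, which is why you get the model's image. On hover, the link is updated from the original bag image:

givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-givenchy_master/default/dwe86ac579/images/BB500CB0WY001/BB500CB0WY001-01-01.jpg?sw=466

to the model's image:

givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-givenchy_master/default/dwd050ac75/images/BB500CB0WY001/BB500CB0WY001-01-02.jpg?sw=466

Compare the two URLs (the difference was shown in bold in the original post): the file names end in -01-01.jpg and -01-02.jpg respectively. Try drilling down to the following XPath without hovering over the bag image:

/html/body/div[1]/main/div[5]/div[2]/div[3]/div/div/ul/li[1]/div/figure/a[1]/picture[1]/source[3]

As Andrej pointed out above, you can use BeautifulSoup to do this.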
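
If you end up with a mixed list of URLs, the file-name difference above can be used as a filter. This is only a sketch built on the two example URLs: it assumes that bag-only images always end in `-01.jpg` after a two-digit group and that other images do not, which may not hold for every product. The name `is_first_shot` is hypothetical.

```python
import re
from urllib.parse import urlsplit

# Matches file names ending in "-NN-NN.jpg", e.g. "BB500CB0WY001-01-01.jpg".
SHOT_RE = re.compile(r'-(\d{2})-(\d{2})\.jpg$')


def is_first_shot(url):
    """Return True if the image file name ends in '-NN-01.jpg' (assumed bag-only shot)."""
    match = SHOT_RE.search(urlsplit(url).path)
    return bool(match) and match.group(2) == '01'


base = ('https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/'
        'Sites-Givenchy_master/default/')
urls = [
    base + 'dwe86ac579/images/BB500CB0WY001/BB500CB0WY001-01-01.jpg?sw=466',
    base + 'dwd050ac75/images/BB500CB0WY001/BB500CB0WY001-01-02.jpg?sw=466',
    base + 'dw2264f584/LOOKS%20FWxS20/ECOM2.jpg?sw=1000',
]
bag_only = [u for u in urls if is_first_shot(u)]
print(bag_only)  # only the first URL passes the filter
```
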

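To print the srcset attribute values of the images before hovering over them, you have to induce WebDriverWait for the visibility of the located elements, and you can use either of the following locator strategies:

Using CSS_SELECTOR:

Using XPATH:

Console output: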

['https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dwe86ac579/images/BB500CB0WY001/BB500CB0WY001-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw8c8efbee/images/BB50F2B0WY001/BB50F2B0WY001-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw2264f584/LOOKS%20FWxS20/ECOM2.jpg?sw=1000', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw72d49df0/images/BB50F2B0WD001/BB50F2B0WD001-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw16bf6873/images/BB50F0B0WD001/BB50F0B0WD001-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dwa89db782/images/BB50F0B0WD309/BB50F0B0WD309-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dwb8bb418a/images/BB50F0B0WD051/BB50F0B0WD051-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dweacfc390/images/BB50F2B0WD292/BB50F2B0WD292-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw51675237/images/BB50F2B0WD051/BB50F2B0WD051-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw47ef9b42/images/BB50F3B0WD001/BB50F3B0WD001-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw32b9df63/images/BB50F3B0WD051/BB50F3B0WD051-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw102294c8/images/BB50F3B0WD496/BB50F3B0WD496-01-01.jpg?sw=466', 
'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw09d01050/images/BB50F3B0WD662/BB50F3B0WD662-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw442b46a4/images/BB50F2B0WD542/BB50F2B0WD542-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw1e454ef3/images/BB50F2B0WD309/BB50F2B0WD309-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw3aa399b9/images/BB05117012542/BB05117012542-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw9eb8ec2d/images/BB05114012542/BB05114012542-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw7e12db48/images/BBU017B00B001/BBU017B00B001-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw924ff9f6/images/BBU017B00B058/BBU017B00B058-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw1974540d/images/BBU017B00B662/BBU017B00B662-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw28c6592d/images/BBU017B00B140/BBU017B00B140-01-01.jpg?sw=466']

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

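Comments:

OK, now my goal has changed. The images in your list are too low quality, so I'd like to be able to click on each image and then download the higher-quality image from the page it links to. How would I do that?

@FernandoVarela Change the ?sw=800 parameter in the URL to ?sw=1920 or more..., see my edit.

Bonus question: is there a way to match each image URL to the bag name shown on the website? Basically, when I download these images, I want the bag's name as the file name.

To get the full-size image, just remove `?sw=466` from the end of the URL.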