How to save all images from a website with Python

python web web-scraping

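For my image-processing practice, I want some images from this site: https://511ny.org/cctv. I can't seem to get at their "src" attributes so that I can use them with BeautifulSoup and extract the images. If you can solve this, please let me know. Here is my code, which doesn't print anything: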

from bs4 import BeautifulSoup
from urllib.request import urlopen

response = urlopen('https://511ny.org/cctv')
soup = BeautifulSoup(response, 'html.parser')
pics = soup.findAll('img')
for pic in pics:
    print('img src: ', pic['src'])

I also looked into another approach, downloading all the images directly from the site, but I couldn't find any Python tutorial for it.

The images on this site are not in the initial HTML file. They are loaded dynamically by JavaScript, and BeautifulSoup/urllib will not execute that JavaScript for you.
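
To make that concrete, here is a minimal sketch using only the standard library's `html.parser`, the same parser the question passes to BeautifulSoup. The HTML string is a made-up stand-in for the real page (its script, image path, and the `ImgCollector` / `find_img_sources` names are all hypothetical): the `<img>` would only exist after a browser runs the script, so a static parse finds nothing.

```python
from html.parser import HTMLParser

# Hypothetical page: the <img> is only created when a browser runs the script.
STATIC_HTML = """
<html><body>
  <table id="cctvTable"><tbody></tbody></table>
  <script>
    var img = document.createElement('img');
    img.src = '/example/camera.jpg';
    document.querySelector('#cctvTable tbody').appendChild(img);
  </script>
</body></html>
"""


class ImgCollector(HTMLParser):
    """Collects the src attribute of every <img> tag present in the markup."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            self.sources.append(dict(attrs).get('src'))


def find_img_sources(html):
    """Return the src values of all <img> tags found by a static parse."""
    collector = ImgCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


# The script body is treated as plain text, never executed,
# so the parser sees no <img> tag at all.
print(find_img_sources(STATIC_HTML))
```

The same function does find images that are written directly into the markup, which is why the question's approach works on static pages but not on this one.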

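To scrape a dynamic site, you should drive a headless browser with a tool such as Selenium, which has a Python library. These browsers behave like ordinary browsers, with one difference: they are controlled by your code rather than by a user.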

A better alternative to Selenium is Puppeteer, but I have only used it from Node.js and I'm not sure about the quality of its Python bindings.
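
As a rough sketch of what "controlled by your code" can look like, the following assumes Selenium 4 with a recent Chrome installed (the `--headless=new` flag and automatic driver lookup depend on reasonably current versions). The function names are made up for illustration. Selenium is imported lazily inside `make_headless_chrome` so the collecting logic can be read and tried on its own.

```python
def collect_image_urls(driver):
    """Return the distinct src values of all <img> elements the browser has rendered."""
    urls = []
    # "tag name" is the locator string behind selenium's By.TAG_NAME constant.
    for img in driver.find_elements("tag name", "img"):
        src = img.get_attribute("src")
        if src and src not in urls:
            urls.append(src)
    return urls


def make_headless_chrome():
    """Start Chrome without a visible window (assumes Selenium 4 and Chrome are installed)."""
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    return webdriver.Chrome(options=options)


def scrape_image_urls(url):
    """Open the page in a headless browser, collect image URLs, and always close the browser."""
    driver = make_headless_chrome()
    try:
        driver.get(url)
        return collect_image_urls(driver)
    finally:
        driver.quit()
```

Calling `scrape_image_urls('https://511ny.org/cctv')` would be the intended use. On a page that fills in its images after load, an explicit wait (for example Selenium's `WebDriverWait`) is usually needed before collecting, otherwise the list may still be empty.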

Hi, I did it this way: I built an XPath for each image and then read its src.

import urllib.request

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Path to the local chromedriver binary (adjust for your machine).
PATH = r'C:\Program Files (x86)\chromedriver.exe'
# Selenium 4 takes the driver path through a Service object.
driver = webdriver.Chrome(service=Service(PATH))
driver.get('https://511ny.org/cctv')

try:
    # Wait up to 10 seconds for the table body to appear (XPath of the table).
    main = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, '//*[@id="cctvTable"]/tbody'))
    )
    print(main.text)

    # Iterate over the tr rows of the table.
    for item in main.find_elements(By.TAG_NAME, 'tr'):
        # Get the row id; skip rows that do not carry one.
        identificador = item.get_attribute('data-id')
        if not identificador:
            continue

        # Build the XPath for this row's image and read its src.
        xpath = '//*[@id="{}img"]'.format(identificador)
        imagenes = item.find_elements(By.XPATH, xpath)
        if not imagenes:
            continue
        src = imagenes[0].get_attribute('src')
        if src:
            urllib.request.urlretrieve(src, '{}.jpg'.format(identificador))
except TimeoutException:
    print('Timed out waiting for the table to load')
finally:
    # Close the browser whether or not the download succeeded.
    driver.quit()

Thanks
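
One detail in the answer above is that every file is saved with a `.jpg` name regardless of what the src URL points to. A small sketch of one way to pick the name from the URL instead, using only the standard library; `image_filename` and its list of accepted extensions are hypothetical choices, not part of the answer:

```python
import posixpath
from urllib.parse import urlparse

# Extensions accepted as-is; anything else falls back to the default.
KNOWN_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.gif', '.webp'}


def image_filename(identifier, src, default_ext='.jpg'):
    """Build '<identifier><ext>', taking the extension from the image URL when it has one."""
    path = urlparse(src).path  # ignores any ?query or #fragment part
    ext = posixpath.splitext(path)[1].lower()
    if ext not in KNOWN_EXTENSIONS:
        ext = default_ext
    return '{}{}'.format(identifier, ext)
```

The result could be passed as the second argument to `urllib.request.urlretrieve` in place of the hard-coded `'{}.jpg'` name.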

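Wow, thanks, you saved me a lot of time.

Dear @Jaime, thanks again. I wanted to do the same thing on a page that has the exact same structure, but it doesn't give me the images the way the old one did.

Oh, sorry, it doesn't have the exact same structure. I'm trying to figure it out.

If you need some help, create a new topic and I'll be happy to help you.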