Python Selenium not downloading images from the web

python html selenium web-scraping

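I am trying to download images from a website; the URL is in the code below.

I want to get the image of a Yu-Gi-Oh! card and store it in a database.

This is the code I am currently running: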

import selenium
from selenium import webdriver
DRIVER_PATH = 'my_path'
wd = webdriver.Chrome(executable_path = DRIVER_PATH)
wd.get('https://db.ygoprodeck.com/search/?&num=30&offset=120&view=List')
images = wd.find_elements_by_xpath("//img[@class='lazy']")
if(images):
    print("True")
wd.close()

If you inspect the site's network traffic, you will see that it makes an AJAX call to load all of the data:

import requests, json

res = requests.get("https://db.ygoprodeck.com/api_internal/v7/cardinfo.php?&num=30&offset=0&view=List&misc=yes")

print(res.json())

Output:

{'data': [{'id': 34541863,
   'name': '"A" Cell Breeding Device',
   'type': 'Spell Card',
   'desc': 'During each of your Standby Phases, put 1 A-Counter on 1 face-up monster your opponent controls.',
   'race': 'Continuous',
   'archetype': 'Alien',
   'card_sets': [{'set_name': 'Force of the Breaker',
     'set_code': 'FOTB-EN043',
     'set_rarity': 'Common',
     'set_rarity_code': '(C)',
     'set_price': '1.03'}],
   'card_images': [{'id': 34541863,
     'image_url': 'https://storage.googleapis.com/ygoprodeck.com/pics/34541863.jpg',
     'image_url_small': 'https://storage.googleapis.com/ygoprodeck.com/pics_small/34541863.jpg'}],
   'card_prices': [{'cardmarket_price': '0.11',
     'tcgplayer_price': '0.22',
     'ebay_price': '2.25',
     'amazon_price': '0.25',
     'coolstuffinc_price': '0.25'}],
   'misc_info': [{'beta_name': 'A Cell Breeding Device',
     'views': 202029,
     'viewsweek': 5270,
     'upvotes': 34,
     'downvotes': 26,
     'formats': ['Duel Links', 'TCG', 'OCG'],
     'tcg_date': '2007-05-16',
     'ocg_date': '2007-02-15'}]},
  {'id': 64163367,
   'name': '"A" Cell Incubator',
   'type': 'Spell Card',
   'desc': 'Each time an A-Counter(s) is removed from play by a card effect, place 1 A-Counter on this card. When this card is destroyed, distribute the A-Counters on this card among face-up monsters.',
   'race': 'Continuous',
   'archetype': 'Alien',
   'card_sets': [{'set_name': "Gladiator's Assault",
     'set_code': 'GLAS-EN062',
     'set_rarity': 'Common',
     'set_rarity_code': '(C)',
     'set_price': '1'}],
   'card_images': [{'id': 64163367,
     'image_url': 'https://storage.googleapis.com/ygoprodeck.com/pics/64163367.jpg',
     'image_url_small': 'https://storage.googleapis.com/ygoprodeck.com/pics_small/64163367.jpg'}],
   'card_prices': [{'cardmarket_price': '0.10',
     'tcgplayer_price': '0.26',
     'ebay_price': '1.25',
     'amazon_price': '0.25',
     'coolstuffinc_price': '0.25'}],
   'misc_info': [{'beta_name': 'A Cell Incubator',
     'views': 165264,
     'viewsweek': 3644,
     'upvotes': 11,
     'downvotes': 11,
     'formats': ['Duel Links', 'TCG', 'OCG'],
     'tcg_date': '2007-11-14',
     'ocg_date': '2007-07-21'}]},
  {'id': 91231901,
   'name': '"A" Cell Recombination Device',
   'type': 'Spell Card',
   'desc': 'Target 1 face-up monster on the field; send 1 "Alien" monster from your Deck to the Graveyard, and if you do, place A-Counters on that monster equal to the Level of the sent monster. During your Main Phase, except the turn this card was sent to the Graveyard: You can banish this card from your Graveyard; add 1 "Alien" monster from your Deck to your hand.',
   'race': 'Quick-Play',
   'archetype': 'Alien',
   'card_sets': [{'set_name': 'Invasion: Vengeance',
     'set_code': 'INOV-EN063',
     'set_rarity': 'Common',
     'set_rarity_code': '(C)',
     'set_price': '0.92'}],
   'card_images': [{'id': 91231901,
     'image_url': 'https://storage.googleapis.com/ygoprodeck.com/pics/91231901.jpg',
...
...
...
..

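The JSON response contains all of the data, including the image links.

The reason you cannot get the images is that they are lazy-loaded. You have to wait until they have loaded.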

Selenium addresses this problem with two types of waits: an explicit wait or an implicit wait.

A more extreme option is to use time.sleep().

Solution

Using an implicit wait
...
wd.get('https://db.ygoprodeck.com/search/?&num=30&offset=120&view=List')
wd.implicitly_wait(10)
images = wd.find_elements_by_xpath("//img[@class='lazy']")
....

Using an explicit wait
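...
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
...
# Wait up to 10 seconds for the lazy images to appear in the DOM;
# presence_of_all_elements_located returns a list of matching elements.
images = WebDriverWait(wd, 10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "img.lazy")))
...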
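As a rough sketch of how an explicit wait could fit into a complete script, the function below waits for the img.lazy elements and then reads a URL from each one. The names first_nonempty and collect_image_urls are made up for illustration, and reading data-src before src is an assumption about where the page's lazy loading keeps the real URL; it has not been checked against the site.

```python
def first_nonempty(*values):
    """Return the first truthy value (for example a non-empty string), else None."""
    for value in values:
        if value:
            return value
    return None


def collect_image_urls(page_url, timeout=10):
    """Open page_url in Chrome, wait for img.lazy elements, return their URLs."""
    # Selenium is imported here so that first_nonempty can be used without it.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wd = webdriver.Chrome()  # assumes a Chrome driver can be located
    try:
        wd.get(page_url)
        # Block until at least one matching element is present, or time out.
        images = WebDriverWait(wd, timeout).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, "img.lazy"))
        )
        urls = []
        for img in images:
            # Assumption: the real URL may sit in "data-src" until the image loads.
            url = first_nonempty(img.get_attribute("data-src"),
                                 img.get_attribute("src"))
            if url:
                urls.append(url)
        return urls
    finally:
        wd.quit()  # always release the browser, even if the wait times out
```

It could be called as collect_image_urls('https://db.ygoprodeck.com/search/?&num=30&offset=120&view=List'). If no element appears within the timeout, the wait raises a TimeoutException instead of returning an empty list.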


A quicker solution is to use requests, urllib3, or scrapy to fetch the data directly from the AJAX call, as @bigbounty suggested.
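
To make the direct approach concrete, here is a minimal sketch that pulls the image links out of the JSON shown earlier and saves each picture to disk. It uses the standard library's urllib.request in place of requests so that it has no third-party dependencies. The function names and the cards output directory are hypothetical, and whether the server accepts requests from a plain script is not verified here.

```python
import json
import os
import urllib.request

# Endpoint copied from the answer above.
API_URL = ("https://db.ygoprodeck.com/api_internal/v7/cardinfo.php"
           "?&num=30&offset=0&view=List&misc=yes")


def extract_image_urls(payload):
    """Return (image_id, image_url) pairs from a decoded JSON payload."""
    pairs = []
    for card in payload.get("data", []):
        for image in card.get("card_images", []):
            pairs.append((image["id"], image["image_url"]))
    return pairs


def download_images(pairs, out_dir="cards"):
    """Save each image as <out_dir>/<image_id>.jpg and return the file paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for image_id, url in pairs:
        path = os.path.join(out_dir, f"{image_id}.jpg")
        with urllib.request.urlopen(url, timeout=30) as resp, open(path, "wb") as fh:
            fh.write(resp.read())
        paths.append(path)
    return paths


def main():
    """Fetch one page of card data and download every image it lists."""
    with urllib.request.urlopen(API_URL, timeout=30) as resp:
        payload = json.load(resp)
    for path in download_images(extract_image_urls(payload)):
        print("saved", path)
```

Calling main() fetches one page of results and writes the images to the cards directory. The raw bytes returned by resp.read() could be written to a database column instead of a file if that is the end goal.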