Python Selenium not downloading images from the web
I am trying to download images from a website (the URL is in the code below). I want to get the image of a Yu-Gi-Oh! card and store it in a database. This is the code I am currently running:
import selenium
from selenium import webdriver
DRIVER_PATH = 'my_path'
wd = webdriver.Chrome(executable_path = DRIVER_PATH)
wd.get('https://db.ygoprodeck.com/search/?&num=30&offset=120&view=List')
images = wd.find_elements_by_xpath("//img[@class='lazy']")
if images:
    print("True")
wd.close()
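If you inspect the site's network traffic, you will see that the page makes an AJAX call to load all of its data. You can request that endpoint directly: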
import requests, json
res = requests.get("https://db.ygoprodeck.com/api_internal/v7/cardinfo.php?&num=30&offset=0&view=List&misc=yes")
print(res.json())
Output:
{'data': [{'id': 34541863,
'name': '"A" Cell Breeding Device',
'type': 'Spell Card',
'desc': 'During each of your Standby Phases, put 1 A-Counter on 1 face-up monster your opponent controls.',
'race': 'Continuous',
'archetype': 'Alien',
'card_sets': [{'set_name': 'Force of the Breaker',
'set_code': 'FOTB-EN043',
'set_rarity': 'Common',
'set_rarity_code': '(C)',
'set_price': '1.03'}],
'card_images': [{'id': 34541863,
'image_url': 'https://storage.googleapis.com/ygoprodeck.com/pics/34541863.jpg',
'image_url_small': 'https://storage.googleapis.com/ygoprodeck.com/pics_small/34541863.jpg'}],
'card_prices': [{'cardmarket_price': '0.11',
'tcgplayer_price': '0.22',
'ebay_price': '2.25',
'amazon_price': '0.25',
'coolstuffinc_price': '0.25'}],
'misc_info': [{'beta_name': 'A Cell Breeding Device',
'views': 202029,
'viewsweek': 5270,
'upvotes': 34,
'downvotes': 26,
'formats': ['Duel Links', 'TCG', 'OCG'],
'tcg_date': '2007-05-16',
'ocg_date': '2007-02-15'}]},
{'id': 64163367,
'name': '"A" Cell Incubator',
'type': 'Spell Card',
'desc': 'Each time an A-Counter(s) is removed from play by a card effect, place 1 A-Counter on this card. When this card is destroyed, distribute the A-Counters on this card among face-up monsters.',
'race': 'Continuous',
'archetype': 'Alien',
'card_sets': [{'set_name': "Gladiator's Assault",
'set_code': 'GLAS-EN062',
'set_rarity': 'Common',
'set_rarity_code': '(C)',
'set_price': '1'}],
'card_images': [{'id': 64163367,
'image_url': 'https://storage.googleapis.com/ygoprodeck.com/pics/64163367.jpg',
'image_url_small': 'https://storage.googleapis.com/ygoprodeck.com/pics_small/64163367.jpg'}],
'card_prices': [{'cardmarket_price': '0.10',
'tcgplayer_price': '0.26',
'ebay_price': '1.25',
'amazon_price': '0.25',
'coolstuffinc_price': '0.25'}],
'misc_info': [{'beta_name': 'A Cell Incubator',
'views': 165264,
'viewsweek': 3644,
'upvotes': 11,
'downvotes': 11,
'formats': ['Duel Links', 'TCG', 'OCG'],
'tcg_date': '2007-11-14',
'ocg_date': '2007-07-21'}]},
{'id': 91231901,
'name': '"A" Cell Recombination Device',
'type': 'Spell Card',
'desc': 'Target 1 face-up monster on the field; send 1 "Alien" monster from your Deck to the Graveyard, and if you do, place A-Counters on that monster equal to the Level of the sent monster. During your Main Phase, except the turn this card was sent to the Graveyard: You can banish this card from your Graveyard; add 1 "Alien" monster from your Deck to your hand.',
'race': 'Quick-Play',
'archetype': 'Alien',
'card_sets': [{'set_name': 'Invasion: Vengeance',
'set_code': 'INOV-EN063',
'set_rarity': 'Common',
'set_rarity_code': '(C)',
'set_price': '0.92'}],
'card_images': [{'id': 91231901,
'image_url': 'https://storage.googleapis.com/ygoprodeck.com/pics/91231901.jpg',
...
...
...
..
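Given a response shaped like the output above, the image links can be collected with a small helper. This is only a sketch: the function name extract_image_urls is hypothetical, and the sample dictionary is a trimmed copy of the first two entries shown above rather than a live response.

```python
def extract_image_urls(payload, small=False):
    """Collect image links from a payload shaped like the output above."""
    key = "image_url_small" if small else "image_url"
    urls = []
    for card in payload.get("data", []):
        # Each card carries a list of images; most cards have one entry
        for image in card.get("card_images", []):
            url = image.get(key)
            if url:
                urls.append(url)
    return urls


# Trimmed sample copied from the output above (not a live API call)
sample = {
    "data": [
        {
            "id": 34541863,
            "name": '"A" Cell Breeding Device',
            "card_images": [
                {
                    "id": 34541863,
                    "image_url": "https://storage.googleapis.com/ygoprodeck.com/pics/34541863.jpg",
                    "image_url_small": "https://storage.googleapis.com/ygoprodeck.com/pics_small/34541863.jpg",
                }
            ],
        },
        {
            "id": 64163367,
            "name": '"A" Cell Incubator',
            "card_images": [
                {
                    "id": 64163367,
                    "image_url": "https://storage.googleapis.com/ygoprodeck.com/pics/64163367.jpg",
                    "image_url_small": "https://storage.googleapis.com/ygoprodeck.com/pics_small/64163367.jpg",
                }
            ],
        },
    ]
}

print(extract_image_urls(sample))
```

With a real response, res.json() would take the place of sample.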
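The JSON response contains all of the data, including the image links.
You cannot get the images because they are lazy-loaded: you have to wait until they have been loaded. Selenium handles this with two types of waits, explicit waits and implicit waits. A cruder fallback is to pause with time.sleep().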
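To make the difference from a fixed time.sleep() concrete, here is a browser-free sketch of the polling idea behind an explicit wait. It is a simplified illustration, not Selenium's implementation; wait_until and images_loaded are hypothetical names, and the fake condition simply becomes truthy on its third call.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # stop as soon as the condition holds
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(poll)


# Stand-in for "the lazy images have appeared": truthy from the third check on
attempts = {"count": 0}


def images_loaded():
    attempts["count"] += 1
    return ["img1", "img2"] if attempts["count"] >= 3 else []


images = wait_until(images_loaded, timeout=2.0, poll=0.01)
print(images, attempts["count"])
```

Unlike a fixed sleep, the loop returns as soon as the condition is satisfied and only spends the full timeout when it never is.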
Solution
Using an implicit wait
...
wd.get('https://db.ygoprodeck.com/search/?&num=30&offset=120&view=List')
wd.implicitly_wait(10)  # element lookups now retry for up to 10 seconds
images = wd.find_elements_by_xpath("//img[@class='lazy']")
...
Using an explicit wait
...
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
...
# presence_of_all_elements_located returns a list of every matching <img>;
# element_to_be_clickable would return only a single element
images = WebDriverWait(wd, 10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "img.lazy")))
...
A faster solution is to use requests, urllib3, or scrapy and, as @bigbounty suggested, fetch the data directly from the AJAX call.
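Since the goal in the question is to store the image in a database, the last step might look like the sketch below. It assumes SQLite as the database and a table named card_images; both are illustrative choices, as are the helper names. fetch_image needs the third-party requests package and is not called in the demo, which stores placeholder bytes instead of a real download.

```python
import sqlite3


def fetch_image(url, timeout=10):
    """Download the raw image bytes (requires the requests package)."""
    import requests  # imported here so the rest of the sketch runs without it

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.content


def init_db(conn):
    # Hypothetical schema: one row per card, image stored as a BLOB
    conn.execute(
        "CREATE TABLE IF NOT EXISTS card_images ("
        "card_id INTEGER PRIMARY KEY, url TEXT NOT NULL, image BLOB NOT NULL)"
    )


def save_image(conn, card_id, url, data):
    # INSERT OR REPLACE overwrites an existing row for the same card_id
    conn.execute(
        "INSERT OR REPLACE INTO card_images (card_id, url, image) VALUES (?, ?, ?)",
        (card_id, url, data),
    )
    conn.commit()


def load_image(conn, card_id):
    row = conn.execute(
        "SELECT image FROM card_images WHERE card_id = ?", (card_id,)
    ).fetchone()
    return row[0] if row else None


# Demo with an in-memory database and placeholder bytes (no network access)
conn = sqlite3.connect(":memory:")
init_db(conn)
url = "https://storage.googleapis.com/ygoprodeck.com/pics/34541863.jpg"
save_image(conn, 34541863, url, b"\xff\xd8placeholder")
print(len(load_image(conn, 34541863)))
```

In real use, the placeholder bytes would be replaced by fetch_image(url) for each image link taken from the JSON response.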