Python: Why does Selenium only pick up the first 12 items?

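I'm trying to build a web scraper for a website that copies a list of images and saves them in a directory. Everything seems to work, except that of the 800+ items I expect it to pick up, it only picks up 12. I've tried using Selenium's implicit wait, but it doesn't seem to help. I'd like it to scrape every image on the page.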

Here is my code:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
import shutil
import os
import requests

def spritescrape(driver):
    sprites_list = driver.find_elements_by_tag_name('img')
    sprite_srcs = [sprite.get_attribute('src') for sprite in sprites_list]
    return sprite_srcs

def download_images(srcs, dirname):
    for index, src in enumerate(srcs):
        response = requests.get(src, stream=True)
        save_image(response, dirname, index)
    del response

def save_image(image, dirname, suffix):
    with open('{dirname}/img_{suffix}.jpg'.format(dirname=dirname, suffix=suffix), 'wb') as out_file:
        shutil.copyfileobj(image.raw, out_file)

def make_dir(dirname):
    current_path = os.getcwd()
    path = os.path.join(current_path, dirname)
    if not os.path.exists(path):
        os.makedirs(path)

if __name__ == '__main__':
    chromeexe_path = r'C:\code\Learning Python\Scrapers\chromedriver.exe'
    driver = webdriver.Chrome(executable_path=chromeexe_path)
    driver.get(r'https://pokemondb.net/pokedex/national')
    driver.implicitly_wait(10)

    sprite_links = spritescrape(driver)
    dirname = 'sprites'
    make_dir(dirname)
    download_images(sprite_links, dirname)

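I've heard that some websites can be built in a way that prevents scraping, and I'm wondering whether this is one of them. I'm very new to coding, so any help getting all of the images would be greatly appreciated.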

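When the page first opens, not all of the elements are loaded. They seem to load only as you scroll down the page. What I did in this situation was scroll to the bottom of the page first and then look for the elements. That met my needs.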

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

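You need to scroll the page to the bottom. However, if you jump straight to scrollHeight, you will lose elements again. You need to use an infinite loop, scroll down the page slowly, and collect the elements' attributes during the scroll so that none are lost. I got 890 elements.

Try the code below.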

import time

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://pokemondb.net/pokedex/national")

sprite_srcs=[]
height=1000
itemsnobefore=len(sprite_srcs)
while True:
    # Scroll in small steps so the images further down get a chance to load
    driver.execute_script("window.scrollTo(0," + str(height) + ");")
    # find_elements(By.TAG_NAME, ...) replaces find_elements_by_tag_name,
    # which was removed in Selenium 4.3
    sprites_list = driver.find_elements(By.TAG_NAME, 'img')

    for sprite in sprites_list:
        src = sprite.get_attribute('src')
        if src not in sprite_srcs:
            sprite_srcs.append(src)

    itemsnoafter=len(sprite_srcs)
    # Break the loop when a scroll step reveals no new image src
    if itemsnobefore==itemsnoafter:
        break
    itemsnobefore=itemsnoafter
    height=height+500
    time.sleep(0.25)

print(len(sprites_list))
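
The loop above stops the first time a scroll step reveals nothing new, so a slow image load could end it early. The following is only a sketch of the same scroll-and-collect idea with a small tolerance for idle rounds; the names `collect_while_scrolling`, `patience` and `FakePage` are hypothetical and not part of the original answer or of Selenium. The browser is passed in as two callables so that the logic can be tried without opening a real page.

```python
from typing import Callable, Iterable, List, Optional


def collect_while_scrolling(
    scroll_to: Callable[[int], None],
    read_srcs: Callable[[], Iterable[Optional[str]]],
    start: int = 1000,
    step: int = 500,
    patience: int = 3,
    max_rounds: int = 10_000,
) -> List[str]:
    """Scroll in fixed steps and record each new src once, in first-seen order."""
    seen = set()
    ordered = []
    height = start
    idle_rounds = 0  # consecutive scroll steps that revealed nothing new
    for _ in range(max_rounds):
        scroll_to(height)
        found_new = False
        for src in read_srcs():
            # Skip missing/empty src values and anything already recorded
            if src and src not in seen:
                seen.add(src)
                ordered.append(src)
                found_new = True
        idle_rounds = 0 if found_new else idle_rounds + 1
        if idle_rounds >= patience:
            break
        height += step
    return ordered


class FakePage:
    """Hypothetical stand-in for a lazily loading page: one more image per 300 px."""

    def __init__(self, count: int) -> None:
        self.count = count
        self.offset = 0

    def scroll_to(self, y: int) -> None:
        self.offset = y

    def read_srcs(self) -> List[str]:
        visible = min(self.count, self.offset // 300 + 1)
        return ["img_%d.png" % i for i in range(visible)]


page = FakePage(20)
srcs = collect_while_scrolling(page.scroll_to, page.read_srcs)
print(len(srcs))
```

With a real driver, `scroll_to` could wrap `driver.execute_script("window.scrollTo(0, arguments[0]);", y)` followed by a short `time.sleep`, and `read_srcs` could return `get_attribute("src")` for each `img` element; that wiring is an assumption and has not been run against the site.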

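The elements on the website are lazy-loaded. So, to extract the list of src attributes of the images, you have to scroll down to the end of the page, and you can use the following: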

  • Code block:

    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC

    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    driver.get("https://pokemondb.net/pokedex/national")
    # Wait for the initially visible images and remember how many there are
    elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//img[@class]")))
    myLength = len(elements)
    while True:
        try:
            driver.execute_script("window.scrollBy(0,1500)")
            WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//img[@class]")))
            # Wait until scrolling has added more images; a timeout means the end of the page
            WebDriverWait(driver, 20).until(lambda driver: len(driver.find_elements(By.XPATH, "//img[@class]")) > myLength)
            elements = driver.find_elements(By.XPATH, "//img[@class]")
            myLength = len(elements)
        except TimeoutException:
            break
    print(myLength)
    for element in elements:
        print(element.get_attribute("src"))
    driver.quit()
    
  • Console output:

    890
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/bulbasaur.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/ivysaur.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/venusaur.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/charmander.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/charmeleon.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/charizard.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/squirtle.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/wartortle.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/blastoise.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/caterpie.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/metapod.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/butterfree.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/weedle.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/kakuna.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/beedrill.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/pidgey.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/pidgeotto.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/pidgeot.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/rattata.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/raticate.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/spearow.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/fearow.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/ekans.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/arbok.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/pikachu.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/raichu.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/sandshrew.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/sandslash.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/nidoran-f.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/nidorina.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/nidoqueen.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/nidoran-m.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/nidorino.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/nidoking.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/clefairy.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/clefable.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/vulpix.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/ninetales.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/jigglypuff.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/wigglytuff.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/zubat.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/golbat.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/oddish.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/gloom.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/vileplume.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/paras.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/parasect.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/venonat.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/venomoth.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/diglett.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/dugtrio.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/meowth.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/persian.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/psyduck.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/golduck.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/mankey.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/primeape.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/growlithe.png
    https://img.pokemondb.net/sprites/omega-ruby-alpha-sapphire/dex/normal/arcanine.png
    .
    .
    .
    https://img.pokemondb.net/sprites/sword-shield/pixel/dreepy.png
    https://img.pokemondb.net/sprites/sword-shield/pixel/drakloak.png
    https://img.pokemondb.net/sprites/sword-shield/pixel/dragapult.png
    https://img.pokemondb.net/sprites/sword-shield/pixel/zacian-crowned.png
    https://img.pokemondb.net/sprites/sword-shield/pixel/zamazenta-crowned.png
    https://img.pokemondb.net/sprites/sword-shield/pixel/eternatus.png
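
The URLs printed above end in .png, while the `save_image` function in the question writes every file with a .jpg suffix. As a small sketch, and not something proposed in the thread, the local file name could be derived from the URL itself. The helper name `filename_from_url` and the `.bin` placeholder extension are assumptions made for illustration.

```python
import posixpath
from urllib.parse import urlparse


def filename_from_url(src: str, index: int) -> str:
    """Build a local file name that keeps the extension found in the URL path."""
    path = urlparse(src).path          # path part only, without query string
    base = posixpath.basename(path)    # last path segment, e.g. "eternatus.png"
    stem, ext = posixpath.splitext(base)
    if not stem:
        stem = "img"                   # assumption: generic name when the path has no file part
    if not ext:
        ext = ".bin"                   # assumption: placeholder when the URL has no extension
    # The zero-padded index keeps files in download order and avoids name clashes
    return "{:04d}_{}{}".format(index, stem, ext)


print(filename_from_url(
    "https://img.pokemondb.net/sprites/sword-shield/pixel/eternatus.png", 889))
```

The result could then be joined with the target directory via `os.path.join(dirname, ...)` in place of the hard-coded `img_{suffix}.jpg` pattern.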
    

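Thanks! I just tried this, and it seems to get the first 12 elements and the last 20, but none of the ones in between. Is there a way to scroll a bit, scrape, scroll a bit more, scrape again, and so on until I reach the end of the page? I'm not familiar with scrolling in Selenium.

Thanks. Actually, I'll go with the answer provided by @KunduK for now, but if you have anything you think is better / more efficient, I'd be very happy to see it.

Thank you very much, this works perfectly! Can you explain how you arrived at the starting height of 1000?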
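
The thread does not include a reply about how the starting height of 1000 was chosen. One alternative, which is not taken from any of the answers, is to tie the scroll step to the browser's viewport height so that each step moves by a little less than one screen. The sketch below assumes a driver object with an `execute_script` method, as Selenium WebDriver provides; `viewport_step`, `fraction` and `fallback` are hypothetical names.

```python
def viewport_step(driver, fraction: float = 0.8, fallback: int = 500) -> int:
    """Return a scroll step, in pixels, equal to `fraction` of the viewport height."""
    # window.innerHeight is the height of the visible page area in pixels
    inner_height = driver.execute_script("return window.innerHeight;")
    if not isinstance(inner_height, (int, float)) or inner_height <= 0:
        return fallback  # assumption: fall back to a fixed step if the value is unusable
    return max(1, int(inner_height * fraction))


class FakeDriver:
    """Hypothetical stand-in so the helper can be tried without a browser."""

    def __init__(self, height):
        self.height = height

    def execute_script(self, script):
        return self.height


print(viewport_step(FakeDriver(1000)))
```

Scrolling by slightly less than a full viewport means consecutive steps overlap, which may give images near the edge of the screen a chance to load before the next scroll.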