
Python 3.x: Selenium only finds a small fraction of the href links

Tags: python-3.x, selenium, web-scraping

I am trying to get the URLs of all the products on a web page, but I only get a small fraction of them.

My first attempt was to scrape the page with BeautifulSoup, but I then realized Selenium would be a better fit because I need to click the "Show more" button several times. I also added code that scrolls down the page, as I thought that was the problem, but the result did not change.

import time   
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
    
def getListingLinks(link):
    # Open the driver
    driver = webdriver.Chrome(executable_path="")
    driver.maximize_window()
    driver.get(link)
    time.sleep(3)
    # scroll down: repeated to ensure it reaches the bottom and all items are loaded
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)
    
    listing_links = []  
    
    while True:
        try:
            driver.execute_script("return arguments[0].scrollIntoView(true);", WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.XPATH, '//*[@id="main-content"]/div[2]/div[2]/div[4]/button'))))
            driver.execute_script("arguments[0].click();", WebDriverWait(driver,20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#main-content > div:nth-child(2) > div.main-column > div.btn-wrapper.center > button"))))
            print("Button clicked")
            links = driver.find_elements_by_class_name('fop-contentWrapper')
            for link in links:
                algo=link.find_element_by_css_selector('.fop-contentWrapper a').get_attribute('href')
                print(algo)
                listing_links.append(str(algo))
        except:
            print("No more Buttons")
            break
    
    driver.close()
    return listing_links 

fresh_food = getListingLinks("https://www.ocado.com/browse/fresh-20002")

print(len(fresh_food))  ## Output: 228

As you can see, I get 228 URLs, whereas I would like to get 5605 links, which matches the actual number of products on the Ocado page. I believe something is wrong with the order of operations in my code, but I cannot find the right order. I would sincerely appreciate any help.
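
For reference, here is a minimal, untested sketch of one possible reordering, assuming Selenium 4 and that the button and product selectors copied from the code above are still valid: keep clicking "Show more" until the button no longer appears, and only then collect the hrefs in a single pass, deduplicating them.

import time
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

# CSS selector for the "Show more" button, copied from the question above
BUTTON_CSS = "#main-content > div:nth-child(2) > div.main-column > div.btn-wrapper.center > button"

def get_listing_links(url):
    driver = webdriver.Chrome()   # assumes chromedriver is on the PATH
    driver.maximize_window()
    driver.get(url)
    time.sleep(3)

    # Phase 1: keep clicking "Show more" until it stops appearing,
    # so that every product card is in the DOM before any links are read.
    while True:
        try:
            button = WebDriverWait(driver, 10).until(
                EC.element_to_be_clickable((By.CSS_SELECTOR, BUTTON_CSS)))
            driver.execute_script("arguments[0].scrollIntoView(true);", button)
            driver.execute_script("arguments[0].click();", button)
            time.sleep(2)   # give the next batch of products time to render
        except TimeoutException:
            break           # no button left: all products should be loaded

    # Phase 2: collect the hrefs once, deduplicating while preserving order
    anchors = driver.find_elements(By.CSS_SELECTOR, ".fop-contentWrapper a")
    links = list(dict.fromkeys(a.get_attribute("href") for a in anchors))

    driver.quit()
    return links

If the page also lazy-loads product tiles while scrolling rather than on button clicks alone, an additional incremental scroll inside the loop may still be needed; the sketch only restructures the order of clicking and collecting.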