How to retrieve all links from a dynamic website using Selenium with Python

javascript python json selenium-webdriver web-scraping

I have the following code:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException


chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')

prefs = {'profile.managed_default_content_settings.images':2}
chrome_options.add_experimental_option("prefs", prefs)
driver = webdriver.Chrome(options=chrome_options)
driver.get("http://biggestbook.com/ui/catalog.html#/search?cr=1&rs=12&st=BM&category=1")
wait = WebDriverWait(driver,20)
links = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".ess-product-brand + [href]")))
results = [link.get_attribute("href") for link in links]
#print(links)
print(results)
driver.quit()

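However, I only get the results/links for the featured products, not for all products. Occasionally (rarely, perhaps once in 20 runs) I do get all the products, but I want to get all of them every time. I also tried a different approach: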

from selenium import webdriver
from selenium.webdriver.common.by import By

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
driver = webdriver.Chrome(options=chrome_options)
driver.get("http://biggestbook.com/ui/catalog.html#/search?cr=1&rs=12&st=BM&category=1")

# Collect the href of every anchor present at this moment (no explicit wait)
links = [elem.get_attribute("href") for elem in driver.find_elements(By.TAG_NAME, 'a')]

print(links)

Same problem.

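My question is: what am I missing that keeps me from getting all the links? This has been driving me crazy for two weeks. I also tried adding a delay, thinking the page might not have finished loading, but it still doesn't work. Thanks.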

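You can try using a control total: extract the result count shown on the page and add the number of featured items to it. Both numbers are already available to you, so you can loop until the number of hrefs matches that total. You will probably want to add a timeout to the loop.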

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')

prefs = {'profile.managed_default_content_settings.images':2}
chrome_options.add_experimental_option("prefs", prefs)
driver = webdriver.Chrome(options=chrome_options)
driver.get("http://biggestbook.com/ui/catalog.html#/search?cr=1&rs=12&st=BM&category=1")
wait = WebDriverWait(driver,20)
nonFeaturedTotal = int(wait.until(EC.presence_of_element_located((By.CSS_SELECTOR , '.ess-view-item-count-text'))).text.split(' ')[-1])
featuredTotal = len(wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".ess-product-container-featured"))))
expectedTotal = featuredTotal + nonFeaturedTotal

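# Poll until every expected product link is present; wait.until raises
# TimeoutException if the count is not reached within 20 seconds.
wait.until(lambda d: len(d.find_elements(By.CSS_SELECTOR, ".ess-product-brand + [href]")) == expectedTotal)

links = driver.find_elements(By.CSS_SELECTOR, ".ess-product-brand + [href]")
results = [link.get_attribute("href") for link in links]

print(len(results))
print(results)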

driver.quit()
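
The control-total idea above can also be written as a small polling helper with an explicit timeout. This is only a sketch: `wait_for_count`, `get_count`, and the default timings are hypothetical names and values, not part of Selenium. In the script above, `get_count` would presumably be something like `lambda: len(driver.find_elements(By.CSS_SELECTOR, ".ess-product-brand + [href]"))`.

```python
import time


def wait_for_count(get_count, expected, timeout=20.0, poll=0.5):
    """Poll get_count() until it returns `expected` or `timeout` seconds pass.

    Returns the last observed count, so the caller can decide what to do
    if the page never reaches the expected total.
    """
    deadline = time.monotonic() + timeout
    count = get_count()
    while count != expected and time.monotonic() < deadline:
        time.sleep(poll)  # give the page time to render more items
        count = get_count()
    return count
```

Returning the last count rather than raising lets the caller compare it with `expectedTotal` and choose between retrying and accepting a partial result.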

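Which ones are "all products"? Are you looking at "Kitchen Roll Towels, Perforated, 2-Ply, 11 x 8, White, 85 Sheets/Roll, 30 Rls/Ct", "Soak-Proof Shield Mediumweight Paper Plates, 8 1/2", Grn/Burg, 125/Pk", and so on?

Yes, exactly. Those are the test cases. However, most of the time I only get the featured ones.

Thank you. It seems to work, but sometimes it still returns only the featured products. Still, I'll take the time to understand this. Thanks.

I'll take another look tonight. Is there a way to retry automatically several times without closing the session?

You can use try/except. I haven't looked into whether selenium itself provides a fallback retry. If it does, the documentation will probably cover it in detail.

Hey, based on the link code you corrected above, I tried to create a link for the table, but I couldn't get it to work.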
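
The try/except retry suggested in the comments could be sketched roughly as follows. `retry` and `action` are hypothetical names, not a Selenium API. In the script above, `action` might reload the page with `driver.get(...)` and repeat the wait on the same `driver`, with `exceptions=(TimeoutException,)`, so the browser session stays open between attempts.

```python
import time


def retry(action, attempts=3, delay=1.0, exceptions=(Exception,)):
    """Call action() up to `attempts` times and return its first successful result.

    Only the listed exception types trigger another attempt; the error from
    the final attempt is re-raised to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay)  # brief pause before trying again
```

Keeping the driver outside `action` is what allows the retries to reuse one session instead of starting a new browser each time.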