Python 3.x: nested loop is not looping correctly (Selenium)

python-3.x selenium

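I'm trying to scrape books from a website, and I should get only 20 results from each page before moving on to the next one.

I get the total number of pages (num_page) by reading an element on the page, which gives me the maximum number of pages to iterate over.

The problem with my code is that the nested loop (the one that locates the anchor nodes) doesn't give me just the 20 URLs from each page; instead it keeps looping over the same page.

I'm not 100% sure where the nested loop goes wrong, so any pointers would be very helpful.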

from selenium import webdriver
from webdriver_manager.firefox import GeckoDriverManager

options = webdriver.FirefoxOptions()
options.add_argument("--headless")

driver = webdriver.Firefox(executable_path=GeckoDriverManager().install(), options=options)
#driver = webdriver.Chrome(executable_path=chromedriver, options=options)
print("Browsing to Wordery")
driver.get('https://wordery.com/search?viewBy=grid&resultsPerPage=20&page=1&leadTime[]=any&interestAge[]=Babies')
#print((driver.page_source).encode('utf-8'))
driver.implicitly_wait(3)

#Get total pages
num_page = driver.find_element_by_xpath('//span[@class="js-pnav-max"]')


#iterate through pages grabbing links
for i in range(int(num_page.text)):
    
    #locate anchor nodes
    lists = driver.find_elements_by_xpath("//a[@class='"'c-book__title'"']")
    links = []
    for lis in lists:
        
        # Fetch and store the links
        links.append(lis.get_attribute('href'))
        with open('search_results_urls.txt', 'a') as filehandle:
            filehandle.write('%s\n' % lis.get_attribute('href'))
            print(lis.get_attribute('href'))
    
    page_ = i + 1
    click_next = driver.find_element_by_xpath('//a[@class="o-layout__item o-link--arrow js-pnav-next u-utils-pnav__next"]').click()

driver.quit()

Strangely, it loops over the first page 33 times (the page has only 20 items, so it duplicates them) and then fails with the following error:

selenium.common.exceptions.StaleElementReferenceException: Message: The element reference of <a class="c-book__title" href="/peppa-pig-practise-with-peppa-wipe-clean-first-letters-peppa-pig-9780723292081"> is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed

As soon as I add it to the page loop, it keeps looping over the same page URL.
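Independently of the paging problem, the repeated URLs described above can be kept out of the output file by remembering what has already been written. The following is a minimal sketch, not part of the original post: `append_new_links` and the `seen` set are hypothetical names introduced for illustration. It writes only unseen links and returns how many were new, so a page that contributes zero new links is a hint that the browser never left the previous page.

```python
def append_new_links(path, links, seen):
    """Append links not already in `seen` to `path`; return how many were new.

    `seen` is a set owned by the caller and is updated in place.
    """
    new_links = []
    for link in links:
        # Skip missing hrefs (None or empty) and anything written earlier.
        if link and link not in seen:
            seen.add(link)
            new_links.append(link)
    if new_links:
        # Open the file once per page rather than once per link.
        with open(path, "a", encoding="utf-8") as handle:
            handle.writelines(link + "\n" for link in new_links)
    return len(new_links)
```

In the page loop this would be called once per page, for example `added = append_new_links('search_results_urls.txt', links, seen)`, and the loop could stop or retry when `added` is 0. It does not fix the navigation itself; it only makes the symptom visible and keeps the file clean.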

Here is my latest code, following the answer below. I'm still struggling: it loops over the first page multiple times.

#Get total pages
num_page = driver.find_element_by_xpath('//span[@class="js-pnav-max"]')

for i in range(int(num_page.text)):
    driver.implicitly_wait(10)
    lists = driver.find_elements_by_xpath("//a[@class='c-book__title']")
    links = [link.get_attribute('href') for link in lists]
    
    with open('search_results_urls.txt', 'a') as filehandle:
        for link in links:
            filehandle.write(link + "\n")
            print(link + "\n")
    
    click_next = driver.find_element_by_xpath('//a[@class="o-layout__item o-link--arrow js-pnav-next u-utils-pnav__next"]').click()

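The reason you are getting the same links is that you assigned the element list outside the loop. When the page is refreshed, the list still holds the links from the previous page.

Place the lookup inside the for loop. Use WebDriverWait() and wait for presence_of_all_elements_located() so that, when you click through to the next page, the script stays in sync with the page refresh.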

# Get the total number of pages
num_page = driver.find_element_by_xpath('//span[@class="js-pnav-max"]')
for i in range(int(num_page.text)):
    # Re-locate the links on every iteration so they belong to the current page
    lists = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.XPATH, "//a[@class='c-book__title']")))
    links = [link.get_attribute('href') for link in lists]
    with open('search_results_urls.txt', 'a') as filehandle:
        for link in links:
            # One URL per line
            filehandle.write(link + "\n")
            print(link)

    driver.find_element_by_xpath('//a[@class="o-layout__item o-link--arrow js-pnav-next u-utils-pnav__next"]').click()
    # Give the page some time to refresh (needs `import time`)
    time.sleep(2)

You need to import the following:

import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
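A fixed `time.sleep(2)` may be too short or too long depending on how quickly the site swaps in the next page of results. A possible alternative, sketched here under assumptions, is to read all `href` values as plain strings in a single `execute_script` call (so no `WebElement` handle is held across the page change) and to poll until that list differs from the previous page's. The helper names `read_hrefs` and `wait_for_new_hrefs`, the CSS selector `a.c-book__title`, and the timeout values are illustrative choices, not taken from the answer. The sketch also assumes that two consecutive pages never show an identical list of links.

```python
import time

# JavaScript that returns every book link's href as an array of strings.
# The selector is an assumption based on the class name used in the thread.
LINKS_JS = (
    "return Array.from(document.querySelectorAll('a.c-book__title'))"
    ".map(function (a) { return a.href; });"
)


def read_hrefs(driver):
    """Return the current page's book links as plain strings in one round trip."""
    return list(driver.execute_script(LINKS_JS) or [])


def wait_for_new_hrefs(driver, previous, timeout=20.0, poll=0.5):
    """Poll until the page shows a non-empty link list that differs from `previous`."""
    deadline = time.monotonic() + timeout
    while True:
        current = read_hrefs(driver)
        if current and current != previous:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError("page content did not change within %.1f s" % timeout)
        time.sleep(poll)
```

After clicking the next-page link, `links = wait_for_new_hrefs(driver, links)` would stand in for both the sleep and the element lookup at the top of the loop. Selenium's own `WebDriverWait` combined with `EC.staleness_of(old_element)` is another way to wait for the old page content to go away.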

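I'm also not sure the last example works, because I don't refresh the list or the links for the new page once the click has completed. With that code I get the following message: Message: The element reference of ... is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed. Stranger still, it does seem to pull 10 links from the second page, but then fails with the error message above.

@Nathan: Possibly the elements are not visible at the time they are retrieved. I haven't tested this, since I don't have an editor at hand to run the code. Please try presence_of_all_elements_located() instead of visibility_of_all_elements_located() and let me know how it goes.

@Nathan: You also need to add some delay after clicking the Next button so that the page has time to load.

I've just added my latest code, because it is still looping over the first page. The code is in my original post.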
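Since the search URL in the question already carries a `page=1` query parameter, one alternative that avoids clicking the next-page link (and so avoids holding elements across a page change) is to load each page by URL. This is a sketch under the assumption that the site serves page N when `page=N` is requested, which the thread does not confirm; `page_url` and `collect_links` are hypothetical helper names. The locator strategy is passed as the string "xpath", which is the value of Selenium's `By.XPATH` constant.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(url, page):
    """Return `url` with its `page` query parameter set to `page`."""
    parts = urlsplit(url)
    # Keep every other parameter, drop any existing `page` entry.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "page"
    ]
    query.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def collect_links(driver, first_url, num_pages, xpath="//a[@class='c-book__title']"):
    """Visit pages 1..num_pages by URL and return all hrefs as strings."""
    links = []
    for page in range(1, num_pages + 1):
        driver.get(page_url(first_url, page))
        # Elements are looked up fresh on every page and converted to strings
        # straight away, so none of them is reused after navigation.
        elements = driver.find_elements("xpath", xpath)
        links.extend(element.get_attribute("href") for element in elements)
    return links
```

With a real driver this would be called as `collect_links(driver, start_url, int(num_page.text))`, reading `num_page.text` once before the loop starts, where `start_url` is the search URL from the question.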
