Selenium (Python) crashes when scraping many pages (50+)


python selenium web-scraping


I have the script below for scraping key bond data from TRACE (link in the code). It is a modified version of this:

It works on 50-100 pages, but fails when I try to scrape all ~7600 pages. The script fails here:

bond = [tablerow.text]

and raises the following error:

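StaleElementReferenceException: Message: The element reference of … is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed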

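I added an explicit wait for tablerows, thinking that some tables take longer to load, but it does not seem to help: the problem persists. I have tried several other approaches, but I am out of ideas.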
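A likely reason the explicit wait does not help: presence_of_all_elements_located returns the rows that exist at that moment, and if the grid re-renders afterwards (for example while the next page's data arrives), those references go stale before .text is read. A common workaround is to look the rows up again and retry when that happens. The sketch below is only one way to do it; the helper name read_rows_with_retry is hypothetical, and the exception type is passed in so the helper can be exercised without a browser.

```python
import time
from typing import Callable, Iterable, List, Tuple, Type


def read_rows_with_retry(
    find_rows: Callable[[], Iterable],
    stale_exc: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    delay: float = 0.5,
) -> List[str]:
    """Locate the rows afresh and read their text, retrying if any row goes stale.

    find_rows must look the elements up again on every call; reusing an old
    list of WebElements would just hit the same stale references.
    """
    for attempt in range(attempts):
        try:
            return [row.text for row in find_rows()]
        except stale_exc:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(delay)  # give the grid time to finish re-rendering
    return []  # only reached when attempts <= 0


# Hypothetical usage inside the page loop of the script:
# from selenium.common.exceptions import StaleElementReferenceException
# texts = read_rows_with_retry(
#     lambda: driver.find_elements(By.CSS_SELECTOR, 'div.rtq-grid-bd > div.rtq-grid-row'),
#     (StaleElementReferenceException,),
# )
# bonds.extend([t] for t in texts)
```

The lambda matters: each retry runs find_elements again, so the retry reads the rows that are currently attached to the DOM.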

Any ideas on how to solve this would be very helpful. Tips for speeding up the code are also welcome. Thanks!
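On speed: every tablerow.text is a separate WebDriver round trip, so a page of rows costs one command per row. One possible alternative, sketched below, is to fetch the text of every row with a single execute_script call. Because no element references are held between commands, this sidesteps the stale-reference error for the read itself, although it could still read a grid that is in the middle of updating. The helper name and the use of innerText are assumptions, not part of the original script, and innerText may differ slightly in whitespace from Selenium's .text.

```python
from typing import Any, List

# JavaScript run in the page: collect the rendered text of every element
# matching the CSS selector passed in as arguments[0].
ROWS_TEXT_JS = (
    "return Array.from(document.querySelectorAll(arguments[0]),"
    " function (el) { return el.innerText; });"
)


def read_all_row_texts(driver: Any, selector: str) -> List[str]:
    """Return the text of all rows matching `selector` in one WebDriver call."""
    texts = driver.execute_script(ROWS_TEXT_JS, selector)
    return list(texts or [])


# Hypothetical usage inside the page loop of the script:
# for text in read_all_row_texts(driver, 'div.rtq-grid-bd > div.rtq-grid-row'):
#     bonds.append([text])
```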

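UPDATE: Following KunduK's suggestion, increasing time.sleep(0.8) to time.sleep(1.5) in the for loop seems to have solved the problem. However, I will wait a while before accepting KunduK's answer, in case someone else comes up with a better one.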
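A fixed sleep works, but it wastes time on fast pages and may still be too short on slow ones. A sketch of an alternative: after clicking "next", poll until the grid's content differs from the previous page before reading it. The helper wait_for_change below is hypothetical and generic (it accepts any zero-argument reader), and it assumes two consecutive pages never share an identical first row. If the grid replaces its row elements rather than updating them in place, Selenium's built-in WebDriverWait(driver, 10).until(EC.staleness_of(old_row)) is another way to wait for the old page to go away.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def wait_for_change(
    read: Callable[[], T],
    old: T,
    timeout: float = 10.0,
    poll: float = 0.2,
    ignored: Tuple[Type[BaseException], ...] = (),
) -> T:
    """Call `read` until it returns something different from `old`.

    Exceptions listed in `ignored` (e.g. a stale element while the grid
    re-renders) count as "not ready yet". Raises TimeoutError after `timeout`.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = read()
        except ignored:
            value = old  # treat a failed read as "unchanged"
        if value != old:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"value did not change within {timeout} s")
        time.sleep(poll)


# Hypothetical usage in the page loop, replacing time.sleep(...):
# from selenium.common.exceptions import NoSuchElementException, StaleElementReferenceException
# first_row = lambda: driver.find_element(
#     By.CSS_SELECTOR, 'div.rtq-grid-bd > div.rtq-grid-row').text
# old = first_row()
# driver.find_element(By.CSS_SELECTOR, 'a.qs-pageutil-next').click()
# wait_for_change(first_row, old,
#                 ignored=(StaleElementReferenceException, NoSuchElementException))
```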

# TRACE Bond Scraper
import os
import time
import numpy as np
import pandas as pd
from datetime import date
from datetime import datetime as dt
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.firefox.options import Options

options = Options()
options.headless = False
driver = webdriver.Firefox(options = options)
driver.get('http://finra-markets.morningstar.com/BondCenter/Results.jsp')

# Click agree, edit search and submit 
WebDriverWait(driver, 10).until(EC.element_to_be_clickable(
    (By.CSS_SELECTOR, ".button_blue.agree"))).click()
WebDriverWait(driver, 10).until(EC.element_to_be_clickable(
    (By.CSS_SELECTOR, 'a.qs-ui-btn.blue'))).click()
WebDriverWait(driver, 10).until(EC.element_to_be_clickable(
    (By.CSS_SELECTOR, 'a.ms-display-switcher.hide'))).click()
WebDriverWait(driver, 10).until(EC.element_to_be_clickable(
    (By.CSS_SELECTOR, 'input.button_blue[type=submit]'))).click()
WebDriverWait(driver, 10).until(EC.presence_of_element_located(
    (By.CSS_SELECTOR, '.rtq-grid-row.rtq-grid-rzrow .rtq-grid-cell-ctn')))
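# find_elements(By.CSS_SELECTOR, ...) replaces find_elements_by_css_selector,
# which was removed in Selenium 4
headers = [title.text for title in driver.find_elements(
    By.CSS_SELECTOR, '.rtq-grid-row.rtq-grid-rzrow .rtq-grid-cell-ctn')[1:]]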

# Find out the total number of pages to scrape
pg_no = WebDriverWait(driver, 10).until(EC.presence_of_element_located(
            (By.CSS_SELECTOR, '.qs-pageutil-total > span:nth-child(1)'))).text
pg_no = int(pg_no)

# Scrape tables
bonds = []
for page in range(1, pg_no + 1):  # pages 1..pg_no inclusive
    # Wait until the pager marks this page as the active ("on") one
    WebDriverWait(driver, 10).until(EC.presence_of_element_located(
        (By.CSS_SELECTOR, (f"a.qs-pageutil-btn.on[value='{str(page)}']"))))
    time.sleep(0.8)  # raised to 1.5 in the update above
    tablerows = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located(
        (By.CSS_SELECTOR, 'div.rtq-grid-bd > div.rtq-grid-row')))
    for tablerow in tablerows:
        bond = [tablerow.text]  # StaleElementReferenceException is raised here
        bonds.append(bond)
    if page < pg_no:  # there is no "next" page after the last one
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable(
            (By.CSS_SELECTOR, 'a.qs-pageutil-next'))).click()
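The script stores each row as a single string. To pair those rows with the scraped headers, one option is to split each row's text into cells. The sketch below assumes the cells in tablerow.text are separated by newlines and that their count matches headers; the question confirms neither, so rows that do not match are set aside for inspection instead of being silently misaligned. The helper name rows_to_records is hypothetical.

```python
from typing import Dict, List, Tuple


def rows_to_records(
    bonds: List[List[str]], headers: List[str]
) -> Tuple[List[Dict[str, str]], List[str]]:
    """Split each scraped row into cells and pair them with the headers.

    Assumes cells are newline-separated; rows whose cell count differs from
    len(headers) are returned separately rather than misaligned.
    """
    records: List[Dict[str, str]] = []
    mismatched: List[str] = []
    for (text,) in bonds:  # each entry is a one-element list, as in the script
        cells = text.split("\n")
        if len(cells) == len(headers):
            records.append(dict(zip(headers, cells)))
        else:
            mismatched.append(text)
    return records, mismatched


# With pandas (already imported in the script):
# records, mismatched = rows_to_records(bonds, headers)
# df = pd.DataFrame(records)
```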

Change this line from

WebDriverWait(driver, 10).until(EC.presence_of_element_located(
        (By.CSS_SELECTOR, (f"a.qs-pageutil-btn.on[value='{str(page)}']"))))

to this, which drops the on class:

WebDriverWait(driver, 10).until(EC.presence_of_element_located(
        (By.CSS_SELECTOR, (f"a.qs-pageutil-btn[value='{str(page)}']"))))

Also, switch to Chrome instead of Firefox.

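Hopefully you will get better results. It worked for more than 400 pages; I did not check beyond that, but I hope it works for all of them.

Thanks, I will try. I removed ".on" from this line

a.qs-pageutil-btn.on[value='{str(page)}']

which made the code more stable. Now it crashes around page 500.

So you mean that after moving to the Chrome browser, the error appears after 500 pages?

I have not tried Chrome yet, but I think you also suggested (and later removed) dropping ".on" from the line above, which helped me reach page 600 while still using Firefox.

Try the Chrome browser; hopefully you will get better results. Please give it a try.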
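Since the runs still die somewhere past page 500, it may help to write each page's rows to disk as soon as they are scraped, so a crash loses at most one page and a rerun knows where the previous one stopped. A minimal sketch using the standard csv module; the file name, the helper names, and storing the page number next to each row are all assumptions, not part of the original script.

```python
import csv
import os
from typing import Iterable


def append_page(path: str, page: int, row_texts: Iterable[str]) -> None:
    """Append one page's rows to a CSV file as (page, text) pairs."""
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)  # quotes fields that contain newlines
        for text in row_texts:
            writer.writerow([page, text])


def last_saved_page(path: str) -> int:
    """Return the highest page number already saved, or 0 if nothing is saved."""
    if not os.path.exists(path):
        return 0
    last = 0
    with open(path, newline="", encoding="utf-8") as f:
        for record in csv.reader(f):
            if record:
                last = max(last, int(record[0]))
    return last


# Hypothetical usage at the end of each loop iteration:
# append_page('trace_bonds.csv', page, [row.text for row in tablerows])
```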