Python Selenium error: Element is no longer attached to the DOM

I'm trying to get a list of proxies using Selenium (headless, via PhantomJS) and BeautifulSoup.

This is what I have so far:

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.PhantomJS()
driver.set_window_size(1120, 550)
driver.get("https://sslproxies.org/")

while True:
    try:
        next_button = driver.find_element_by_xpath("//li[@class='paginate_button next'][@id='proxylisttable_next']")
    
    except:
        break
    next_button.click()

    soup = BeautifulSoup(next_button.get_attribute('innerHTML'),'html.parser')

But I get this error:

"errorMessage":"Element is no longer attached to the DOM"

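You define next_button, click that button, and then try to use the next_button variable again (in next_button.get_attribute('innerHTML')). The click loads another page of results with a new DOM, so the element your next_button variable points to is no longer valid. To avoid this, either redefine the variable after each click or always use the full lookup:
driver.find_element_by_xpath("//li[@class='paginate_button next'][@id='proxylisttable_next']")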
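A minimal sketch of the "redefine the variable" advice, applied to the loop from the question. It looks the button up again on every iteration and parses driver.page_source rather than the button's innerHTML (which only holds the link inside the li, not the proxy table). The function name collect_pages and the parse callback are hypothetical. The sketch assumes, as the question's own try/except/break already does, that the "next" XPath stops matching once the last page is reached. It uses find_elements (plural) because it returns an empty list instead of raising. The string "xpath" is the value of By.XPATH, so no Selenium import is needed here.

```python
# Hypothetical helper: re-find the "next" link on every pass so a stale
# reference from the previous click is never reused.
NEXT_XPATH = "//li[@class='paginate_button next'][@id='proxylisttable_next']/a"


def collect_pages(driver, parse):
    """Return parse(page_source) for every page, first and last included.

    driver: a Selenium WebDriver (anything exposing page_source and
            find_elements(by, value) works).
    parse:  callable applied to each page's HTML, e.g.
            lambda html: BeautifulSoup(html, "html.parser").
    """
    results = []
    while True:
        # Parse the page currently shown *before* clicking.
        results.append(parse(driver.page_source))
        # Assumption: the XPath no longer matches on the last page, so an
        # empty list means there is nothing left to click.
        buttons = driver.find_elements("xpath", NEXT_XPATH)
        if not buttons:
            break
        # Fresh reference, clicked once and then discarded.
        buttons[0].click()
    return results
```

If the site loaded pages asynchronously, an explicit wait before reading page_source would be needed. This sketch does not add one.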

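1. You can use a for loop to go through the pages, but for that you need the number of pages. How you get the page count varies from site to site; in your case it is simple. Take the length of the list of page-button locators and add 1 (the exact class match skips the currently active page button):
len(driver.find_elements(By.XPATH, "//li[@class='paginate_button ']"))
2. Your locator was incorrect, so I changed it to //li[@id='proxylisttable_next']/a (I added /a so that it targets the link inside the list item).
3. Once the waits have found the button, click it in the finally block.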
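The "+1" in step 1 can be checked offline. The snippet below runs the same exact-match predicate with the standard library's ElementTree, which supports [@attr='value'] predicates. The pagination markup is hypothetical, modelled on the DataTables-style class names in the answer's XPaths; the live site's HTML may differ. The trailing space in 'paginate_button ' means the active page and the previous/next buttons are not counted.

```python
import xml.etree.ElementTree as ET

# Hypothetical pagination markup (assumption, not copied from the live site).
PAGINATION_HTML = """
<ul class="pagination">
  <li class="paginate_button previous disabled" id="proxylisttable_previous"><a>Previous</a></li>
  <li class="paginate_button active"><a>1</a></li>
  <li class="paginate_button "><a>2</a></li>
  <li class="paginate_button "><a>3</a></li>
  <li class="paginate_button "><a>4</a></li>
  <li class="paginate_button "><a>5</a></li>
  <li class="paginate_button next" id="proxylisttable_next"><a>Next</a></li>
</ul>
"""


def count_pages(html):
    root = ET.fromstring(html.strip())
    # Exact @class comparison: only the non-active page buttons match.
    others = root.findall(".//li[@class='paginate_button ']")
    # Add the active page to get the total number of pages.
    return len(others) + 1


print(count_pages(PAGINATION_HTML))  # → 5
```

With four matching buttons the loop clicks "next" four times, which lines up with the debug output ending at "Clicking Page 5".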
Solution
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

NEXT_XPATH = "//li[@class='paginate_button next'][@id='proxylisttable_next']/a"

# Selenium 3 style; Selenium 4 expects service=Service(...) instead of executable_path.
driver = webdriver.Chrome(executable_path='/snap/bin/chromium.chromedriver')
driver.implicitly_wait(10)
driver.set_window_size(1120, 550)
driver.get("https://sslproxies.org/")

wait = WebDriverWait(driver, 10)
# Count the non-active page buttons (exact class match, note the trailing space).
length = len(driver.find_elements(By.XPATH, "//li[@class='paginate_button ']"))
print(f"List length is: {length}")
for j in range(1, length + 1):
    try:
        print(f"Clicking Page {j + 1}")
        # Wait until the proxy list is visible and the "next" link is clickable.
        wait.until(
            EC.visibility_of_element_located((By.CSS_SELECTOR, "section[id='list']")))
        wait.until(EC.element_to_be_clickable((By.XPATH, NEXT_XPATH)))
    finally:
        # Look the button up again on every iteration so the reference is never stale.
        next_button = driver.find_element(By.XPATH, NEXT_XPATH)
        next_button.click()

I tested it on Chrome, but it should work in any browser because it uses stable locators and waits.
My debug output:
List length is: 4
Clicking Page 2
Clicking Page 3
Clicking Page 4
Clicking Page 5

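I solved the problem with a for loop, but as it stands it does not scrape the last page, so you will get 80 results. I'm posting it as is; you can work out how to scrape the last page yourself.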
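One way to close that gap is to scrape first and click afterwards, skipping the click after the last page. This is a generic fence-post loop, not the answerer's code. It is sketched with hypothetical callbacks: scrape_current() would return the rows of the page currently shown, and click_next() would re-find and click the "next" link (for example driver.find_element(By.XPATH, NEXT_XPATH).click()). page_count is assumed to be length + 1 as computed above.

```python
def scrape_all_pages(page_count, scrape_current, click_next):
    """Collect rows from page_count pages, clicking "next" only between pages.

    scrape_current and click_next are hypothetical callbacks supplied by
    the caller; this function only fixes the order of the two steps.
    """
    rows = []
    for page in range(page_count):
        # Scrape the page that is currently shown first ...
        rows.extend(scrape_current())
        # ... then advance, except after the final page.
        if page < page_count - 1:
            click_next()
    return rows
```

The loop makes page_count scrapes and page_count - 1 clicks, so both the first and the last page are included.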