How to keep the original page's elements with Selenium after opening a javascript-generated link and returning to the original page


javascript python selenium web-scraping


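After moving to another page through a javascript-generated link, keeping the original elements in Selenium's webdriver seems to be impossible or very complicated. How can I do this?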

I'm trying to scrape a specific web page with the following components:

Ubuntu 18.04.1 LTS
Python 3.6.1
Selenium (Python package) 3.141.0
Google Chrome 71.0.3578.98
ChromeDriver 2.45.615279

The page contains links whose "href" is a javascript function call, like this:

<a href="javascript:funcName(10, 24, 100)"></a>

What I tried in order to keep the original page's elements, without success:

1. Copying the element (or driver) object

I tried a deepcopy of the driver itself, but that didn't work either. The error returned was:

TypeError: can't pickle _thread.lock objects
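That error is expected: the driver object holds thread locks internally (presumably in its HTTP connection machinery), and lock objects cannot be copied or pickled. More fundamentally, a WebElement is only a handle to a node in the live browser session, so even a successful Python-side copy would not preserve the page. The minimal reproduction below uses a hypothetical FakeDriver class that, like the real driver, holds a threading.Lock; it is a sketch of the failure mode, not Selenium itself.

```python
import copy
import threading


class FakeDriver:
    """Hypothetical stand-in for a WebDriver: it holds a lock internally,
    as the real driver's connection machinery is assumed to do."""

    def __init__(self):
        self._lock = threading.Lock()


def try_deepcopy(obj):
    """Return (copy, None) on success, or (None, error message) on TypeError."""
    try:
        return copy.deepcopy(obj), None
    except TypeError as exc:
        return None, str(exc)


clone, error = try_deepcopy(FakeDriver())
# The exact wording differs between Python versions,
# but it always says the lock cannot be pickled.
print(error)
```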

2. Opening the redirected page in a new tab

from selenium.webdriver import ActionChains
from selenium.webdriver.common.keys import Keys

for a in driver.find_elements_by_css_selector('.some-class-name'):

    action = ActionChains(driver)

    # Expected result: CONTROL + click opens the redirected page in a new tab,
    # then focus moves to that tab
    action.key_down(Keys.CONTROL).click(a).key_up(Keys.CONTROL).perform()
    # WebDriver itself has no send_keys(); switching tabs is done via window handles
    driver.switch_to.window(driver.window_handles[-1])

However, this did not open a new tab; it just navigated to the redirected page in the same tab.
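A javascript: link runs its script inside the current page instead of pointing at a URL, so the browser presumably has nothing to "open in a new tab" on Ctrl+click. A workaround that keeps the original tab (and its elements) untouched is to open a blank tab yourself, load the same page there, and run the link's script in that tab. This is a sketch under a few assumptions: scrape_in_new_tab is a hypothetical name, scrape is a callback you supply that waits for and reads the redirected page, the page defines funcName globally, and script-opened windows are not blocked (ChromeDriver normally launches Chrome with the popup blocker disabled). It only uses WebDriver calls available in Selenium 3.141.

```python
def scrape_in_new_tab(driver, url, script, scrape):
    """Run a javascript: link's script in a separate tab and return scrape(driver).

    The original tab never navigates, so WebElements found there stay valid.
    driver: a Selenium WebDriver (only standard WebDriver calls are used).
    url:    the original page, reloaded in the new tab so its JS functions exist.
    script: JavaScript source to run, e.g. "funcName(10, 24, 100)".
    scrape: hypothetical callback that waits for and reads the redirected page.
    """
    original = driver.current_window_handle
    before = set(driver.window_handles)
    # Open an empty tab from script.
    driver.execute_script("window.open('about:blank', '_blank');")
    # Handle order is not guaranteed, so find the new handle by set difference.
    new_handle = (set(driver.window_handles) - before).pop()
    driver.switch_to.window(new_handle)
    try:
        driver.get(url)                # waits for the page load to finish
        driver.execute_script(script)  # triggers the redirect in the new tab only
        return scrape(driver)
    finally:
        driver.close()                 # closes the current (new) tab only
        driver.switch_to.window(original)
```

Inside the original loop you would pass the link's href with the "javascript:" prefix removed as script; the elements returned by driver.find_elements_by_css_selector stay usable because their tab never leaves the page.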

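If there is no simple way, I'll create a list or dict to store the links I've already scraped; after scraping each redirected page I'd parse the original page again and skip the links already checked. But I'd rather not, because it's very redundant.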

I went with creating a second webdriver instance:

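from selenium import webdriver

# url is the page that contains the javascript: links
driver = webdriver.Chrome()
driver_sub = webdriver.Chrome()

driver.get(url)
driver_sub.get(url)  # access the same page with a different instance

# driver never leaves the original page, so its elements never go stale
for a in driver.find_elements_by_css_selector('.some-class-name'):
    script = a.get_attribute('href')  # e.g. "javascript:funcName(10, 24, 100)"
    driver_sub.execute_script(script)  # run the link's script in the second browser
    # do some work on the redirected page with driver_sub
    driver_sub.execute_script('window.history.go(-1)')  # almost the same as driver_sub.back()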
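Passing the raw href to execute_script works here only because "javascript:" happens to parse as a statement label in JavaScript. A slightly more explicit option is to strip the scheme before executing it. The helper name href_to_script is hypothetical, and the percent-decoding step is an assumption about how the href property may be serialized (for example %20 for a space); drop it if your scripts legitimately contain "%" sequences.

```python
from urllib.parse import unquote


def href_to_script(href):
    """Turn an href such as "javascript:funcName(10, 24, 100)" into plain
    JavaScript source suitable for driver.execute_script()."""
    prefix = "javascript:"
    if not href.lower().startswith(prefix):
        raise ValueError("not a javascript: URL: %r" % href)
    # Undo any percent-encoding the browser applied when serializing href.
    return unquote(href[len(prefix):])
```

In the loop above this would be used as driver_sub.execute_script(href_to_script(a.get_attribute('href'))).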

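Even though you return to the same page, Selenium doesn't know it is the same page; it treats it as a new page. The links found before the for loop don't belong to the new page. You need to find the links again on the new page and assign them to the same variable links inside the for loop, then use the index to move on to the next link.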
links = driver.find_elements_by_css_selector('.some-class-name')

for i in range(0, len(links)):
    links[i].click()  # this redirects me to another page
    print(driver.current_url)  # this shows the redirected page
    driver.back()
    print(driver.current_url)

    # Important: find the links again on the page back from redirected page
    # to resolve the StaleElementReferenceException.
    links = driver.find_elements_by_css_selector('.some-class-name')
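The re-find-by-index pattern can be factored into a small helper so every loop in a script can reuse it. This is a sketch: visit_by_index is a hypothetical name, and find_all and visit are callables you supply (for example, a lambda wrapping driver.find_elements_by_css_selector('.some-class-name'), and a function that clicks the element, scrapes, and calls driver.back()). It also stops early if the page comes back with fewer matching elements than before, instead of raising IndexError.

```python
def visit_by_index(find_all, visit):
    """Visit every element by position, re-locating the list on each pass.

    find_all: zero-argument callable returning the current list of elements.
    visit:    callable that acts on one element (click, scrape, go back, ...).
    """
    count = len(find_all())
    for i in range(count):
        elements = find_all()   # fresh references after every navigation
        if i >= len(elements):  # the page changed shape; stop instead of crashing
            break
        visit(elements[i])
```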

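I was able to iterate over elements while changing pages with something like this (inspired by yong's answer).
It lets you keep an index to loop through the elements without having to worry about stale references: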
import time

downloadList = driver.find_elements_by_id('download-form')

for i in range(0, len(downloadList)):
    downloadList[i].submit()
    time.sleep(15)
    driver.get(url)  # go back to the page that lists the forms
    time.sleep(5)
    downloadList = driver.find_elements_by_id('download-form')  # re-find to avoid stale references
    time.sleep(20)
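Fixed sleeps either waste time or turn out too short on a slow load. In practice Selenium's WebDriverWait(driver, timeout).until(method) is the idiomatic replacement: it repeatedly calls method(driver) until it returns a truthy value. The dependency-free sketch below only illustrates that polling idea; wait_until is a hypothetical name, and the timeout and poll values are arbitrary defaults.

```python
import time


def wait_until(condition, timeout=20.0, poll=0.5):
    """Call condition() until it returns a truthy value, then return that value.

    Raises TimeoutError if nothing truthy appears within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)
```

Because an empty list is falsy, downloadList = wait_until(lambda: driver.find_elements_by_id('download-form')) returns as soon as the forms are present again.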

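In Java, getCurrentUrl() gives the URL of the page the driver is active on. Even when a click opens a new page, the driver does not move to it, so getCurrentUrl won't return that URL.

Do you have a test URL? This looks like something I would use fetch for.

@QHarr Sorry, I don't. @pguardiario Is fetch a method from some package? If I understand correctly, Selenium WebDriver has no such method.

Thanks @yong. My code actually contains several loops, so in my case I would need to re-find the elements used in every loop, which may not be a good idea. But your solution will help in some cases.

Ignore the sleeps; they are specific to my use case.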