Extracting links with a specific class using Selenium in Python


python python-2.7 selenium-webdriver infinite-scroll html-content-extraction


I am trying to extract links from an infinite-scroll page.

This is the code I use to scroll down the page:

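import time
from selenium import webdriver

driver = webdriver.Chrome('C:\\Program Files (x86)\\Google\\Chrome\\chromedriver.exe')
driver.implicitly_wait(15)  # session-wide setting for later find_element* calls; set it once
driver.get('http://seekingalpha.com/market-news/top-news')
for i in range(0, 2):
    # Jump to the bottom so the page loads the next batch of items
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(20)  # give the new items time to load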
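Instead of a fixed range(0, 2) with a 20-second sleep, a common pattern is to keep scrolling until document.body.scrollHeight stops growing. This is a minimal sketch, assuming the page grows its body height whenever it loads more items. `scroll_to_end` and its parameters are hypothetical names, and the two Selenium calls are passed in as callables so the loop itself can be tried without a browser.

```python
import time


def scroll_to_end(get_height, scroll_to_bottom, pause=2.0, max_rounds=50):
    """Scroll repeatedly until the page height stops growing.

    get_height: callable returning the current page height, e.g.
        lambda: driver.execute_script("return document.body.scrollHeight")
    scroll_to_bottom: callable that scrolls once, e.g.
        lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    Returns the number of scrolls performed (at most max_rounds).
    """
    last_height = get_height()
    for rounds in range(1, max_rounds + 1):
        scroll_to_bottom()
        time.sleep(pause)  # let the next batch of items load
        new_height = get_height()
        if new_height == last_height:
            return rounds  # nothing new was loaded: assume the end was reached
        last_height = new_height
    return max_rounds  # safety limit for pages that never stop growing
```

The trade-off is the pause: if the site responds more slowly than `pause`, the height will not have changed yet and the loop stops early, so a generous pause is safer than a short one.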

My goal is to extract specific links from this page: the ones with class="market_current_title", whose HTML looks like this:

<a class="market_current_title" href="/news/3223955-dow-wraps-best-week-since-2011-s-and-p-strongest-week-since-2014" sasource="titles_mc_top_news" target="_self">Dow wraps up best week since 2011; S&amp;P in strongest week since 2014</a>
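In Selenium these links can be selected with the CSS selector "a.market_current_title" (for example driver.find_elements_by_css_selector in the Python 2.7-era API) and read with get_attribute("href"). Another option is to take a single snapshot with driver.page_source after scrolling and parse it offline, which cannot go stale because it is just a string. The sketch below shows that offline step with Python 3's standard html.parser; `ClassLinkParser` and `extract_links` are hypothetical names, and matching on a whole class name (not the full attribute string) is a design choice of this sketch.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ClassLinkParser(HTMLParser):
    """Collect the href of every <a> whose class list contains target_class."""

    def __init__(self, target_class, base_url=""):
        super().__init__()
        self.target_class = target_class
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()  # class="a b" -> ["a", "b"]
        href = attrs.get("href")
        if self.target_class in classes and href:
            # Resolve relative hrefs such as "/news/..." against the page URL
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, target_class, base_url=""):
    parser = ClassLinkParser(target_class, base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, extract_links(driver.page_source, "market_current_title", "http://seekingalpha.com/market-news/top-news") would return absolute URLs like http://seekingalpha.com/news/3223955-... for the anchor shown above.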

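The error I ended up with was "stale element reference: element is not attached to the page document". Then I tried

URL = driver.find_elements_by_xpath("//div[@id='a']//a[@class='market_current_title']")

but it says there are no such links!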

Any ideas on how to solve this?
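About the empty XPath result: "//div[@id='a']//a[...]" only matches links nested inside a div whose id is exactly "a", so unless the page really contains such a div, find_elements_by_xpath just returns an empty list. Also, @class='market_current_title' requires the class attribute to equal that single class exactly, so class="market_current_title foo" would not match. A sketch that drops the ancestor constraint and uses the standard XPath 1.0 whole-word class test; `xpath_for_class` is a hypothetical helper.

```python
def xpath_for_class(class_name, tag="*"):
    """Build an XPath 1.0 expression matching <tag> elements anywhere in the
    page whose class attribute contains class_name as a whole word."""
    # Padding both sides with spaces avoids partial matches such as
    # "market_current_title_old" matching "market_current_title".
    return ("//%s[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]"
            % (tag, class_name))
```

It would be used as driver.find_elements_by_xpath(xpath_for_class("market_current_title", "a")); for this page the shorter CSS selector "a.market_current_title" expresses the same match.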

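You are probably trying to interact with an element that has changed since you located it (perhaps because of scrolling, or because it is off-screen). Try some of the good options for overcoming this.

Here is a snippet: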

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
import selenium.webdriver.support.expected_conditions as EC
import selenium.webdriver.support.ui as ui

# Return True if the element matching the CSS selector `locator` becomes
# visible within `timeout` seconds (default 2), otherwise False.
def is_visible(driver, locator, timeout=2):
    try:
        ui.WebDriverWait(driver, timeout).until(
            EC.visibility_of_element_located((By.CSS_SELECTOR, locator)))
        return True
    except TimeoutException:
        return False
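A stale element reference means the WebElement you are holding points to a node the page has since removed or re-rendered, which happens often on infinite-scroll pages. One general remedy, separate from the visibility check above, is to re-locate the elements and copy out plain data (such as the href strings) in a single step, retrying if the DOM changes mid-read. A minimal sketch; `read_with_retry` is a hypothetical helper, and the exception class is passed in (normally selenium.common.exceptions.StaleElementReferenceException) so the logic can run without Selenium.

```python
import time


def read_with_retry(read, stale_exc, attempts=3, delay=0.5):
    """Call read() and retry when it raises stale_exc.

    read should locate the elements *inside* the call and return plain data
    (e.g. a list of href strings), not WebElement objects, so every attempt
    works against the current DOM.
    """
    for attempt in range(attempts):
        try:
            return read()
        except stale_exc:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            time.sleep(delay)  # let the page settle before re-reading
```

With Selenium this might look like read_with_retry(lambda: [a.get_attribute("href") for a in driver.find_elements_by_css_selector("a.market_current_title")], StaleElementReferenceException).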

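Thanks Moshisho. These answers are mostly implemented in Java, JavaScript and C#, and I couldn't get the same solution working in Python. This looks good, and I tried it like this: "elements = wait.until(driver.find_elements_by_class_name('market_current_title'))", but I got the error "'list' object is not callable". So strange! I wonder whether I would avoid this problem if each time I scrolled down the page, grabbed the links, then scrolled down again and got the new links. Do you know how to do that?

You get the "'list' object is not callable" error because you can't wait on a list: WebDriverWait.until() expects a callable condition, but driver.find_elements_by_class_name(...) has already run and returned a list. Instead, do that for each element in the list (element, not elements), then scroll again and get the new links until the scrolling ends. Use a while loop. By the way, if my answer helped, you can upvote it...

Thank you Moshisho, of course your answer is very helpful. Since I'm new to programming, could you give me some hints, as code, on how to scroll and get the links each time? To scroll down I'm doing "for i in range(0,2): driver.implicitly_wait(15); driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")", and I don't know how to scroll once, get the links, and then continue.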
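Here is a sketch of the while loop suggested in the comments: grab the links, scroll, grab the new links, and stop when a scroll brings nothing new. It reads hrefs as plain strings each round, so later DOM changes cannot make them stale, and it keeps each link only once. `collect_links` is a hypothetical helper; the Selenium calls are passed in as callables. If you also want to wait for the links to appear first, until() needs a condition such as EC.presence_of_all_elements_located((By.CLASS_NAME, "market_current_title")) rather than an already-computed list.

```python
import time


def collect_links(get_hrefs, scroll_to_bottom, pause=2.0, max_rounds=50):
    """Alternate "grab links" and "scroll", keeping each href once, in order.

    get_hrefs: callable returning the hrefs currently on the page as strings,
        e.g. lambda: [a.get_attribute("href") for a in
                      driver.find_elements_by_class_name("market_current_title")]
    scroll_to_bottom: callable that scrolls the page once.
    Stops when a round adds no new links or after max_rounds scrolls.
    """
    links = []
    seen = set()
    rounds = 0
    while True:
        added = 0
        for href in get_hrefs():
            if href and href not in seen:
                seen.add(href)
                links.append(href)
                added += 1
        if added == 0 or rounds >= max_rounds:
            break  # nothing new appeared, or we hit the safety limit
        scroll_to_bottom()
        rounds += 1
        time.sleep(pause)  # wait for the next batch to load
    return links
```

Because duplicates are filtered with a set, the links already visible before each scroll are simply skipped, so the loop does not need to know which items the page has just appended.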
