Scrolling dynamic content with Selenium in Python does not work

python selenium web-scraping

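I want to scrape the page opened in the code below. Because it is a dynamic page, I need to scroll down to the bottom and then grab the HTML content to scrape it. However, when the site is opened through the Selenium Chrome web driver, no further content loads when I scroll down, whether I scroll manually or from the script. When the site is opened in a regular Chrome browser, it works fine. I even tried the Firefox driver, with the same result. This is the code I tried: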

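import time

from bs4 import BeautifulSoup
from selenium import webdriver

# Selenium 3 style call; Selenium 4 passes the driver path via service=Service(path) instead.
driver = webdriver.Chrome(executable_path=r'C:/tools/drivers/chromedriver.exe')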
driver.get('https://www.narendramodi.in/news')
# https://stackoverflow.com/a/27760083

SCROLL_PAUSE_TIME = 2.0
# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")
print(last_height)

while True:
    # Give the page time to settle before scrolling
    time.sleep(SCROLL_PAUSE_TIME)

    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    print(new_height)
    if new_height == last_height:
        break
    last_height = new_height


res = driver.execute_script("return document.documentElement.outerHTML")
driver.quit()

soup = BeautifulSoup(res, 'lxml')

How can I scrape the entire page?

Could you skip Selenium and just scrape the API that populates the page? It looks like an infinite scroll, but you can refer to the following link:
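As a rough sketch of that idea, the loop below requests numbered pages from the endpoint that feeds the infinite scroll until an empty response comes back. The endpoint URL and the `page` query parameter are placeholders, not the site's real API; the actual request would have to be read from the browser's developer tools (Network tab) while scrolling. The `fetch` argument is also an assumption, added so the loop can be tried without network access.

```python
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint: replace with the request seen in the Network tab.
BASE_URL = "https://example.com/news/loadmore"


def build_page_url(page: int, base_url: str = BASE_URL) -> str:
    """Build the URL for one page; 'page' is an assumed parameter name."""
    return f"{base_url}?{urlencode({'page': page})}"


def fetch_html(url: str) -> str:
    """Download one response body as text using only the standard library."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def iter_pages(fetch: Callable[[str], str] = fetch_html,
               max_pages: int = 50) -> Iterator[str]:
    """Yield page fragments until the endpoint returns an empty body."""
    for page in range(1, max_pages + 1):
        fragment = fetch(build_page_url(page))
        if not fragment.strip():
            break  # assumed end-of-data signal; a real API may differ
        yield fragment
```

Each yielded fragment could then be passed to `BeautifulSoup(fragment, 'lxml')` in the same way as `res` in the question. The `max_pages` cap is only a safeguard against an endpoint that never returns an empty body.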