Python: scraping headlines from a news website with infinite loading


python selenium-webdriver web-scraping


I want to scrape the headlines from this website: https://www.marketwatch.com/latest-news?mod=top_nav

I need to load older news as well, so the blue "See More" button has to be clicked.

I wrote this code, but it does not work:

from bs4 import BeautifulSoup
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
u = 'https://www.marketwatch.com/latest-news?mod=top_nav' #US Business


driver = webdriver.Chrome(executable_path=r"C:/chromedriver.exe")
driver.maximize_window()
driver.get(u)
time.sleep(10)
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CLASS_NAME,'close-btn'))).click()
time.sleep(10)

driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
for i in range(3):
    # Wait for the "See More" button, scroll it into view, then click it
    element = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'component.component--module.more-headlines div.group.group--buttons.cover > a.btn.btn--secondary.js--more-headlines')))
    driver.execute_script("arguments[0].scrollIntoView();", element)
    element.click()
    time.sleep(5)
    driver.execute_script("arguments[0].scrollIntoView();", element)

    print(f'click {i} done')
soup = BeautifulSoup(driver.page_source, 'html.parser')

driver.quit()

It raises the following error:

raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:

Something like this would be more reliable:

for i in range(3):
  driver.execute_script('''
    document.querySelector('a.js--more-headlines').click()
  ''')
  time.sleep(1)

Note that when you click from JavaScript, you do not have to scroll the element into view first.
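
As a rough sketch of the same idea, the JavaScript click can be wrapped in a small helper that stops early once the button is no longer on the page. The helper name `click_more`, its parameters, and the `CLICK_JS` snippet are hypothetical names introduced here for illustration. The selector is the one from the answer above and may need adjusting if the site's markup changes.

```python
import time

# Selector taken from the answer above; adjust it if the page markup changes.
MORE_BUTTON = "a.js--more-headlines"

# JavaScript that clicks the button if it exists and reports whether it was found.
# Selenium runs this as a function body, so `arguments[0]` is the selector passed in.
CLICK_JS = """
const btn = document.querySelector(arguments[0]);
if (btn === null) { return false; }
btn.click();
return true;
"""


def click_more(driver, times=3, selector=MORE_BUTTON, pause=1.0):
    """Click the 'more' button up to `times` times; return the number of clicks made."""
    clicks = 0
    for _ in range(times):
        found = driver.execute_script(CLICK_JS, selector)
        if not found:
            # The button is gone, so there is nothing more to load.
            break
        clicks += 1
        # Give the page a moment to append the newly loaded headlines.
        time.sleep(pause)
    return clicks
```

With the `driver` from the question, this would be called as `click_more(driver, times=3)` before passing `driver.page_source` to BeautifulSoup. A fixed `pause` is the simplest option. Waiting explicitly for new headline elements would be more robust, but it depends on markup that is not shown here.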