Unable to access the remaining elements by XPath in a loop after accessing the first element (Webscraping Selenium Python)
I am trying to scrape data from the sciencedirect website. I am trying to automate the scraping process by creating a list of XPaths and looping over them to access the journal issues one after the other. When I run the loop, I am unable to access the rest of the elements after accessing the first one. This process worked for me on another website, but it does not work on this one.

I would also like to know whether there is a better way to access these elements than this process.
#Importing libraries
import requests
import os
import json
from selenium import webdriver
import pandas as pd
from bs4 import BeautifulSoup
import time
from time import sleep
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Initializing the Chrome webdriver
driver=webdriver.Chrome(executable_path=r"C:/selenium/chromedriver.exe")
#website to be accessed
driver.get("https://www.sciencedirect.com/journal/journal-of-corporate-finance/issues")
# Generating the list of XPaths to be accessed one after the other
issues = []
for i in range(0, 20):
    for j in range(1, 7):
        issues.append(f'//*[@id="0-accordion-panel-{i}"]/section/div[{j}]/a')
# Looping to access one issue after the other
for i in issues:
    try:
        hat = driver.find_element_by_xpath(i)
        hat.click()
        sleep(4)
        driver.back()
    except:
        print("no more issues", i)
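To scrape data from the sciencedirect website, you can follow these steps:
- First, open all the accordions.
- Then open each issue in an adjacent tab using Ctrl + click().
- Next, scrape the required content.
- Code block: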
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get('https://www.sciencedirect.com/journal/journal-of-corporate-finance/issues')

# Click every accordion header once it is visible
accordions = WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "li.accordion-panel.js-accordion-panel>button.accordion-panel-title>span")))
for accordion in accordions:
    ActionChains(driver).move_to_element(accordion).click(accordion).perform()

issues = WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.anchor.js-issue-item-link.text-m span.anchor-text")))
windows_before = driver.current_window_handle
for issue in issues:
    # Ctrl + click opens the issue in a new tab, so the list page is never reloaded
    ActionChains(driver).key_down(Keys.CONTROL).click(issue).key_up(Keys.CONTROL).perform()
    WebDriverWait(driver, 10).until(EC.number_of_windows_to_be(2))
    windows_after = driver.window_handles
    new_window = [x for x in windows_after if x != windows_before][0]
    # switch_to.window() replaces the deprecated switch_to_window()
    driver.switch_to.window(new_window)
    WebDriverWait(driver, 30).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "a#journal-title>span")))
    print(WebDriverWait(driver, 30).until(EC.visibility_of_element_located((By.XPATH, "//h2"))).get_attribute("innerHTML"))
    # Close the issue tab and return to the list page
    driver.close()
    driver.switch_to.window(windows_before)
driver.quit()
- Console output:
Institutions, Governance and Finance in a Globally Connected Environment
Volume 58
Corporate Governance in Multinational Enterprises
.
.
.
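As for whether there is another way to reach the issues: one possible alternative, shown here only as a rough sketch, is to read the `href` of every issue link first and then load each URL in the same tab, so that no element reference has to survive a page change. The sketch assumes the accordions have already been expanded as in the code block above and that the issue anchors still match the selector used there. The names `collect_issue_urls`, `scrape_issue_titles` and `ISSUE_LINK_SELECTOR` are hypothetical helpers introduced for illustration, not part of Selenium.

```python
from urllib.parse import urljoin

# Anchor selector taken from the code block above; it is an assumption about
# the page's current markup and may need adjusting.
ISSUE_LINK_SELECTOR = "a.anchor.js-issue-item-link.text-m"


def collect_issue_urls(anchors, base_url):
    """Return absolute issue URLs in page order, without duplicates.

    `anchors` is any iterable of objects exposing get_attribute("href"),
    such as the WebElements returned by driver.find_elements().
    """
    urls = []
    for anchor in anchors:
        href = anchor.get_attribute("href")
        if not href:
            continue  # skip anchors that carry no link target
        url = urljoin(base_url, href)
        if url not in urls:
            urls.append(url)
    return urls


def scrape_issue_titles(driver, timeout=30):
    """Load every issue URL in the current tab and return each page's first <h2>."""
    # Imported here so collect_issue_urls() stays usable without Selenium.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    anchors = driver.find_elements(By.CSS_SELECTOR, ISSUE_LINK_SELECTOR)
    # Read the hrefs before navigating: plain strings cannot go stale.
    urls = collect_issue_urls(anchors, driver.current_url)

    titles = []
    for url in urls:
        driver.get(url)
        heading = WebDriverWait(driver, timeout).until(
            EC.visibility_of_element_located((By.XPATH, "//h2"))
        )
        titles.append(heading.get_attribute("innerHTML"))
    return titles
```

With a `driver` that is already on the issues page and has the accordions open, `scrape_issue_titles(driver)` would replace the Ctrl + click loop. Whether this is preferable depends on the site: it avoids tab handling, but it relies on every issue link exposing a usable `href`.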