Python: how to open multiple hrefs within a webtable to scrape through Selenium


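I am trying to scrape this website using Python and Selenium. However, not all the information I need is on the main page. How can I click the links in the "Application number" column one by one, go to each page, scrape the information, and then return to the original page?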

I tried:

def getData():
  data = []
  select = Select(driver.find_elements_by_xpath('//*[@id="node-41"]/div/div/div/div/div/div[1]/table/tbody/tr/td/a/@href'))
  list_options = select.options
  for item in range(len(list_options)):
    item.click()
  driver.get(url)

You can do the following:

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

url = "url"  # replace with the address of the page that holds the table
browser = webdriver.Chrome()  # or whatever driver you use
browser.get(url)  # load the page before looking for elements
# "views-field views-field-title" is two classes, so a class-name lookup cannot match it; use a CSS selector.
# Change the selector (or use By.XPATH with an XPath) to click a different item in the table.
browser.find_element(By.CSS_SELECTOR, "td.views-field.views-field-title > a").click()
time.sleep(5)  # not the best way to wait, but it is simple; it just makes sure things load
# This is where you can scrape the new page. I will not post that part, as you can scrape what you want.
# When you are done scraping, you can return to the previous page with this:
browser.execute_script("window.history.go(-1)")

Hope this is what you were looking for.
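
The fixed `time.sleep(5)` above can be swapped for a wait that returns as soon as the page is ready. Selenium ships `WebDriverWait` for this, which the solution further down uses. The helper below is only a stand-alone sketch of the same polling idea; `wait_until` and its parameters are hypothetical names, not part of Selenium.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call `condition()` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %s seconds" % timeout)
        time.sleep(poll)  # pause briefly before checking again
```

With the `browser` object from the answer, `condition` could be a lambda that looks for an element expected on the detail page and returns the resulting list, which is empty (falsy) until the element exists.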

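When you navigate to a new page, the DOM is refreshed, so you cannot use the list approach here. Here is how I would do it (I do not write much Python, so the syntax and indexing may be off):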

rows = driver.find_elements_by_xpath("//table[@class='views-table cols-6']/tbody/tr")  # one row per link, used to count the links
count = len(rows)
j = 1
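
The idea in this answer, looking each link up again by its row number after every navigation, can be sketched as follows. This is a minimal sketch rather than the answerer's exact code: the function name `visit_each_row_link` and the `scrape` callback are hypothetical, the click XPath follows the one quoted in the comments, and the locator strategy is passed as the plain string "xpath" (the value of Selenium's `By.XPATH`) so the block needs no imports.

```python
XPATH = "xpath"  # same value as selenium.webdriver.common.by.By.XPATH
ROWS = "//table[@class='views-table cols-6']/tbody/tr"


def visit_each_row_link(driver, scrape):
    """Click the link in every table row in turn, scrape, then go back.

    `driver` is a Selenium WebDriver already showing the table page;
    `scrape` is a hypothetical callback that reads the detail page.
    """
    results = []
    count = len(driver.find_elements(XPATH, ROWS))
    for j in range(1, count + 1):  # XPath positions start at 1
        # Look the link up again on every pass: references found before a
        # navigation go stale once the page has been reloaded.
        links = driver.find_elements(XPATH, ROWS + "[" + str(j) + "]/td/a")
        if not links:
            continue  # this row has no link
        links[0].click()
        results.append(scrape(driver))
        driver.back()  # return to the table page
    return results
```

With a real browser this would be called as `visit_each_row_link(driver, lambda d: d.title)` after `driver.get(...)` has loaded the table page. A slow site may also need an explicit wait after `driver.back()`.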

To open multiple hrefs within a webtable to scrape through Selenium, you can use the following solution:

  • Code block:

      from selenium import webdriver
      from selenium.webdriver.chrome.options import Options
      from selenium.webdriver.support.ui import WebDriverWait
      from selenium.webdriver.common.by import By
      from selenium.webdriver.support import expected_conditions as EC
    
      hrefs = []
      options = Options()
      options.add_argument("start-maximized")
      options.add_argument("disable-infobars")
      options.add_argument("--disable-extensions")
      options.add_argument("--disable-gpu")
      options.add_argument("--no-sandbox")
      driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\ChromeDriver\chromedriver_win32\chromedriver.exe') # chrome_options= is the deprecated name of this keyword
      driver.get('http://www.scilly.gov.uk/planning-development/planning-applications')
      windows_before  = driver.current_window_handle # Store the parent_window_handle for future use
      elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "td.views-field.views-field-title>a"))) # Induce WebDriverWait for the visibility of the desired elements
      for element in elements:
          hrefs.append(element.get_attribute("href")) # Collect the required href attributes and store in a list
      for href in hrefs:
          driver.execute_script("window.open('" + href +"');") # Open the hrefs one by one through execute_script method in a new tab
          WebDriverWait(driver, 10).until(EC.number_of_windows_to_be(2)) # Induce  WebDriverWait for the number_of_windows_to_be 2
          windows_after = driver.window_handles
          new_window = [x for x in windows_after if x != windows_before][0] # Identify the newly opened window
          # deprecated: driver.switch_to_window(new_window)
          driver.switch_to.window(new_window) # switch to the new window
          # perform your web scraping here
          print(driver.title) # print the page title, or perform your web scraping instead
          driver.close() # close the new window
          # deprecated: driver.switch_to_window(windows_before)
          driver.switch_to.window(windows_before) # switch back to the parent window
      driver.quit() #Quit your program
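
Because the hrefs are stored as plain strings before any navigation happens, the same collect-then-visit pattern also works without opening new tabs. The sketch below is a hedged variant, not part of the original solution: `collect_hrefs`, `visit_in_same_tab` and the `scrape` callback are hypothetical names, and the locator strategy is passed as the plain string "css selector" (the value of Selenium's `By.CSS_SELECTOR`) so the block needs no imports.

```python
CSS = "css selector"  # same value as selenium.webdriver.common.by.By.CSS_SELECTOR
LINKS = "td.views-field.views-field-title>a"


def collect_hrefs(driver, selector=LINKS):
    """Return the href of every matching link as a plain string."""
    hrefs = []
    for element in driver.find_elements(CSS, selector):
        href = element.get_attribute("href")
        if href and href not in hrefs:  # skip empty and repeated links
            hrefs.append(href)
    return hrefs


def visit_in_same_tab(driver, hrefs, scrape):
    """Load each URL in the current tab and collect what `scrape` returns."""
    results = {}
    for href in hrefs:
        driver.get(href)  # strings stay valid, unlike element references
        results[href] = scrape(driver)
    return results
```

For example, `visit_in_same_tab(driver, collect_hrefs(driver), lambda d: d.title)` would map each application URL to its page title. The trade-off against the new-tab version is that the table page is no longer kept open in the parent window.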
    



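Do the links open in a new tab or in the same window? Also, show us what you have tried so far.
No, they do not open in a new tab; they open in the same window. I have edited the question to show what I tried.
Check the answer below.
And driver.find_element_by_xpath("table[@class='views-table cols-6']/tbody/tr["+j+"]/td/a").click() gives TypeError: can only concatenate str (not "int") to str
Try again now with the updated code. I have added str to j; note the change here: str(j)
It works, thanks, but could you add comments? That would help me understand the code better.
@LibanWest Updated the solution with the required comments for your reference.
Hello, if you are free, I have a question and could use your help.
Oh, sorry, I thought I had done that, but it looks like I had only accepted it :) Fixed it.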
  • Console output of the code block above:

  Planning application: P/18/064 | Council of the ISLES OF SCILLY
  Planning application: P/18/063 | Council of the ISLES OF SCILLY
  Planning application: P/18/062 | Council of the ISLES OF SCILLY
  Planning application: P/18/061 | Council of the ISLES OF SCILLY
  Planning application: p/18/059 | Council of the ISLES OF SCILLY
  Planning application: P/18/058 | Council of the ISLES OF SCILLY
  Planning application: P/18/057 | Council of the ISLES OF SCILLY
  Planning application: P/18/056 | Council of the ISLES OF SCILLY
  Planning application: P/18/055 | Council of the ISLES OF SCILLY
  Planning application: P/18/054 | Council of the ISLES OF SCILLY