Python: How do I select a tab and scrape results from all pages with Selenium?
I have created the following function to scrape results from a website, and I would like to know how to:
1. First click the "Tables (8899)" tab and scrape results only from there.
2. Scrape all pages and append them to a single DataFrame without having to specify the number of pages. At the moment the function only scrapes the first page.
Function:
import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from selenium import webdriver
from functools import reduce

def stats_canada():
    driver = webdriver.Chrome('/Users/wwds/Desktop/chromedriver')
    driver.get('https://www150.statcan.gc.ca/n1/en/type/data?count=100&p=-All%2C5-data/tables#all')
    # Result titles and links
    elements = WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "#all a[target='_self']")))
    linkTitles = pd.DataFrame([title.text for title in elements]).rename(columns = {0 : 'Name'})
    links = pd.DataFrame([link.get_attribute("href") for link in elements]).rename(columns = {0 : 'Link'})
    # Release dates
    elements = WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "#all span[class='ndm-result-date']")))
    release_date = pd.DataFrame([date.text for date in elements]).rename(columns = {0 : 'Release Date'})
    # Table IDs, with the "Table: " prefix stripped
    elements = WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "#all div[class='ndm-result-productid']")))
    table_id = pd.DataFrame([table.text for table in elements]).rename(columns = {0 : 'Table ID'})
    table_id['Table ID'] = table_id['Table ID'].str.replace("Table: ", "")
    # Join the four single-column frames on their row index
    data = reduce(lambda x,y: pd.merge(x, y, left_index = True, right_index = True), [linkTitles, links, release_date, table_id])
    return data

stats_canada()
Thanks in advance.

First, the "Tables (8899)" tab has an id, and you need to click it. You can do that with the following snippet:
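import time

# find_element_by_id was removed in Selenium 4; find_element(By.ID, ...) is the current form
elem = driver.find_element(By.ID, 'tables-lnk')
elem.click()
time.sleep(10)  # crude fixed delay to let the tab content load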
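Now scrape all the entries from that page with Selenium or Beautiful Soup, whichever you are more familiar with, and add them to the data frame.
Then click the "Next" button at the bottom of the page. You can find the button's id and click it the same way as above. Repeat these two steps until there is no "Next" button left.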
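The scrape-then-click-next steps above can be sketched as a loop that stops by itself, so the number of pages never has to be specified. This is only a rough sketch, not a tested solution for this site: `collect_all_pages` and `make_selenium_callbacks` are hypothetical helper names, and the two CSS selectors passed in are assumptions that have to be read off the real page.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[str]]


def collect_all_pages(scrape_page: Callable[[], List[Row]],
                      go_to_next_page: Callable[[], bool],
                      max_pages: int = 10_000) -> List[Row]:
    """Scrape the current page, advance, and repeat until no next page exists."""
    rows: List[Row] = []
    for _ in range(max_pages):  # safety cap so a broken "next" check cannot loop forever
        rows.extend(scrape_page())
        if not go_to_next_page():
            break
    return rows


def make_selenium_callbacks(driver, results_css: str, next_css: str, timeout: int = 30):
    """Build the two callbacks for a live driver. Both selectors are assumptions."""
    # Imported here so collect_all_pages stays usable without Selenium installed.
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    def scrape_page() -> List[Row]:
        # Wait for the result links of the current page, then read them.
        links = WebDriverWait(driver, timeout).until(
            EC.visibility_of_all_elements_located((By.CSS_SELECTOR, results_css)))
        return [{"Name": a.text, "Link": a.get_attribute("href")} for a in links]

    def go_to_next_page() -> bool:
        try:
            button = driver.find_element(By.CSS_SELECTOR, next_css)
        except NoSuchElementException:
            return False  # no "Next" button found: treat this as the last page
        first_result = driver.find_element(By.CSS_SELECTOR, results_css)
        button.click()
        # Assumes the result list is re-rendered when the page changes.
        WebDriverWait(driver, timeout).until(EC.staleness_of(first_result))
        return True

    return scrape_page, go_to_next_page
```

After clicking the tab, something like `scrape, go_next = make_selenium_callbacks(driver, "#tables a[target='_self']", "a[rel='next']")` followed by `pd.DataFrame(collect_all_pages(scrape, go_next))` would give one DataFrame for all pages. Both selectors in that call are made-up placeholders. If the site disables or hides the "Next" button on the last page instead of removing it, `go_to_next_page` needs a matching check.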