Putting a web-scraped table into Excel (Selenium, Python)
I want to put the table I scraped, together with its headers, into Excel. I've tried several approaches, but I can't figure out how to get it to display correctly in Excel. Below is also an image showing how I'd ideally like it to look. Thanks in advance.
import csv

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select, WebDriverWait

# Selenium 4: pass the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service("drivers/chromedriver"))
driver.get("https://web3.ncaa.org/hsportal/exec/hsAction")
Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.ID, "state")))).select_by_visible_text("New Hampshire")
driver.find_element(By.XPATH, "//input[@id='city']").send_keys("Moultonborough")
driver.find_element(By.XPATH, "//input[@id='name']").send_keys("Moultonborough Academy")
driver.find_element(By.XPATH, "//input[@value='Search']").click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@name='hsCode']"))).click()
print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@id='approvedCourseTable_1']//th[@class='header']")))])
# Matches the <table> element itself, so this is a one-item list holding the whole table's text
table = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "table#approvedCourseTable_1.tablesorter")))]
with open('out.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(table)
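For some reason, when I scrape the table into Excel using table#approvedCourseTable_1.tablesorter, it only shows "Course" and nothing else. When I split the headers and the table contents, I can get each of them into Excel separately, but not together. Also, when I did manage to get the table contents into Excel, they weren't lined up correctly: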
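The single-cell symptom is consistent with how the csv module treats a one-item list: the locator table#approvedCourseTable_1.tablesorter matches the table element itself, so table holds one string with the whole table's text, line breaks included. A minimal stdlib-only sketch, using a made-up string as a stand-in for that text (the exact text Selenium returns may differ), shows that such a list becomes one quoted CSV field. That would likely explain why Excel seems to show only the first line, "Course":

```python
import csv
import io


def to_csv(rows):
    """Serialize a list of rows with csv.writer and return the resulting text."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


# Hypothetical stand-in for the .text of the whole <table> element:
# a single string with line breaks between header pieces and rows.
table = ["Course\nWeight Title Notes\n AFRICAN LITERATURE"]

# One row containing one field; the writer quotes it because it contains "\n".
print(repr(to_csv([table])))

# One list per table row keeps every value in its own column.
rows = [
    ["Course Weight", "Title", "Notes"],
    ["", "AFRICAN LITERATURE", ""],
]
print(repr(to_csv(rows)))
```

In other words, the writer needs one list per table row, with one string per cell, rather than one string for the entire table.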
x = ([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "table#approvedCourseTable_1 th.header")))])
y = ([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "table#approvedCourseTable_1 td")))])
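The misalignment with x and y is expected: "table#approvedCourseTable_1 td" returns every cell of the table as one flat list, so nothing marks where one row ends and the next begins. A sketch that regroups the flat list into rows of len(x) cells, assuming every table row has exactly one td per header and no colspan/rowspan; rows_from_cells and write_table are hypothetical helper names, and the sample values are made up in the shape of x and y:

```python
import csv
import io


def rows_from_cells(headers, cells):
    """Split a flat list of cell texts into rows of len(headers) cells.

    Assumes each table row has exactly one <td> per header column
    (no colspan/rowspan); raises ValueError otherwise.
    """
    width = len(headers)
    if width == 0 or len(cells) % width != 0:
        raise ValueError(f"{len(cells)} cells do not divide into rows of {width}")
    return [cells[i:i + width] for i in range(0, len(cells), width)]


def write_table(fileobj, headers, cells):
    """Write the header row followed by the regrouped data rows."""
    writer = csv.writer(fileobj)
    # Header texts such as "Course\nWeight" contain line breaks; flatten them.
    writer.writerow([h.replace("\n", " ") for h in headers])
    writer.writerows(rows_from_cells(headers, cells))


# Made-up values shaped like x (headers) and y (flat list of <td> texts).
x = ["Course\nWeight", "Title", "Notes"]
y = ["", "AFRICAN LITERATURE", "", "", "AMERICAN LITERATURE", ""]
buf = io.StringIO()
write_table(buf, x, y)
print(buf.getvalue())
```

For a real file, open it with open("out.csv", "w", newline="") and pass that file object to write_table instead of the StringIO buffer.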
If possible, I'd like it to be displayed like this: [image: desired Excel layout]

I did this with Selenium/Python. Try the code sample below:
import csv

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select, WebDriverWait

driver = webdriver.Chrome(service=Service("drivers/chromedriver"))
driver.get("https://web3.ncaa.org/hsportal/exec/hsAction")
Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.ID, "state")))).select_by_visible_text("New Hampshire")
driver.find_element(By.XPATH, "//input[@id='city']").send_keys("Moultonborough")
driver.find_element(By.XPATH, "//input[@id='name']").send_keys("Moultonborough Academy")
driver.find_element(By.XPATH, "//input[@value='Search']").click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@name='hsCode']"))).click()
print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@id='approvedCourseTable_1']//th[@class='header']")))])

# newline='' stops the csv module from producing blank rows between records on Windows
with open('out.csv', 'w', newline='') as csvFile:
    writer = csv.writer(csvFile)

    # Overall heading ("Approved Courses"); wrap the string in a list so it lands
    # in one cell instead of one character per cell
    table_header = driver.find_element(By.XPATH, "(//table[@id='NcaaCrs_ApprovedCategory_All']//td[@class='hs_tableHeader'])[1]")
    print(table_header.text)
    writer.writerow([table_header.text])

    # Find all approved categories (English, Social Science, ...)
    approved_categories = driver.find_elements(By.XPATH, "//div[contains(@id,'NcaaCrs_ApprovedCategory_')]")
    for i in range(1, len(approved_categories) + 1):
        category_header = driver.find_element(By.XPATH, "//div[contains(@id,'NcaaCrs_ApprovedCategory_" + str(i) + "')]//td[@class='hs_tableHeader']")
        print(category_header.text)
        writer.writerow([category_header.text])

        # Course table header cells for this category (exact id, so _1 does not also match _10)
        course_headers = driver.find_elements(By.XPATH, "//table[@id='approvedCourseTable_" + str(i) + "']/thead//th")
        header_val = [header.text for header in course_headers]
        print(header_val)
        writer.writerow(header_val)

        # One CSV row per <tr>, one cell per direct <td> child, so columns stay aligned
        course_rows = driver.find_elements(By.XPATH, "//table[@id='approvedCourseTable_" + str(i) + "']//tbody/tr")
        for row in course_rows:
            row_val = [cell.text for cell in row.find_elements(By.XPATH, "./td")]
            print(row_val)
            writer.writerow(row_val)

driver.quit()
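Two details of the loop above matter for how the file looks in Excel: csv.writer.writerow iterates over whatever it receives, so a bare string would be spread one character per cell, and header texts such as 'Course\nWeight' keep their line break inside the cell. A stdlib-only sketch of the same title/category/header/rows layout, decoupled from Selenium so it can be checked on its own; write_sections and the (category, headers, rows) tuple shape are hypothetical, while the sample values come from the output below:

```python
import csv
import io


def write_sections(fileobj, title, sections):
    """Write a title row, then for each section a category row, a header row,
    the data rows, and a blank separator row.

    sections: iterable of (category, headers, rows) tuples (hypothetical shape).
    """
    writer = csv.writer(fileobj)
    writer.writerow([title])  # a list, so the title stays in a single cell
    for category, headers, rows in sections:
        writer.writerow([category])
        writer.writerow([h.replace("\n", " ") for h in headers])
        writer.writerows(rows)
        writer.writerow([])  # empty row between categories


headers = ["Course\nWeight", "Title", "Notes", "Max\nCredits", "OK\nThrough", "Disability\nCourse"]
sections = [
    ("English", headers, [["", "AFRICAN LITERATURE", "", "", "", "No"]]),
    ("Social Science", headers, [["", "ECONOMICS", "", "", "", "No"]]),
]

buf = io.StringIO()
write_sections(buf, "Approved Courses", sections)
print(buf.getvalue())
```

To write to disk, pass a file opened with open("out.csv", "w", newline=""); the csv module documentation asks for newline="" so that no extra blank lines appear on platforms that use \r\n line endings.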
The printed output will look like this:
['Course\nWeight', 'Title', 'Notes', 'Max\nCredits', 'OK\nThrough', 'Disability\nCourse']
Approved Courses
English
['Course\nWeight', 'Title', 'Notes', 'Max\nCredits', 'OK\nThrough', 'Disability\nCourse']
['', 'AFRICAN LITERATURE', '', '', '', 'No']
['', 'AMERICAN LITERATURE', '', '', '', 'No']
['', 'AP ENGLISH LANGUAGE & COMPOSITION', '', '', '', 'No']
['', 'AP ENGLISH LITERATURE & COMPOSITION', '', '', '', 'No']
['', 'COLLEGE COMPOSITION', '', '', '', 'No']
['', 'ENGLISH 9 (ENG 091/092/093)', '', '', '', 'No']
['', 'ENGLISH 9/H', '', '', '', 'No']
['', 'PUBLIC SPEAKING', '', '', '', 'No']
['', 'WORLD STUDIES', '', '', '', 'No']
['', 'WORLD STUDIES HBC', '', '', '', 'No']
Social Science
['Course\nWeight', 'Title', 'Notes', 'Max\nCredits', 'OK\nThrough', 'Disability\nCourse']
['', 'AP WORLD HISTORY', '', '', '', 'No']
['', 'ECONOMICS', '', '', '', 'No']
['', 'GOVERNMENT', '', '', '', 'No']
['', 'PSYCHOLOGY', '', '', '', 'No']
['', 'US HISTORY', '', '', '', 'No']
['', 'US HISTORY/AP', '', '', '', 'No']
['', 'WORLD STUDIES', '', '', '', 'No']
['', 'WORLD STUDIES HBC', '', '', '', 'No']
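I think an easier approach is to create a template in Excel containing rows 1 to 4 as needed, and pass all the data into a pandas DataFrame; that way you can easily get what you want.

So do I have to scrape everything from the web again using pandas and a DataFrame?

No, you just need to parse the table variable into a pandas DataFrame. You can find some information here:

I'm looking at pandas, but I did manage to get it into Excel. However, it isn't aligned correctly. Maybe you know why?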
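Following the DataFrame suggestion in the comments, a sketch assuming pandas is installed (plus openpyxl for .xlsx output). It builds one DataFrame per category from the header and row lists that the answer's loop already collects, then stacks the tables on one sheet starting at startrow, leaving the rows above free for a hand-made header. build_frames, save_stacked and the sheet name "Courses" are hypothetical:

```python
import pandas as pd


def build_frames(sections):
    """Turn (category, headers, rows) tuples into (category, DataFrame) pairs."""
    frames = []
    for category, headers, rows in sections:
        columns = [h.replace("\n", " ") for h in headers]
        frames.append((category, pd.DataFrame(rows, columns=columns)))
    return frames


def save_stacked(frames, path, sheet_name="Courses", startrow=4):
    """Write each category title and its table one below the other on one sheet.

    Requires openpyxl for .xlsx files. startrow is 0-based, so startrow=4
    leaves Excel rows 1 to 4 empty.
    """
    row = startrow
    with pd.ExcelWriter(path) as xw:
        for category, df in frames:
            # Category title as a single cell
            pd.DataFrame([[category]]).to_excel(
                xw, sheet_name=sheet_name, startrow=row, index=False, header=False
            )
            # Header row and data rows directly below the title
            df.to_excel(xw, sheet_name=sheet_name, startrow=row + 1, index=False)
            row += len(df) + 3  # title + header + data rows + one blank row


headers = ["Course\nWeight", "Title", "Notes", "Max\nCredits", "OK\nThrough", "Disability\nCourse"]
sections = [
    ("English", headers, [["", "AFRICAN LITERATURE", "", "", "", "No"],
                          ["", "AMERICAN LITERATURE", "", "", "", "No"]]),
]
frames = build_frames(sections)
print(frames[0][1])
```

Calling save_stacked(frames, "out.xlsx") creates a new workbook, so it only leaves the top rows empty; writing into an existing template file would instead need pd.ExcelWriter(path, mode="a", if_sheet_exists="overlay"), which requires pandas 1.4 or later.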