Checking exam date/time availability on a website with Python mechanize
I am trying to use Python mechanize to check the date/time availability of an exam, and to email someone (with a screenshot of the results page) if a particular date/time shows up in the results. I am able to get a response, but for some reason the data in my table is missing. Here are the buttons from the initial HTML page:
<input type="submit" value="Get Exam List" name="B1">
<input type="button" value="Clear" name="B2" onclick="clear_entries()">
<input type="hidden" name="action" value="GO">
Update: I rewrote it to use Selenium; I will try to improve it further soon.
from selenium import webdriver  # needed for webdriver.Firefox() below
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import urllib3

myURL = "http://secure.dre.ca.gov/PublicASP/CurrentExams.asp"

browser = webdriver.Firefox()  # Get local session of Firefox
browser.get(myURL)  # Load page

# Tick the exam-site and exam-type checkboxes, then submit the form
element = browser.find_element_by_id("Checkbox5")
element.click()
element = browser.find_element_by_id("Checkbox13")
element.click()
element = browser.find_element_by_name("B1")
element.click()
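The end goal in the question was to email someone a screenshot of the results page, which neither version above handles. Composing that message can be done with the standard library alone; the sketch below is a hypothetical helper (`build_alert_email`, the addresses, and the subject line are all made up), and in Selenium the real PNG bytes would come from `browser.get_screenshot_as_png()` rather than the placeholder used here:

```python
from email.message import EmailMessage

def build_alert_email(sender, recipient, png_bytes):
    """Compose the alert email with the results-page screenshot attached.
    Actually sending it (e.g. with smtplib.SMTP) is left out here."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Exam slot available"
    msg.set_content("A matching exam date/time was found; screenshot attached.")
    msg.add_attachment(png_bytes, maintype="image", subtype="png",
                       filename="results.png")
    return msg

# In Selenium this would be: png = browser.get_screenshot_as_png()
msg = build_alert_email("me@example.com", "you@example.com", b"\x89PNG fake")
print(msg["Subject"])  # Exam slot available
```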
Five years later, maybe this can help someone. I took your question as a training exercise and solved it with the Requests package (on Python 3.9). The code below is in two parts:
- the request that retrieves the data injected into the table after the POST request
- parsing the response (many approaches are possible; it can get a bit tedious)
"d" contains the data you need. I have not dealt with sending the email.

Comment: The site may be rendering the page with JavaScript, which BeautifulSoup does not know how to run; you would need something like Selenium to load the page in an actual browser.
Comment: Thx Max, I will give it a try!
soup.findAll('table')[0].findAll('tr')
## the request part
import requests as rq
from bs4 import BeautifulSoup as bs

url = "https://secure.dre.ca.gov/PublicASP/CurrentExams.asp"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0"}
params = {
    "cb_examSites": [
        "'Fresno'",
        "'Los+Angeles'",
        "'SF/Oakland'",
        "'Sacramento'",
        "'San+Diego'"
    ],
    "cb_examTypes": [
        "'Broker'",
        "'Salesperson'"
    ],
    "B1": "Get+Exam+List",
    "action": "GO"
}
s = rq.Session()
r = s.get(url, headers=headers)  # initial GET to pick up the session cookie
s.headers.update({"Cookie": "%s=%s" % (r.cookies.keys()[0], r.cookies.values()[0])})
r2 = s.post(url=url, data=params)
soup = bs(r2.content, "lxml")  # contains the data you want
table = soup.find_all("table", class_="General_list")[0]
titles = [el.text for el in table.find_all("strong")]
def beetweenBr(soupx):
final_str = []
for br in soupx.findAll('br'):
next_s = br.nextSibling
if not (next_s and isinstance(next_s,NavigableString)):
continue
next2_s = next_s.nextSibling
if next2_s and isinstance(next2_s,Tag) and next2_s.name == 'br':
text = str(next_s).strip()
if text:
final_str.append(next_s.strip())
return "\n".join(final_str)
d = {}
trs = table.find_all("tr")
for tr in trs:
tr_text = tr.text
if tr_text in titles:
curr_title = tr_text
splitx = curr_title.split(" - ")
area, job = splitx[0].split(" ")[0], splitx[1].split(" ")[0]
if not job in d.keys():
d[job] = {}
if not area in d[job].keys():
d[job][area] = []
continue
if (not tr_text in titles) & (tr_text != "DateBegin TimeLocationScheduledCapacity"):
tds = tr.find_all("td")
sub = []
for itd, td in enumerate(tds):
if itd == 2:
sub.append(beetweenBr(td))
else :
sub.append(td.text)
d[job][area].append(sub)
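With "d" built as above ({job: {area: [rows]}}, where each row's first cell holds the exam date), the original goal of watching for a specific date/time reduces to a small lookup. `find_slots` and the sample row below are hypothetical illustrations of that shape, not part of the answer's code:

```python
def find_slots(d, target_date):
    """Return (job, area, row) tuples whose first column contains target_date.
    Assumes rows are shaped like those appended by the parsing loop above."""
    hits = []
    for job, areas in d.items():
        for area, rows in areas.items():
            for row in rows:
                if row and target_date in row[0]:
                    hits.append((job, area, row))
    return hits

# Sample data shaped like the dict produced by the parsing loop above.
sample = {"Broker": {"Fresno": [["6/15/2021 8:30 AM", "Fresno Site", "12", "30"]]}}
print(find_slots(sample, "6/15/2021"))
# [('Broker', 'Fresno', ['6/15/2021 8:30 AM', 'Fresno Site', '12', '30'])]
```

A non-empty result is the trigger point for sending the notification email.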