Scraping a table from an ASP.NET webpage with JavaScript buttons using R
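I have asked a number of related questions at length, all to no avail. I need to fetch a table of price information from an ASP.NET webpage for a date and time that I specify. I am most comfortable with R and would like to use it. My basic obstacles are that the URL is static and does not reflect the search parameters, and that I do not know how to submit an HTML form that involves JavaScript on an ASP.NET site.

I looked at the source of the URL above and found that an iframe contains a "source data" link pointing to another page. I tried to perform a POST request in R based on this StackOverflow thread.

Moreover, I can only get the HTML of the page returned by the server, which does not contain all of the results. In fact, there are JavaScript arrow buttons at the bottom of the page that let me page through all of the results.

On the webpage itself, to view the results after choosing from the drop-down menus, I have to click the "View" button. Is there a way to replicate this in R, sending my '03' parameter to the server as a query so that the new HTML is returned to the page?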
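If I could get that working, I could also write something to "push" the paging arrows.

You can do this with Selenium. Disclaimer: I am the author of the RSelenium package. Basic vignettes describing its operation are available.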
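For posterity, I also want to show the code I am using to click through the pages of results (there is no "show all" option). I have RSelenium click through the pages until there is no longer a "forward" arrow to click. On each page, it scrapes the HTML table into a list: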
# Assumes an open remDr session that is already showing the first page of results
library(XML)  # provides htmlParse() and readHTMLTable()

# Get the first page of results
tableElem <- remDr$findElement(using = "id", "dgLIP")
tmp <- readHTMLTable(htmlParse(tableElem$getElementAttribute("outerHTML")[[1]]))
hourlyData <- list()
# Save the first table without the last row (row 27), which holds the pager arrows, not data
hourlyData[[1]] <- tmp[[1]][-27, ]

# href of the 'greater than' arrow that leads to the next page
nextHref <- "javascript:__doPostBack('dgLIP$_ctl29$_ctl1','')"

acc <- 2
repeat {
  # Collect the href attribute of every link on the current page
  webElems <- remDr$findElements("css selector", "[href]")
  clickers <- unlist(lapply(webElems, function(x) x$getElementAttribute("href")))
  # Stop once the forward arrow is no longer offered (last page reached)
  if (!(nextHref %in% clickers)) break
  pager <- webElems[[which(clickers == nextHref)[1]]]
  pager$clickElement()
  Sys.sleep(3)  # give the postback time to load the next page before reading it
  tableElem <- remDr$findElement(using = "id", "dgLIP")
  tmp <- readHTMLTable(htmlParse(tableElem$getElementAttribute("outerHTML")[[1]]))
  hourlyData[[acc]] <- tmp[[1]]
  acc <- acc + 1
}
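Once the loop has finished, the per-page tables still sit in a list. A minimal sketch of stacking them into one data frame, using base R only; the small hourlyData list below is a hypothetical stand-in (with a subset of columns) for what the loop collects, not real scraper output:

```r
# Hypothetical stand-in for the hourlyData list built by the paging loop
hourlyData <- list(
  data.frame(V1 = c("201401132252", "201401132252"),
             V3 = c("AECI", "AMRN"),
             V4 = c("19.14", "18.87"),
             stringsAsFactors = FALSE),
  data.frame(V1 = "201401132252", V3 = "BLKW", V4 = "20.28",
             stringsAsFactors = FALSE)
)

# Tag each page's rows with the page they came from, then stack the pages
pages <- lapply(seq_along(hourlyData), function(i) {
  cbind(page = i, hourlyData[[i]], stringsAsFactors = FALSE)
})
allPages <- do.call(rbind, pages)
rownames(allPages) <- NULL
print(allPages)
```

Keeping the page number makes it easier to spot a page that was read before the postback finished, since its rows would repeat those of the previous page.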
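I hope others will give you more reason for optimism, but my advice is not to do this in R. Use Python with the Selenium driver; it will be easier even if you do not already know Python. I say this as someone who loves R and tries to use it for everything, but in this case I do not think it is the right tool for the job.

Thanks, Ista. I had never heard of Selenium before getting into this little pickle. Do you think using the Python driver has advantages over the R package that jdharrison suggests below?

OK, I'm interested! I will give it a try on Monday.

I am stuck at 'remDr$open()', getting the error 'Error in function (type, msg, asError = TRUE) : couldn't connect to host'. I installed the package in R using devtools, downloading it from GitHub.

@sclarky You need to run the Selenium server. See the RSelenium basics vignette.

Sorry @jdharrison, this is a foreign language to me. I ran the checkForServer and startServer functions, and when I navigate to the bin folder in the RSelenium package, the Selenium jar file is there. But I still get the error.

It could be a Java problem. Do you have Java or Java + JDK installed on that system? You can check as follows: 1. Go to the command prompt. 2. Type 'java -version'.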
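The same Java check can also be run from inside R, which may be handy if you are already in an R session. This is only a sketch using base R; it checks whether a java executable is on the PATH, which is an assumption about how the server gets launched on your machine:

```r
# Look for a java executable on the PATH; Sys.which() returns "" when none is found
javaPath <- unname(Sys.which("java"))
javaAvailable <- nzchar(javaPath)

if (javaAvailable) {
  message("Java found at: ", javaPath)
  # Same check as typing 'java -version' at the command prompt
  system2("java", "-version")
} else {
  message("No java executable on the PATH; the Selenium server jar needs Java to run.")
}
```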
<td style="height: 42px; width: 77px;">
<span id="lblLIPHour">Hour</span><br><select name="ddlHour" id="ddlHour"><option value="1">01</option>
<option value="2">02</option>
<option selected value="3">03</option>
<option value="4">04</option>
<option value="5">05</option>
<option value="6">06</option>
<option value="7">07</option>
<option value="8">08</option>
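Note that in the markup above the label "03" corresponds to value="3", so a selector has to use the unpadded value. A small sketch of building that selector from an hour; hourSelector is a hypothetical helper name, not part of RSelenium:

```r
# Hypothetical helper: build the CSS selector for one hour option of #ddlHour
hourSelector <- function(hour) {
  hour <- as.integer(hour)  # "03" and 3 both become 3, matching the option's value attribute
  stopifnot(!is.na(hour))
  sprintf('#ddlHour [value="%d"]', hour)
}

hourSelector("03")
hourSelector(5)
```

The resulting string can be passed to remDr$findElement(using = "css selector", ...) before calling clickElement() on the returned element.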
> readHTMLTable(tableData[[1]])
  Publish Date   Price Date        PNode Price Parent PNode Settlement Location
1 201402281552 201402281600         AECI 23.45         AECI                AECI
2 201402281552 201402281600         AMRN 23.45         AMRN                AMRN
3 201402281552 201402281600         BLKW 23.45         BLKW                BLKW
4 201402281552 201402281600         CLEC 23.45         CLEC                CLEC
5 201402281552 201402281600 CSWS_AECC_LA 23.45 CSWS_AECC_LA           AECC_CSWS
library(RSelenium)
library(XML)  # provides htmlParse() and readHTMLTable()
# RSelenium::startServer() # if needed
remDr <- remoteDriver()
remDr$open()
remDr$setImplicitWaitTimeout(3000)
remDr$navigate("http://www.spp.org/LIP.asp")
remDr$switchToFrame("content_frame") # the form is inside this iframe
dateElem <- remDr$findElement(using = "id", "txtLIPDate") # select the date
dateRequired <- "01/14/2014"
dateElem$clearElement()
dateElem$sendKeysToElement(list(dateRequired, key = "enter")) # send a date to app
hourElem <- remDr$findElement(using = "css selector", '#ddlHour [value="5"]') # select the 5th hour
hourElem$clickElement() # select this hour
buttonElem <- remDr$findElement(using = "id", "cmdView")
buttonElem$clickElement() # click the view button
# Sys.sleep(5) # uncomment if the table is read before the results have loaded
tableElem <- remDr$findElement(using = "id", "dgLIP")
readHTMLTable(htmlParse(tableElem$getElementAttribute("outerHTML")[[1]]))
[1] "tableElem$getElementAttribute(\"outerHTML\")"
$dgLIP
             V1           V2                   V3    V4                  V5                  V6
 1 Publish Date   Price Date                PNode Price        Parent PNode Settlement Location
 2 201401132252 201401132300                 AECI 19.14                AECI                AECI
 3 201401132252 201401132300                 AMRN 18.87                AMRN                AMRN
 4 201401132252 201401132300                 BLKW 20.28                BLKW                BLKW
 5 201401132252 201401132300                 CLEC 18.99                CLEC                CLEC
 6 201401132252 201401132300         CSWS_AECC_LA 19.77        CSWS_AECC_LA           AECC_CSWS
 7 201401132252 201401132300  CSWS_GREEN_LIGHT_LA  18.5 CSWS_GREEN_LIGHT_LA        GSEC_GL_CSWS
 8 201401132252 201401132300              CSWS_LA 19.01             CSWS_LA           AEPM_CSWS
 9 201401132252 201401132300              CSWS_LA 19.01             CSWS_LA            AEP_LOSS
10 201401132252 201401132300         CSWS_OMPA_LA 18.66        CSWS_OMPA_LA           OMPA_CSWS
11 201401132252 201401132300      CSWS_TENASKA_LA 18.95     CSWS_TENASKA_LA        GATEWAY_LOAD
12 201401132252 201401132300      CSWS112_WGORLD1  18.7             CSWS_LA           AEPM_CSWS
13 201401132252 201401132300      CSWS112_WGORLD1  18.7             CSWS_LA            AEP_LOSS
14 201401132252 201401132300      CSWS116PEORILD1  18.9             CSWS_LA           AEPM_CSWS
15 201401132252 201401132300      CSWS116PEORILD1  18.9             CSWS_LA            AEP_LOSS
16 201401132252 201401132300    CSWS121EASTLDXFL1 18.92             CSWS_LA           AEPM_CSWS
17 201401132252 201401132300    CSWS121EASTLDXFL1 18.92             CSWS_LA            AEP_LOSS
18 201401132252 201401132300      CSWS121LYNN4LD1 18.91             CSWS_LA           AEPM_CSWS
19 201401132252 201401132300      CSWS121LYNN4LD1 18.91             CSWS_LA            AEP_LOSS
20 201401132252 201401132300   CSWS12TH_STLD69_12 18.92             CSWS_LA           AEPM_CSWS
21 201401132252 201401132300   CSWS12TH_STLD69_12 18.92             CSWS_LA            AEP_LOSS
22 201401132252 201401132300 CSWS12TH_STLD69_12_2 18.92             CSWS_LA           AEPM_CSWS
23 201401132252 201401132300 CSWS12TH_STLD69_12_2 18.92             CSWS_LA            AEP_LOSS
24 201401132252 201401132300      CSWS136_YALELD1  18.9             CSWS_LA           AEPM_CSWS
25 201401132252 201401132300      CSWS136_YALELD1  18.9             CSWS_LA            AEP_LOSS
26 201401132252 201401132300  CSWS141_PINELDXFMR1 19.09             CSWS_LA           AEPM_CSWS
27          < >         <NA>                 <NA>  <NA>                <NA>                <NA>
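In the output above, the real column names arrive as row 1 and the last row holds only the pager arrows. A possible cleanup step in base R is sketched below; tidyLipTable is a hypothetical helper, and the raw data frame is a cut-down sample shaped like the printed output rather than actual scraper output:

```r
# Hypothetical sample shaped like the printed output: header in row 1, pager row last
raw <- data.frame(
  V1 = c("Publish Date", "201401132252", "201401132252", "< >"),
  V2 = c("Price Date", "201401132300", "201401132300", NA),
  V3 = c("PNode", "AECI", "AMRN", NA),
  V4 = c("Price", "19.14", "18.87", NA),
  stringsAsFactors = FALSE
)

tidyLipTable <- function(df) {
  df[] <- lapply(df, as.character)                 # readHTMLTable may return factors
  names(df) <- unlist(df[1, ], use.names = FALSE)  # promote the first row to column names
  df <- df[-1, , drop = FALSE]                     # drop the header row itself
  df <- df[!is.na(df[["Price"]]), , drop = FALSE]  # drop the pager row, which has no price
  df[["Price"]] <- as.numeric(df[["Price"]])       # prices arrive as text
  rownames(df) <- NULL
  df
}

clean <- tidyLipTable(raw)
str(clean)
```

Filtering on a missing Price avoids hard-coding the row number of the pager row, which may differ on the last page of results.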