Scraping a table from an ASP.NET web page with JavaScript buttons using R

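I have pored over many related questions, to no avail. I need to scrape a table of price information from an ASP.NET web page () according to a date and hour that I specify. I am very comfortable with R and would prefer to use it. My basic hurdle is that the URL does not reflect the search parameters (it is static), and I do not know how to submit HTML forms that incorporate JavaScript on ASP.NET websites.

I looked at the source code of the URL above. I found a link to another page of "source data" inside an iframe: . I tried to perform a POST request in R based on this StackOverflow thread:

Furthermore, I can only get the HTML of the page returned from the server, which does not contain all of the results. In fact, there are JavaScript arrow buttons at the bottom of the page that let me page through all of the results.

In the web page itself, to view results after selecting from the drop-down menu, I have to click the "View" button. Is there a way to replicate this in R, sending my '03' parameter as a query to the server so that the new HTML is returned?

If I can do that, I could also write something to "push" the page arrows.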

You can use Selenium to do this. See . Disclaimer: I am the author of the RSelenium package. Basic vignettes on its operation can be viewed at and .

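For posterity, I also want to show the code I am using to click through the pages of results (there is no "show all" option). I have RSelenium click through every page until the "click forward" option is no longer present. On each page, it scrapes the HTML table into a list: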

library(XML) # provides htmlParse() and readHTMLTable()

# Get the first page of results
tableElem <- remDr$findElement(using = "id", "dgLIP")
tmp <- readHTMLTable(htmlParse(tableElem$getElementAttribute("outerHTML")[[1]]))
hourlyData <- list()
# Save the first table without the last row, which is gibberish (the "< >" pager row)
hourlyData[[1]] <- tmp[[1]][-27,]

# href of the 'greater than' arrow that leads to the next page
nextHref <- "javascript:__doPostBack('dgLIP$_ctl29$_ctl1','')"
# Collect the href attribute of every element in a list of web elements
getHrefs <- function(elems) unlist(lapply(elems, function(x){x$getElementAttribute("href")}))

# Keep clicking the arrow for as long as it is present on the page
acc <- 2
while(nextHref %in% getHrefs(remDr$findElements("css selector", "[href]"))) {
  webElems <- remDr$findElements("css selector", "[href]")
  clickers <- getHrefs(webElems)
  pager <- webElems[[which(clickers == nextHref)[1]]]
  pager$clickElement()
  tableElem <- remDr$findElement(using = "id", "dgLIP")
  tmp <- readHTMLTable(htmlParse(tableElem$getElementAttribute("outerHTML")[[1]]))
  hourlyData[[acc]] <- tmp[[1]]
  acc <- acc + 1
  Sys.sleep(3)
}
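
Once the loop finishes, hourlyData holds one data frame per page. A minimal sketch of stacking them into a single data frame follows. The helper name combinePages and the toy input are hypothetical, and the sketch assumes every page has the same columns:

```r
# Hypothetical helper: stack a list of per-page data frames into one.
# Assumes every page has the same columns.
combinePages <- function(pages) {
  combined <- do.call(rbind, pages)
  rownames(combined) <- NULL # discard the per-page row names
  combined
}

# Toy stand-in for hourlyData (values are made up for illustration)
toyPages <- list(
  data.frame(PNode = c("A", "B"), Price = c("1.5", "2.5"), stringsAsFactors = FALSE),
  data.frame(PNode = "C", Price = "3.5", stringsAsFactors = FALSE)
)
allRows <- combinePages(toyPages)
```

With the real data, the call would be combinePages(hourlyData).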

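I hope someone else gives you more reason for optimism, but my advice is not to do this. Use Python with the Selenium driver; it will be easier even if you do not know any Python beforehand. I say this as someone who loves R and tries to use it for everything, but in this case I do not think it is the right tool for the job.

Thank you, Ista... I had never heard of Selenium before getting into this little pickle. Do you think using the Python driver has an advantage over the R package suggested by jdharrison below?

OK, I'm intrigued! I will give it a try on Monday.

I am stuck at 'remDr$open()', getting the error 'Error in function (type, msg, asError = TRUE) : couldn't connect to host'. I installed the package in R using devtools, downloading it from GitHub.

@sclarky You need to run a Selenium server. See the RSelenium basics vignette.

Sorry @jdharrison, this is a foreign language to me. I ran the checkServer and startServer functions, and when I navigate to the bin folder of the RSelenium package, the Selenium jar file is there. But I still get the error.

It may be a problem with Java. Do you have Java or Java+JDK installed on that system? You can check as follows: 1. Go to the command prompt. 2. Type 'java -version'.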
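
The Java check suggested in the last comment can also be run from within R. This is only a sketch: the helper name javaOnPath is hypothetical, and it assumes Java is reachable through the PATH rather than installed somewhere R cannot see:

```r
# Hypothetical helper: TRUE if a `java` executable is found on the PATH.
javaOnPath <- function() {
  unname(nzchar(Sys.which("java")))
}

if (javaOnPath()) {
  # Same check as typing `java -version` at a command prompt
  system2("java", "-version")
} else {
  message("No java executable found on the PATH")
}
```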
           <td style="height: 42px; width: 77px;">
<span id="lblLIPHour">Hour</span><br><select name="ddlHour" id="ddlHour"><option value="1">01</option>
<option value="2">02</option>
<option selected value="3">03</option>
<option value="4">04</option>
<option value="5">05</option>
<option value="6">06</option>
<option value="7">07</option>
<option value="8">08</option>
> readHTMLTable(tableData[[1]])
   Publish Date   Price Date                PNode Price        Parent PNode Settlement Location
1  201402281552 201402281600                 AECI 23.45                AECI                AECI
2  201402281552 201402281600                 AMRN 23.45                AMRN                AMRN
3  201402281552 201402281600                 BLKW 23.45                BLKW                BLKW
4  201402281552 201402281600                 CLEC 23.45                CLEC                CLEC
5  201402281552 201402281600         CSWS_AECC_LA 23.45        CSWS_AECC_LA           AECC_CSWS
require(RSelenium)
require(XML) # for htmlParse() and readHTMLTable()
# RSelenium::startServer() # if needed
remDr <- remoteDriver()
remDr$open()
remDr$setImplicitWaitTimeout(3000)
remDr$navigate("http://www.spp.org/LIP.asp")
remDr$switchToFrame("content_frame")
dateElem <- remDr$findElement(using = "id", "txtLIPDate") # select the date
dateRequired <- "01/14/2014"
dateElem$clearElement()
dateElem$sendKeysToElement(list(dateRequired, key = "enter")) # send a date to app
hourElem <- remDr$findElement(using = "css selector", '#ddlHour [value="5"]') # select the 5th hour
hourElem$clickElement() # select this hour
buttonElem <-remDr$findElement(using = "id", "cmdView")
buttonElem$clickElement() # click the view button

#Sys.sleep(5)
tableElem <- remDr$findElement(using = "id", "dgLIP")
readHTMLTable(htmlParse(tableElem$getElementAttribute("outerHTML")[[1]]))

[1] "tableElem$getElementAttribute(\"outerHTML\")"
$dgLIP
V1           V2                   V3    V4                  V5                  V6
1  Publish Date   Price Date                PNode Price        Parent PNode Settlement Location
2  201401132252 201401132300                 AECI 19.14                AECI                AECI
3  201401132252 201401132300                 AMRN 18.87                AMRN                AMRN
4  201401132252 201401132300                 BLKW 20.28                BLKW                BLKW
5  201401132252 201401132300                 CLEC 18.99                CLEC                CLEC
6  201401132252 201401132300         CSWS_AECC_LA 19.77        CSWS_AECC_LA           AECC_CSWS
7  201401132252 201401132300  CSWS_GREEN_LIGHT_LA  18.5 CSWS_GREEN_LIGHT_LA        GSEC_GL_CSWS
8  201401132252 201401132300              CSWS_LA 19.01             CSWS_LA           AEPM_CSWS
9  201401132252 201401132300              CSWS_LA 19.01             CSWS_LA            AEP_LOSS
10 201401132252 201401132300         CSWS_OMPA_LA 18.66        CSWS_OMPA_LA           OMPA_CSWS
11 201401132252 201401132300      CSWS_TENASKA_LA 18.95     CSWS_TENASKA_LA        GATEWAY_LOAD
12 201401132252 201401132300      CSWS112_WGORLD1  18.7             CSWS_LA           AEPM_CSWS
13 201401132252 201401132300      CSWS112_WGORLD1  18.7             CSWS_LA            AEP_LOSS
14 201401132252 201401132300      CSWS116PEORILD1  18.9             CSWS_LA           AEPM_CSWS
15 201401132252 201401132300      CSWS116PEORILD1  18.9             CSWS_LA            AEP_LOSS
16 201401132252 201401132300    CSWS121EASTLDXFL1 18.92             CSWS_LA           AEPM_CSWS
17 201401132252 201401132300    CSWS121EASTLDXFL1 18.92             CSWS_LA            AEP_LOSS
18 201401132252 201401132300      CSWS121LYNN4LD1 18.91             CSWS_LA           AEPM_CSWS
19 201401132252 201401132300      CSWS121LYNN4LD1 18.91             CSWS_LA            AEP_LOSS
20 201401132252 201401132300   CSWS12TH_STLD69_12 18.92             CSWS_LA           AEPM_CSWS
21 201401132252 201401132300   CSWS12TH_STLD69_12 18.92             CSWS_LA            AEP_LOSS
22 201401132252 201401132300 CSWS12TH_STLD69_12_2 18.92             CSWS_LA           AEPM_CSWS
23 201401132252 201401132300 CSWS12TH_STLD69_12_2 18.92             CSWS_LA            AEP_LOSS
24 201401132252 201401132300      CSWS136_YALELD1  18.9             CSWS_LA           AEPM_CSWS
25 201401132252 201401132300      CSWS136_YALELD1  18.9             CSWS_LA            AEP_LOSS
26 201401132252 201401132300  CSWS141_PINELDXFMR1 19.09             CSWS_LA           AEPM_CSWS
27          < >         <NA>                 <NA>  <NA>                <NA>                <NA>
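
In the output above, the real column names arrive as the first data row and the pager arrows arrive as a final row of NA values. A minimal sketch of tidying one such table follows. The helper name cleanLipTable is hypothetical, the toy input is a trimmed copy of the first data row shown above, and detecting the pager row by an NA in the second column is an assumption based on this output only:

```r
# Hypothetical helper: tidy one table as returned by readHTMLTable() above.
cleanLipTable <- function(raw) {
  raw[] <- lapply(raw, as.character)              # factors (if any) to plain strings
  header <- unlist(raw[1, ], use.names = FALSE)   # first row holds the column names
  body <- raw[-1, , drop = FALSE]
  names(body) <- header
  body <- body[!is.na(body[[2]]), , drop = FALSE] # drop the pager row (NA after column 1)
  body$Price <- as.numeric(body$Price)
  rownames(body) <- NULL
  body
}

# Toy input: header row, one data row, and the pager row (four columns only)
toy <- data.frame(
  V1 = c("Publish Date", "201401132252", "< >"),
  V2 = c("Price Date", "201401132300", NA),
  V3 = c("PNode", "AECI", NA),
  V4 = c("Price", "19.14", NA),
  stringsAsFactors = FALSE
)
tidy <- cleanLipTable(toy)
```

Dropping the pager row by content rather than by position avoids hard-coding row 27, which may not hold on a shorter final page.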