How to log in and extract a table from an html/aspx web page using R

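I am trying to extract a table from a page on www.sxcoal.com (the /coal/3478186/articlenew.html page that appears in the headers below).

The page is in Chinese, but basically you type your login details into the boxes above the big blue button in the middle of the page. After you log in, the table appears in the middle of the page. Note: on /articlenew.html, the login requires only a username and password, nothing else.

After authentication, the headers look like this: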

Request URL:http://www.sxcoal.com/user/login.aspx
Request Method:POST
Status Code:302 Found
Request Headers
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en,en-GB;q=0.8,zh;q=0.6,zh-CN;q=0.4
Connection:keep-alive
Content-Length:39
Content-Type:application/x-www-form-urlencoded
Cookie:the_cookies
Host:www.sxcoal.com
Origin:http://www.sxcoal.com
Referer:http://www.sxcoal.com/coal/3478186/articlenew.html
User-Agent:Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36
Form Data
username:myusername
password:mypassword
Response Headers
Cache-Control:private
Content-Length:167
Content-Type:text/html; charset=gb2312
Date:Thu, 14 Nov 2013 01:06:00 GMT
Location:http://www.sxcoal.com/coal/3478186/articlenew.html
Server:Microsoft-IIS/7.0
Set-Cookie:s_info=zhuhaiqinfa|15816; domain=sxcoal.com; path=/
X-AspNet-Version:2.0.50727
X-Powered-By:ASP.NET
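
For comparison, the successful request above could be approximated in R roughly as follows. This is a minimal sketch under stated assumptions, not a tested solution: it assumes the server accepts the two form fields exactly as they appear under Form Data (`username`, `password`), that the session cookie is all the article page needs, and that the page is served as gb2312 as the response header suggests. `build_login_params` and `login_and_read_tables` are hypothetical helper names, and the RCurl and XML packages are assumed to be installed.

```r
# Form fields copied from the "Form Data" section of the successful request.
build_login_params <- function(user, pass) {
  list(username = user, password = pass)
}

# Hypothetical helper: log in, then fetch the article page with the same handle.
# Nothing here touches the network until the function is called.
login_and_read_tables <- function(
    user, pass,
    login_url = "http://www.sxcoal.com/user/login.aspx",
    page_url  = "http://www.sxcoal.com/coal/3478186/articlenew.html") {
  # One handle for both requests so the Set-Cookie value is sent back.
  h <- RCurl::getCurlHandle(
    cookiefile     = "",          # enables libcurl's in-memory cookie engine
    followlocation = TRUE,        # follow the 302 back to the article
    useragent      = "Mozilla/5.0",
    referer        = page_url     # matches the Referer in the request above
  )
  # style = "POST" sends application/x-www-form-urlencoded, as the browser did;
  # postForm's default style sends multipart/form-data instead.
  RCurl::postForm(login_url,
                  .params = build_login_params(user, pass),
                  style = "POST", curl = h)
  page <- RCurl::getURL(page_url, curl = h)
  # Assumption: the encoding may need adjusting for this site.
  doc <- XML::htmlParse(page, asText = TRUE, encoding = "gb2312")
  XML::readHTMLTable(doc, stringsAsFactors = FALSE)
}

# Usage (not run here):
# tables <- login_and_read_tables("myusername", "mypassword")
```

The point of the sketch is that this form needs no `__VIEWSTATE`, `__EVENTVALIDATION`, or `CheckCode` value, so it may be the easier route if the server accepts it.
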
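I have tried to log in from R with the RCurl approach shown below. However, for some reason, R fails to log in. My guess is that the problem lies with /login.aspx (http://www.sxcoal.com/user/login.aspx). I have put the headers for /login.aspx at the end of the question.

This is the code I used: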

library(RCurl)

# One handle for every request, so cookies persist between the GET and the POST
mycurl <- getCurlHandle()
agent <- "Mozilla/5.0"
curlSetOpt(cookiejar = "", followlocation = TRUE, useragent = agent, autoreferer = TRUE, curl = mycurl)

# Fetch the login page and pull the hidden ASP.NET fields out of it
html <- getURL('http://www.sxcoal.com/user/login.aspx', curl = mycurl)
viewstate <- sub('.*id="__VIEWSTATE" value="([0-9a-zA-Z+/=]*).*', '\\1', html)
eventvalidation <- sub('.*id="__EVENTVALIDATION" value="([0-9a-zA-Z+/=]*).*', '\\1', html)

## checkcode <- ???  ## can't define it, as it changes on every page load
params <- list(
        "txtuser"     = "myusername",
        "txtpass"     = "mypassword",
        "__VIEWSTATE" = viewstate,
        "__EVENTVALIDATION" = eventvalidation,
        "CheckCode"   = checkcode,   # undefined: see the comment above
        "Button2"     = ""
        )

# Post the form back with the same handle
html <- postForm('http://www.sxcoal.com/user/login.aspx', .params = params, curl = mycurl)
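
One thing to watch in the code above: when the pattern does not match, `sub()` returns its input unchanged, so `viewstate` would silently hold the whole page. A minimal sketch of a stricter extraction, using base R only, is given here. `extract_hidden_field` is a hypothetical helper and `sample_html` is made-up markup for illustration, not the real login page.

```r
# Return the value attribute of the input with the given id,
# or NA when the field is not present in the page.
extract_hidden_field <- function(html, id) {
  pattern <- sprintf('id="%s"[^>]*value="([^"]*)"', id)
  m <- regmatches(html, regexec(pattern, html))[[1]]
  if (length(m) < 2) NA_character_ else m[2]
}

# Made-up markup in the shape ASP.NET usually emits.
sample_html <- paste0(
  '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUK+abc=" />\n',
  '<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWBAKx/123==" />'
)

viewstate       <- extract_hidden_field(sample_html, "__VIEWSTATE")
eventvalidation <- extract_hidden_field(sample_html, "__EVENTVALIDATION")
missing_field   <- extract_hidden_field(sample_html, "CheckCode")  # NA: not in the sample
```

Checking for `NA` before calling `postForm` makes it obvious when the login page did not contain the expected fields.
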

These are the headers for /login.aspx:

Request URL:http://www.sxcoal.com/user/login.aspx
Request Method:POST
Status Code:302 Found
Request Headers
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en,en-GB;q=0.8,zh;q=0.6,zh-CN;q=0.4
Connection:keep-alive
Content-Length:234
Content-Type:application/x-www-form-urlencoded
Cookie:the_cookies
Host:www.sxcoal.com
Origin:http://www.sxcoal.com
Referer:http://www.sxcoal.com/user/login.aspx
User-Agent:Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36
Form Data
__VIEWSTATE:whatever_it_is
txtuser:myusername
txtpass:mypassword
CheckCode:04854
Button2:
__EVENTVALIDATION:whatever_it_is_2
Response Headers
Cache-Control:private
Content-Length:170
Content-Type:text/html; charset=gb2312
Date:Thu, 14 Nov 2013 01:09:57 GMT
Location:http://www.sxcoal.com/?aspxerrorpath=/user/login.aspx
Server:Microsoft-IIS/7.0
X-AspNet-Version:2.0.50727
X-Powered-By:ASP.NET
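
The two responses differ in ways that can be checked from code: the first sets a cookie and redirects back to the article, while this one sets no cookie and redirects to a URL carrying `aspxerrorpath`. ASP.NET typically adds that query parameter when its custom error handling redirects after a server-side error, so it may be a useful failure signal. The sketch below, in base R, parses header lines in the `Name:value` form used in this question. `parse_header_lines` and `login_looks_ok` are hypothetical helpers, and the cookie value is a placeholder.

```r
# Turn "Name:value" lines into a named character vector.
parse_header_lines <- function(lines) {
  lines  <- lines[grepl(":", lines, fixed = TRUE)]
  keys   <- sub(":.*$", "", lines)               # text before the first colon
  values <- trimws(sub("^[^:]*:", "", lines))    # text after the first colon
  setNames(values, keys)
}

# Heuristic only: a cookie was set and the redirect is not an error redirect.
login_looks_ok <- function(headers) {
  location   <- unname(headers["Location"])      # NA when the header is absent
  has_cookie <- "Set-Cookie" %in% names(headers)
  has_cookie && !grepl("aspxerrorpath=", location, fixed = TRUE)
}

ok_lines <- c(
  "Location:http://www.sxcoal.com/coal/3478186/articlenew.html",
  "Set-Cookie:s_info=placeholder; domain=sxcoal.com; path=/"   # placeholder value
)
failed_lines <- c(
  "Location:http://www.sxcoal.com/?aspxerrorpath=/user/login.aspx",
  "Server:Microsoft-IIS/7.0"
)

ok_result     <- login_looks_ok(parse_header_lines(ok_lines))
failed_result <- login_looks_ok(parse_header_lines(failed_lines))
```

This is a heuristic based only on the two header dumps in this question, so other failure modes may look different.
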