Xml R web scraping/crawling (tags: xml, r, rcurl, lubridate)

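I want to crawl the NYSE bell calendar page (https://www.nyse.com/bell/calendar). For some reason, when I pull the HTML, I get back different HTML from what I see when I look at the page with Inspect Element. I used the following code: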

SetDir <- "~/NYSE"
setwd(SetDir)

# Create ~/NYSE/RawData if it does not exist yet
CreateDir <- file.path(SetDir, "RawData")
if (!dir.exists(CreateDir)) {
  dir.create(CreateDir)
}

url <- "https://www.nyse.com/bell/calendar"
# Local file for the downloaded page (file name chosen for illustration)
urlname <- file.path(CreateDir, "calendar.html")

# Try the download once; if it fails, wait 5 seconds and retry once
err <- try(download.file(url, destfile = urlname, quiet = FALSE), silent = TRUE)
if (inherits(err, "try-error")) {
  Sys.sleep(5)
  try(download.file(url, destfile = urlname, quiet = FALSE), silent = TRUE)
}
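The retry above makes only one extra attempt. A minimal sketch of a more general version, assuming you want a fixed number of attempts with a pause between them; `download_with_retry()` and its `tries`/`wait` parameters are hypothetical names chosen for illustration, not part of any package. Note that retrying only helps with transient failures; it does not change which HTML the server sends.

```r
# Hypothetical helper: call download.file() up to `tries` times,
# sleeping `wait` seconds between failed attempts.
# Returns TRUE on success, FALSE if every attempt failed.
download_with_retry <- function(url, destfile, tries = 3, wait = 5) {
  for (i in seq_len(tries)) {
    status <- tryCatch(
      download.file(url, destfile = destfile, quiet = TRUE),
      error = function(e) NA_integer_  # treat an error as a failed attempt
    )
    # download.file() returns 0 on success, non-zero on failure
    if (!is.na(status) && status == 0) return(TRUE)
    if (i < tries) Sys.sleep(wait)
  }
  FALSE
}
```

For example: `download_with_retry("https://www.nyse.com/bell/calendar", file.path(CreateDir, "calendar.html"))`.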

I even tried really simple functions, including one from a package (RCurl):

library(RCurl)  # provides getURL()

script <- readLines("https://www.nyse.com/bell/calendar")
script <- getURL("https://www.nyse.com/bell/calendar")
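A likely explanation (an assumption, not verified against nyse.com): Inspect Element shows the live DOM after the page's JavaScript has run, whereas `download.file()`, `readLines()` and `getURL()` return only the HTML the server sends and execute no scripts. If the calendar is filled in by JavaScript, the saved file holds the page skeleton and its `<script>` tags but not the data. A rough base-R check; `check_static_html()` and its arguments are hypothetical names for illustration:

```r
# Hypothetical diagnostic: does the raw (un-rendered) HTML contain a piece of
# text you can see in the browser, and how many <script> tags does it have?
check_static_html <- function(html, expected_text) {
  html <- paste(html, collapse = "\n")  # accept readLines() output (one string per line)
  # count opening <script ...> tags, case-insensitively
  hits <- regmatches(html, gregexpr("<script\\b", html, ignore.case = TRUE, perl = TRUE))
  list(
    found_text    = grepl(expected_text, html, fixed = TRUE),
    n_script_tags = lengths(hits)
  )
}
```

For example, `check_static_html(readLines(urlname), "text copied from Inspect Element")`. If `found_text` is FALSE while the page carries many script tags, the data is probably loaded by JavaScript; the Network tab of the browser's developer tools may reveal the underlying request, or a browser-driving package such as RSelenium can render the page before you read it.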

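You can read the site's terms, right? Yes, it looks like a "no". But why? Isn't this public information? Why am I not allowed to use it? I don't necessarily agree with every website's ToS, but I am also law-abiding (lawful neutral in D&D/RPG terms, too). They believe they are protecting their digital intellectual property and assets.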
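Besides the Terms of Use, a site's robots.txt lists paths it asks automated clients not to fetch. A deliberately simplified sketch for checking it, assuming only plain `User-agent`/`Disallow` lines with prefix matching (no `Allow` lines, wildcards, or the most-specific-group rule of the Robots Exclusion Protocol); `robots_disallows()` is a hypothetical name. A permissive robots.txt does not override the Terms of Use.

```r
# Hypothetical, simplified robots.txt check: TRUE if `path` starts with any
# Disallow prefix from a group addressed to `agent` or to "*".
# Simplifications: merges the "*" group with the named agent's group,
# ignores Allow lines, wildcards and longest-match precedence.
robots_disallows <- function(robots_lines, path, agent = "*") {
  lines <- trimws(sub("#.*$", "", robots_lines))  # strip comments
  in_group <- FALSE
  prev_was_agent <- FALSE
  disallowed <- character(0)
  for (ln in lines[nzchar(lines)]) {
    key <- tolower(trimws(sub(":.*$", "", ln)))  # text before the first colon
    val <- trimws(sub("^[^:]*:", "", ln))        # text after the first colon
    if (key == "user-agent") {
      # consecutive User-agent lines belong to the same group
      if (!prev_was_agent) in_group <- FALSE
      if (val == "*" || tolower(val) == tolower(agent)) in_group <- TRUE
      prev_was_agent <- TRUE
    } else {
      prev_was_agent <- FALSE
      # an empty Disallow value means "nothing is disallowed"
      if (key == "disallow" && in_group && nzchar(val)) {
        disallowed <- c(disallowed, val)
      }
    }
  }
  any(startsWith(path, disallowed))
}
```

For example: `robots_disallows(readLines("https://www.nyse.com/robots.txt"), "/bell/calendar")`.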