Error in preNode[[1]] : subscript out of bounds in R
When I try to rebuild the URLs, I get the error "Error in preNode[[1]] : subscript out of bounds". My code is listed below; I am retrieving information from the URLs of the Cherry Blossom race results. Please let me know if something is wrong with my reconstruction :/
# Install and load the XML package
install.packages("XML")
library(XML)
#Establish the View Page Source of the Web Site
ubase = "http://www.cherryblossom.org/"
url = paste(ubase, "/results/2012/2012cucb10m-m.htm", sep = "")
doc = htmlParse(url)
preNode = getNodeSet(doc, "//pre")
txt = xmlValue(preNode[[1]])
nchar(txt)
substr(txt, 1, 50)
substr(txt, nchar(txt) - 50, nchar(txt))
els = strsplit(txt, "\\r\\n")[[1]]
length(els)
els[1:3]
els[ length(els) ]
extractResTable =
  # Retrieve data from web site, find preformatted text,
  # return as a character vector.
function(url)
{
  doc = htmlParse(url)
  preNode = getNodeSet(doc, "//pre")
  txt = xmlValue(preNode[[1]])
  els = strsplit(txt, "\r\n")[[1]]
  return(els)
}
# Retrieve the 2012 Men's Results
m2012 = extractResTable(url)
identical(m2012, els)
#Setting a vector of all the URLS for each year
ubase = "http://www.cherryblossom.org/"
urls = paste(ubase, "results/", 1999:2012, "/",
1999:2012, "cucb10m-m.htm", sep = "")
# Apply extractResTable() to each element of "urls"
menTables = lapply(urls, extractResTable)
#Resolving the error message
options(error = recover)
menTables = lapply(urls, extractResTable) #Choose Selection 2
# After choosing Selection 2, enter ls() in the console below
# The list should display: [1] "doc" "preNode" "url"
# Proceed if so by:
# 1. Enter url in the console.
# Output: [1] "http://www.cherryblossom.org/results/1999/1999cucb10m-m.htm"
# 2. Enter length(preNode)
# Output: [1] 0
# Gather the URLs for the men's results into a character vector, menURLs
menURLs =
c("cb99m.htm", "cb003m.htm", "results/2001/oof_m.html",
"results/2002/oofm.htm", "results/2003/CB03-M.HTM",
"results/2004/men.htm", "results/2005/CB05-M.htm",
"results/2006/men.htm", "results/2007/men.htm",
"results/2008/men.htm", "results/2009/09cucb-M.htm",
"results/2010/2010cucb10m-m.htm",
"results/2011/2011cucb10m-m.htm",
"results/2012/2012cucb10m-m.htm")
# Reconstruct the urls vector to contain the proper Web Addresses
urls = paste(ubase, menURLs, sep = "")
urls[1:3]
# Print out the results again
menTables = lapply(urls, extractResTable)
names(menTables) = 1999:2012
length(urls)
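Both of your URL vectors are incorrect. One has the same elements in the last position for every URL, and the other is missing the results/ part for the first two addresses. That is what is causing your error, but beyond that I am not sure what your question is.

Esther, do you mean the initial assignment of the URL? ubase = "" url = paste(ubase, "results/2012/2012cucb10m-m.htm", sep = "") doc = htmlParse(url) I made the change and removed the / in the first URL. I still get the same error.

Wow, thanks! I found the error. I was following example code from a textbook, and it gave me the wrong URL file paths --. Thank you, Esther!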
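
The debugging session in the question showed length(preNode) equal to 0 for the failing URL, so indexing preNode[[1]] is what raises the error. One possible way to surface this more gently is to check the node set before indexing it. The sketch below is not from the original post: extractPreLines is a hypothetical helper name, and the two inline HTML strings are made-up stand-ins for a results page and a page without a pre block, so that the example runs without network access.

```r
library(XML)

# Hypothetical helper (an assumption, not part of the original code):
# works on an already-parsed document and returns NULL with a warning,
# instead of failing, when the page contains no <pre> node.
extractPreLines = function(doc)
{
  preNode = getNodeSet(doc, "//pre")
  if (length(preNode) == 0) {
    warning("no <pre> node found in document")
    return(NULL)
  }
  txt = xmlValue(preNode[[1]])
  # Split on either Windows or Unix line endings.
  strsplit(txt, "\r?\n")[[1]]
}

# Made-up stand-ins for a real results page and a page lacking <pre>.
goodPage = htmlParse("<html><body><pre>PLACE NAME\n1 Runner A</pre></body></html>",
                     asText = TRUE)
badPage = htmlParse("<html><body><p>Page not found</p></body></html>",
                    asText = TRUE)

goodLines = extractPreLines(goodPage)
badLines = suppressWarnings(extractPreLines(badPage))
```

With real pages, the same helper could be applied as lapply(urls, function(u) extractPreLines(htmlParse(u))), after which urls[sapply(menTables, is.null)] would list the addresses that returned no pre block. Note that htmlParse(u) itself may still fail if an address cannot be fetched at all, which this sketch does not handle.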