How to read multiple HTML tables in R
I'm trying to automate pulling data in with the readHTMLTable function and saving it to a data frame. I'm an R newbie, and I'm having trouble figuring out how to write a loop that automates this; the code below works if you do it one URL at a time.
library('XML')
urls<-c("http://www.basketball-reference.com/teams/ATL/","http://www.basketball-reference.com/teams/BOS/")
theurl<-urls[2] #Pick second link (celtics)
tables <- readHTMLTable(theurl)
n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
BOS <-tables[[which.max(n.rows)]]
Team.History<-write.csv(BOS,"Bos.csv")
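To see what the n.rows and which.max lines do without hitting the site, here is a small offline sketch. The toy data frames and their names are made up for illustration; they only stand in for the list that readHTMLTable returns.

```r
# Toy stand-ins for the list returned by readHTMLTable(); the names and
# values are invented for this illustration.
tables <- list(
  notes   = data.frame(note = c("a", "b")),
  history = data.frame(season = c("s1", "s2", "s3"), wins = c(50, 40, 30)),
  misc    = data.frame(id = 1)
)

# Number of rows in each table, as a named integer vector.
n.rows <- vapply(tables, nrow, integer(1))

# which.max() returns the position of the first maximum, so this keeps
# the table with the most rows (here, "history").
biggest <- tables[[which.max(n.rows)]]
```

On a real page the assumption is that the table you want is also the longest one, which is what the code above relies on.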
I assume you want to loop over your vector of URLs? I would try something like this:
library('XML')
url_base <- "http://www.basketball-reference.com/teams/"
teams <- c("ATL", "BOS")
# better still, get the full list of teams as in
# http://stackoverflow.com/a/11804014/1543437
results <- data.frame()
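for(team in teams){
  # url_base already ends in "/", so append the team code and a trailing slash
  theurl <- paste0(url_base, team, "/")
  tables <- readHTMLTable(theurl)
  # keep the table with the most rows
  n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
  team.results <- tables[[which.max(n.rows)]]
  write.csv(team.results, file=paste0(team, ".csv"))
  # tag each row with its team before adding it to the combined results
  team.results$TeamCode <- team
  results <- rbind(results, team.results)
}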
write.csv(results, file="AllTeams.csv")
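Comments:
You seem to have already learned how to use lapply. Have you considered using those lapply skills on your urls vector?
Note that there is no need to assign the result of write.csv to a variable. You might also want to bundle all the results into a single file (see the update in the answer above).
Sean, this is extremely slick. Thanks so much.
I hope you get to use it yourself for some sports enjoyment. Please keep in touch about any other sports-related scraping work. Hit me up on Twitter @abresler.
I've been thinking about doing some Olympics data munging ... but haven't found the time. Maybe after all this time?
I think this combines the best of both answers (and tidies things up a bit):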
library(RCurl)
library(XML)
stem <- "http://www.basketball-reference.com/teams/"
teams <- htmlParse(getURL(stem), asText=TRUE)
teams <- xpathSApply(teams,"//*/a[contains(@href,'/teams/')]", xmlAttrs)[-1]
teams <- gsub("/teams/(.*)/", "\\1", teams)
urls <- paste0(stem, teams)
names(teams) <- NULL # get rid of the "href" labels
names(urls) <- teams
results <- data.frame()
for(team in teams){
  tables <- readHTMLTable(urls[team])
  n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
  team.results <- tables[[which.max(n.rows)]]
  write.csv(team.results, file=paste0(team, ".csv"))
  team.results$TeamCode <- team
  results <- rbind(results, team.results)
  rm(team.results, n.rows, tables)
}
rm(stem, team)
write.csv(results, file="AllTeams.csv")
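Following the comment about using lapply on the URL vector, the loop could also be written without growing a data frame inside a for loop. This is only a sketch, not the answerers' code: largest_table, scrape_teams and the fetch argument are names made up here, and it assumes every team's largest table has the same columns (the same assumption the rbind calls in the loops make).

```r
# Pick the table with the most rows from a list of data frames.
largest_table <- function(tables) {
  n.rows <- vapply(tables, nrow, integer(1))
  tables[[which.max(n.rows)]]
}

# Fetch each team's page, keep its largest table, tag it with the team code,
# and bind everything together once at the end.
# `fetch` is a hypothetical hook added for this sketch so the function can be
# tried without network access; by default it calls XML::readHTMLTable.
scrape_teams <- function(teams, url_base,
                         fetch = function(u) XML::readHTMLTable(u)) {
  per_team <- lapply(teams, function(team) {
    team.results <- largest_table(fetch(paste0(url_base, team, "/")))
    team.results$TeamCode <- team
    team.results
  })
  do.call(rbind, per_team)
}

# Usage against the live site (not run here):
# results <- scrape_teams(c("ATL", "BOS"),
#                         "http://www.basketball-reference.com/teams/")
# write.csv(results, file = "AllTeams.csv")
```

Passing the fetch step in as an argument is just a convenience for trying the function on toy data; with the default it does the same per-team work as the loop, minus the per-team CSV files.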