R: lapply behavior with a tm corpus function
I have a data frame that I want to use lapply on. Here I have selected the first values of the first column:
link <- c(
"http://www.r-statistics.com/tag/hadley-wickham/",
"http://had.co.nz/",
"http://vita.had.co.nz/articles.html",
"http://blog.revolutionanalytics.com/2010/09/the-r-files-hadley-wickham.html",
"http://www.analyticstory.com/hadley-wickham/"
)
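My function create.corpus works when I call it on a single element, but not when I pass it to lapply:
cc=lapply(link,create.corpus) # does not work
cc=lapply(link,nchar) # works
link=link[1] # try on single element
cc=create.corpus(link) # works
Why does this function not work in lapply?
There is a problem in your function. Replace all instances of link inside it with url.name and it will work. As posted, the function body apparently refers to the global variable link rather than to its own argument, so every call made by lapply hands the full vector of URLs to htmlParse instead of a single URL. The single-element test only succeeds because link has been overwritten with one value by then. With the argument used throughout, the function reads: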
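library(XML)  # htmlParse, xpathSApply, xmlValue
library(tm)   # Corpus, VectorSource, meta

# Build a corpus from the <p> paragraphs of a single web page.
create.corpus <- function(url.name) {
  doc <- htmlParse(url.name)                     # parse the page at this URL
  parag <- xpathSApply(doc, '//p', xmlValue)     # text of every <p> node
  cc <- Corpus(VectorSource(parag))              # one document per paragraph
  meta(cc, type = 'corpus', 'link') <- url.name  # record the source URL
  return(cc)
}

# One corpus per URL, returned as a list
cc <- lapply(link, create.corpus)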
> cc
[[1]]
A corpus with 48 text documents
[[2]]
A corpus with 2 text documents
[[3]]
A corpus with 41 text documents
[[4]]
A corpus with 25 text documents
[[5]]
A corpus with 39 text documents
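The scoping issue behind this can be reproduced without XML, tm, or network access. The following is only a minimal sketch: the vector urls and the functions count.buggy and count.fixed are hypothetical stand-ins for link and create.corpus, and nchar stands in for the parsing step.

```r
# Hypothetical stand-in for the vector of URLs
urls <- c("a", "bb", "ccc")

# Buggy pattern: the body ignores its argument and reads the global vector,
# so every call sees all three elements at once.
count.buggy <- function(url.name) {
  nchar(urls)
}

# Fixed pattern: the body uses only its argument,
# so every call sees exactly one element.
count.fixed <- function(url.name) {
  nchar(url.name)
}

buggy <- lapply(urls, count.buggy)
fixed <- lapply(urls, count.fixed)

str(buggy)  # three elements, each an integer vector of length 3
str(fixed)  # three elements, each a single integer
```

With a length-one global vector both versions return the same thing, which would explain why a test on a single element can hide the problem.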