R: lapply behavior with a function that builds a tm corpus

I have a data frame that I'd like to use lapply on. Here I've taken the first values from its first column:

link <- c(
    "http://www.r-statistics.com/tag/hadley-wickham/",
    "http://had.co.nz/",
    "http://vita.had.co.nz/articles.html",
    "http://blog.revolutionanalytics.com/2010/09/the-r-files-hadley-wickham.html",
    "http://www.analyticstory.com/hadley-wickham/"
)

Why doesn't this function work inside lapply?

cc=lapply(link,create.corpus) # does not work
cc=lapply(link,nchar) # works

link=link[1] # try on single element
cc=create.corpus(link) # works

There is a problem in your function: its body refers to link instead of its own argument, so every call made by lapply sees the whole vector rather than a single URL. Replace all instances of link with url.name and it will work:

library(XML)  # htmlParse, xpathSApply, xmlValue
library(tm)   # Corpus, VectorSource, meta

create.corpus <- function(url.name){
  doc=htmlParse(url.name)                    # parse the page at this URL
  parag=xpathSApply(doc,'//p',xmlValue)      # text of every <p> element
  cc=Corpus(VectorSource(parag))             # one document per paragraph
  meta(cc,type='corpus','link') <- url.name  # record the source URL on the corpus
  return(cc)
}

cc <- lapply(link, create.corpus)
> cc
[[1]]
A corpus with 48 text documents

[[2]]
A corpus with 2 text documents

[[3]]
A corpus with 41 text documents

[[4]]
A corpus with 25 text documents

[[5]]
A corpus with 39 text documents
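
The scoping issue behind the original failure can be reproduced without any network access. The sketch below is only an illustration: count.buggy and count.fixed are hypothetical stand-ins for create.corpus, the three short strings stand in for the URLs, and nchar stands in for the parsing step.

```r
# Stand-in for the URL vector (assumption: plain strings, no network needed).
link <- c("a", "bb", "ccc")

# Buggy version (hypothetical): ignores its argument and reads the global `link`.
count.buggy <- function(url.name) {
  nchar(link)
}

# Fixed version (hypothetical): uses the argument that lapply passes in.
count.fixed <- function(url.name) {
  nchar(url.name)
}

buggy <- lapply(link, count.buggy)
fixed <- lapply(link, count.fixed)

# Number of values each call worked on.
print(sapply(buggy, length))  # every call processed the whole vector
print(sapply(fixed, length))  # every call processed a single element
```

This also suggests why the single-element test in the question appeared to work: after link=link[1], the global link holds exactly one URL, so a function that reads the global and a function that reads its argument presumably behave the same.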