R tm corpus outputs structural metadata words


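Using the tm library, my corpus ends up containing words that come from the VectorSource/corpus structure rather than from my text: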

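library(tm)

text <- readLines("some.txt")

# newCorpus is not defined in this snippet; presumably it holds the preprocessed lines from text
finalCorpus <- Corpus(VectorSource(newCorpus))
finalCorpus <- tm_map(finalCorpus, stripWhitespace)
save(finalCorpus, file = "data/DEBUG.Rda")  # DEBUG
# Convert every document to character and collect the results in a data frame
df <- data.frame(lapply(finalCorpus, as.character), stringsAsFactors = FALSE)
df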
>protracted periods meditation fasting prayer ennui fever energy vigor
>married joseph lee dollars million canadian dollars gbp pastored african
>american church snow hill jersey children died infancy **meta list author
>character datetimestamp list sec min hour mday mon year wday yday isdst
>description character heading character id language en origin character
>X2   X3
>1 list list**
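The words between the ** markers come from the corpus object, not from the imported text. Why am I getting them, and how can I remove them (without using the tm removeWords function)?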
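One possible explanation, offered as an assumption rather than a confirmed diagnosis: in tm versions where each document is stored as a list holding both `content` and `meta`, calling `as.character` on the whole document can turn the metadata fields (author, datetimestamp, id, language, and so on) into text as well. A minimal sketch of a workaround is to pull out only the text with `content()` before building the data frame. The sample strings below are hypothetical stand-ins for the lines of some.txt, and `VCorpus` is used so that the document structure is explicit.

```r
library(tm)

# Hypothetical stand-in for the lines read from some.txt
text <- c("protracted  periods meditation", "married joseph   lee")

corpus <- VCorpus(VectorSource(text))
corpus <- tm_map(corpus, stripWhitespace)

# Extract only the text of each document and ignore its metadata.
# seq_len(length(corpus)) is used so that the corpus length method applies.
doc_text <- vapply(
  seq_len(length(corpus)),
  function(i) paste(content(corpus[[i]]), collapse = " "),
  character(1)
)

# One row per document, containing text only
df <- data.frame(text = doc_text, stringsAsFactors = FALSE)
```

With this approach the metadata never reaches the data frame, so there is nothing to strip afterwards with removeWords.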