Stopwords can't remove words in R
I am trying to remove words from a text using a file that contains stopwords, but the words are not being removed. This is what I am running:
library(corpus)
library(tm)
tokpedClean <- read.csv("D:/AS/tokpedClean5.csv")
head(tokpedClean)
tokpedCleanCor = Corpus(VectorSource(tokpedClean$text))
removeURL <- function(x) gsub("http[^[:space:]]*", "", x)
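# 'docs' was never defined; apply the URL cleaner to the corpus built above
docsClean <- tm_map(tokpedCleanCor, content_transformer(removeURL))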
inspect(docsClean[1:5])
The next step is the stopwords:
cStopwordID <- readLines("D:/AS/swID.csv")
stop <- tm_map(docsClean, removeWords, cStopwordID)
I think you should convert to a DocumentTermMatrix() first.
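As a rough illustration of that suggestion, the stopwords can be passed through the `control` list when the matrix is built. This is only a sketch: the sample sentences and the stopword vector below are made up for the example and stand in for `tokpedClean$text` and the contents of `swID.csv`.

```r
library(tm)

# Hypothetical sample text standing in for tokpedClean$text
texts <- c("barang ini sangat bagus dan murah", "pengiriman yang cepat")
docs <- VCorpus(VectorSource(texts))

# Hypothetical stopword vector standing in for the contents of swID.csv
stopwordsID <- c("ini", "dan", "yang", "sangat")

# The 'stopwords' control option accepts a character vector of custom stopwords
dtm <- DocumentTermMatrix(docs, control = list(stopwords = stopwordsID))

# Terms that survived the stopword filter
terms <- Terms(dtm)
print(terms)
```

Note that this filters terms when the matrix is built and leaves the corpus text itself unchanged.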
Your stopwords need to be a character vector, not a data.frame or a list.
cStopwordID <- readLines("D:/AS/swID.csv")
stop <- tm_map(docsClean, removeWords, cStopwordID)
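If the words still are not removed, it may help to check what `readLines()` actually returned. `removeWords` matches whole words case-sensitively, so stray quotes, trailing spaces, or capital letters in the file will prevent a match. The sketch below assumes a one-word-per-line file; the temporary file, the sample sentences, and the names `swFile` and `stopwordsID` are hypothetical stand-ins for `D:/AS/swID.csv` and `tokpedClean$text`.

```r
library(tm)

# Hypothetical stopword file standing in for D:/AS/swID.csv (one word per line)
swFile <- tempfile(fileext = ".csv")
writeLines(c("ini ", "\"dan\"", "Yang", "sangat"), swFile)

# readLines() returns a character vector; normalise it so each entry
# matches the lowercased corpus text exactly
stopwordsID <- readLines(swFile)
stopwordsID <- tolower(trimws(gsub("\"", "", stopwordsID)))
stopwordsID <- stopwordsID[nzchar(stopwordsID)]
stopifnot(is.character(stopwordsID))

# Hypothetical sample text standing in for tokpedClean$text
texts <- c("Barang INI sangat bagus dan murah", "Pengiriman yang cepat")
docs <- VCorpus(VectorSource(texts))

# Lowercase first, then remove the stopwords and collapse leftover spaces
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removeWords, stopwordsID)
docs <- tm_map(docs, stripWhitespace)

cleaned <- unname(sapply(docs, content))
print(cleaned)
```

Running `str(stopwordsID)` before the `tm_map` call is a quick way to confirm that it is a plain character vector with one clean word per element.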