
KNN for text classification, but train and class have different lengths in R

Tags: r, text-mining, knn, tf-idf

Hello, I am trying to classify text. Here is my code:

library(tm)     # Corpus, VectorSource, DocumentTermMatrix, weightTfIdf
library(class)  # knn

df <- read.csv("D:/AS/tokpedprepro.csv")

#sampling: shuffle the rows reproducibly
set.seed(123)
df <- df[sample(nrow(df)),]

#Convert to corpus
dfCorpus <- Corpus(VectorSource(df$text))
inspect(dfCorpus[1:20])

#Convert to DTM
dtm <- DocumentTermMatrix(dfCorpus)
inspect(dtm[1:4, 3:7])

#Data Partition
df.train <- df[1:20,]
df.test <- df[21:37,]

dtm.train <- dtm[1:20,]
dtm.test <- dtm[21:37,]

df.Corpus.train <- dfCorpus[1:20]
df.corpus.test <- dfCorpus[21:37]

train.class <- df$data.class

#TFIDF
dtm.train.knn <- DocumentTermMatrix(df.Corpus.train,
  control = list(weighting = function(x) weightTfIdf(x, normalize = FALSE)))
dim(dtm.train.knn)
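[1]  20 194

Then:

dtm.test.knn <- DocumentTermMatrix(df.corpus.test,
  control = list(weighting = function(x) weightTfIdf(x, normalize = FALSE)))
dim(dtm.test.knn)
[1]  17 211
knn.pred <- knn(dtm.train.knn, dtm.test.knn, train.class, k=1)

In the knn.pred call, your train.class does not have the same length as the training set: it is taken from every row of df, while dtm.train.knn has only 20 rows. Please create a train.class for the training rows only.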
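
The length mismatch comes from taking the labels from the whole data frame while the training matrix holds only the training rows. The dim() outputs also show a second problem that knn() would hit next: the training matrix has 194 columns and the test matrix 211, because each was built from its own vocabulary. The following is a minimal sketch of both fixes, not the asker's actual pipeline. It assumes toy data in place of the CSV, plain term counts in place of TF-IDF, and a hypothetical helper count_terms(); only knn() from the class package is a real library call.

```r
library(class)  # provides knn()

# Toy data standing in for the CSV in the question (hypothetical values).
toy <- data.frame(
  text  = c("good fast delivery", "bad slow delivery", "good product",
            "bad product", "fast good seller", "slow bad seller"),
  label = c("pos", "neg", "pos", "neg", "pos", "neg"),
  stringsAsFactors = FALSE
)
train.idx <- 1:4
test.idx  <- 5:6

# Hypothetical helper: count how often each vocabulary term occurs per document.
# Tokens outside the vocabulary are silently dropped.
count_terms <- function(texts, vocab) {
  rows <- lapply(strsplit(texts, " ", fixed = TRUE), function(tokens) {
    as.numeric(table(factor(tokens, levels = vocab)))
  })
  m <- do.call(rbind, rows)
  colnames(m) <- vocab
  m
}

# Fix 1: build the vocabulary from the training rows and reuse it for the
# test rows, so both matrices have identical columns.
vocab   <- sort(unique(unlist(strsplit(toy$text[train.idx], " ", fixed = TRUE))))
x.train <- count_terms(toy$text[train.idx], vocab)
x.test  <- count_terms(toy$text[test.idx],  vocab)

# Fix 2: take the labels from the training rows only, so that
# length(train.class) equals nrow(x.train).
train.class <- factor(toy$label[train.idx])

stopifnot(nrow(x.train) == length(train.class),
          ncol(x.train) == ncol(x.test))

knn.pred <- knn(x.train, x.test, cl = train.class, k = 1)
print(knn.pred)
```

Applied to the code in the question, the same idea would mean using df.train$data.class for the labels, and building the test matrix with the dictionary entry of tm's control list set to the training terms so that its columns match the training matrix.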