How to use TermDocumentMatrix for Persian text in R?



I want to see the term frequencies in my documents, which contain Persian text. I am using R as follows:

keycorpus <- Corpus(DirSource("E:\\Sample\\farsi texts"))
tm.matrix <- TermDocumentMatrix(keycorpus)
View(as.matrix(tm.matrix))

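Please add the error and, if you don't mind, part of the Persian text. What is the encoding of your Persian text?

The encoding is UTF-8. There is no error, but in this case the output of TermDocumentMatrix contains only numbers and punctuation and ignores the Persian terms.

Assuming you have a text file named 1.txt: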
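library(tm)
# Switch to a Persian locale (Windows locale name)
Sys.setlocale(locale = "Persian", category = "LC_ALL")
setwd("E:\\Sample\\farsi_texts")
text <- readLines("1.txt", encoding = "windows-1256")
# VectorSource treats each line of the file as one document
keycorpus <- Corpus(VectorSource(text))
tm.matrix <- TermDocumentMatrix(keycorpus)
View(as.matrix(tm.matrix))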
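To check whether Persian terms survive tokenization independently of any file or locale issue, a small self-contained test can help. The sketch below is not part of the original answer: it builds two tiny documents from Unicode escapes (so the script itself stays ASCII), uses VCorpus instead of Corpus, and sets wordLengths so short words are not dropped. The variable names and the sample words are assumptions chosen for illustration.

```r
library(tm)

# Hypothetical sample words, written as Unicode escapes so the
# source file does not depend on the editor's encoding.
salam <- "\u0633\u0644\u0627\u0645"        # "salam" (hello)
donya <- "\u062F\u0646\u06CC\u0627"        # "donya" (world)
iran  <- "\u0627\u06CC\u0631\u0627\u0646"  # "Iran"

# Two short documents that share one term
docs <- c(paste(salam, donya), paste(salam, iran))

corpus <- VCorpus(VectorSource(docs))

# wordLengths = c(1, Inf) keeps terms of any length
# (the default lower bound is 3 characters)
tdm <- TermDocumentMatrix(corpus, control = list(wordLengths = c(1, Inf)))

m <- as.matrix(tdm)

# Total frequency of each term across all documents
freq <- sort(rowSums(m), decreasing = TRUE)
print(freq)
```

If the Persian terms appear here but not when reading from disk, the problem is more likely in how the files are read (encoding or locale) than in TermDocumentMatrix itself.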