Why are some Cyrillic letters missing in wordcloud?

I have a large amount of Russian text. When I build a wordcloud, I see that some characters, such as "ч", are not rendered. The code looks like this:

library(tm)            # Corpus, tm_map, TermDocumentMatrix, stopwords
library(wordcloud)     # wordcloud
library(RColorBrewer)  # brewer.pal

# Read the articles and build a corpus from the Article column
dat <- read.csv("news.csv", sep=";", header=TRUE, stringsAsFactors=FALSE)
corpus <- Corpus(VectorSource(dat$Article),
                 readerControl = list(reader=readPlain, language="ru"))

# Clean the text
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, tolower)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords,
                 stopwords("russian"))

# Count term frequencies, most frequent first
dtm <- TermDocumentMatrix(corpus)
m <- as.matrix(dtm)
v <- sort(rowSums(m), decreasing=TRUE)
d <- data.frame(word = names(v), freq=v)

# Draw the word cloud into a PNG file
pal2 <- brewer.pal(8, "Dark2")
png("wordcloud.png", width=640, height=640)
wordcloud(d$word, d$freq, scale=c(8,.2), min.freq=5, max.words=200,
          random.order=FALSE, rot.per=0, colors=pal2)
dev.off()

[From the OP's own edit, but repeated here to complete the question and answer]

You need to add the following alongside the other tm_map() calls:

corpus <- tm_map(corpus, iconv, 'cp1251', 'UTF-8')
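
To see what that conversion does, here is a minimal sketch on a single character using base R only. It assumes the input is encoded as Windows-1251 (CP1251), as the answer implies, and that the platform's iconv recognises the name "CP1251". The variable names are hypothetical and not part of the original code.

```r
# Hypothetical illustration: "ч" stored as the single CP1251 byte 0xF7.
cp1251_text <- rawToChar(as.raw(0xF7))

# Re-encode to UTF-8, the same conversion the answer applies through tm_map().
utf8_text <- iconv(cp1251_text, from = "CP1251", to = "UTF-8")

# In UTF-8, "ч" (U+0447) is the two-byte sequence d1 87.
print(charToRaw(utf8_text))
```

If the whole file is CP1251, passing fileEncoding = "CP1251" to read.csv may be an alternative way to get UTF-8 text from the start. That is an assumption about the data, not something the original answer states.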