Text mining on Twitter

I am trying to follow a tutorial on text mining with Twitter data. My code is:

library(twitteR)
library(NLP)
library(tm)
library(wordcloud)
library(RColorBrewer)

mh370 <- searchTwitter("#PrayForMH370", since = "2014-03-08",
                       until = "2014-03-20", n = 1000)
mh370_text = sapply(mh370, function(x) x$getText())
mh370_corpus = Corpus(VectorSource(mh370_text))

tdm = TermDocumentMatrix(mh370_corpus,
                         control = list(removePunctuation = TRUE,
                                        stopwords = c("prayformh370", "prayformh",
                                                      stopwords("english")),
                                        removeNumbers = TRUE, tolower = TRUE))
m = as.matrix(tdm)
# get word counts in decreasing order
word_freqs = sort(rowSums(m), decreasing = TRUE) 
# create a data frame with words and their frequencies
dm = data.frame(word = names(word_freqs), freq = word_freqs)
wordcloud(dm$word,dm$freq,random.order=FALSE,colors=brewer.pal(8,"Dark2"))
Please advise.

As mentioned, you could reduce the number of words in the plot by adding max.words to the wordcloud call:

wordcloud(dm$word, dm$freq, scale=c(8,3), min.freq=2, max.words=120,
          random.order=FALSE, colors=brewer.pal(8,"Dark2"))

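I would also suggest using min.freq to plot only words that occur at least twice, and scale to control the size of the words. Adjust these until you get a good plot.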

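You may also want to use removeSparseTerms. I ran into a similar problem a while ago and came across this question. Although I had to adapt the solution, removing sparse terms worked. The tm package provides this function.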
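
For what it's worth, here is a minimal sketch of removing sparse terms before building the word cloud. The three short documents are made up for illustration and stand in for the tweet text; the 0.5 threshold is only an assumption and would need tuning on real data. removeSparseTerms drops terms whose sparsity (the share of documents that do not contain the term) exceeds the sparse value.

```r
library(tm)

# Toy documents standing in for the tweet text (made up for illustration)
docs <- c("pray for the flight",
          "pray for the families",
          "flight search continues")
corpus <- Corpus(VectorSource(docs))
tdm <- TermDocumentMatrix(corpus)

# Drop terms that are missing from more than 50% of the documents.
# The 0.5 threshold is an assumption; tune it for your own data.
tdm_small <- removeSparseTerms(tdm, sparse = 0.5)

# Compare the number of terms before and after
nrow(tdm)
nrow(tdm_small)
```

The reduced matrix can then go through the same as.matrix / rowSums steps as in the question, which leaves fewer words for wordcloud to place.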

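Check the maximum and minimum frequencies in the data frame dm, and try creating a word cloud with fewer words first. Strip the whitespace, then try again.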
wordcloud(dm$word, dm$freq, scale=c(8,3), min.freq=2, max.words=120,
          random.order=FALSE, colors=brewer.pal(8,"Dark2"))
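
As a rough sketch of the checks suggested above, using base R only: the small data frame below is hypothetical and just stands in for dm, and keeping the top 3 words is an arbitrary choice for the example. trimws is used here as a simple way to strip stray whitespace from the words; in a tm pipeline, tm_map(corpus, stripWhitespace) would be the usual place to do this instead.

```r
# Hypothetical frequency table standing in for `dm` from the question
dm <- data.frame(word = c("mh370", " pray ", "flight", "hope", "news"),
                 freq = c(40, 12, 7, 2, 1),
                 stringsAsFactors = FALSE)

# Inspect the minimum and maximum frequencies
range(dm$freq)

# Strip leading/trailing whitespace from the words
dm$word <- trimws(dm$word)

# Keep only the most frequent words (3 here, an arbitrary cut-off)
dm_top <- head(dm[order(dm$freq, decreasing = TRUE), ], 3)
dm_top
```

Passing dm_top$word and dm_top$freq to wordcloud instead of the full columns is one way to start with a smaller plot and grow it from there.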