R: How to plot a dendrogram of word associations for a large number of words
I want to plot word associations for a text file. Part of the problem seems to be the number of words and the processing time, so I tried using lapply to replace the nested loop and speed things up. However, I am not sure that my lapply replacement is correct. In addition, the dendrogram may be too dense to be useful. The questions are: 1) how can I speed up the nested for loop, and 2) how can I display the dendrogram?
library(RXKCD)
library(tm)
library(wordcloud)
library(RColorBrewer)
require(gdata)
path <- system.file("xkcd", package = "RXKCD")
datafiles <- list.files(path)
xlsdf <- read.csv(file.path(path, datafiles))
# VectorSource avoids the doc_id/text column layout that DataframeSource expects in current tm versions
ap.corpus <- VCorpus(VectorSource(as.character(xlsdf[, "transcript"])))
ap.corpus <- tm_map(ap.corpus, removePunctuation)
# base functions such as tolower must be wrapped in content_transformer()
ap.corpus <- tm_map(ap.corpus, content_transformer(tolower))
ap.corpus <- tm_map(ap.corpus, removeNumbers)
ap.corpus <- tm_map(ap.corpus, removeWords, stopwords("english"))
# additional stopwords can be used as shown below
#ap.corpus <- tm_map(ap.corpus, removeWords, c("ukoer","oer"))
ap.tdm <- TermDocumentMatrix(ap.corpus)
findFreqTerms(ap.tdm, lowfreq=40)
ap.m <- as.matrix(ap.tdm)
ap.v <- sort(rowSums(ap.m),decreasing=TRUE)
ap.d <- data.frame(word = names(ap.v),freq=ap.v)
print(table(ap.d$freq) )
pal2 <- brewer.pal(8,"Dark2")
# png("wordcloud_packages.png", width=1280,height=800)
#print(wordcloud(ap.d$word,ap.d$freq, scale=c(8,.2),min.freq=40,
# max.words=Inf, random.order=FALSE, rot.per=.05, colors=pal2))
# dev.off()
f <- matrix (0, ncol=nrow(ap.tdm), nrow=nrow(ap.tdm))
colnames (f) <- rownames(ap.tdm)
rownames (f) <- rownames(ap.tdm)
# This is the nested loop to replace
#for (i in rownames (ap.tdm)) {
# ff <- findAssocs (ap.tdm,i,0)
# for (j in rownames (ff)) {
# f[j,i]=ff[j,]
# }
#}
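# Assigning to f inside a function only changes a local copy, so each call
# returns one column and sapply binds the columns into the matrix.
# Assumes a tm version in which findAssocs returns a list of named vectors.
fcn1 <- function(i) {
  ff <- findAssocs(ap.tdm, i, 0)[[1]]
  col <- setNames(numeric(nrow(ap.tdm)), rownames(ap.tdm))
  col[names(ff)] <- ff
  col
}
f <- sapply(rownames(ap.tdm), fcn1)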
fd <- as.dist(1 - f) # associations are similarities, so convert them to distances
plot(hclust(fd, method="ward.D")) # plot dendrogram ("ward" was renamed "ward.D" in R 3.1.0)
# very simple dendrogram
hc = hclust(dist(f))
plot(hc)
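On the first question, replacing a for loop with lapply usually does not change the running time much, because findAssocs is still called once per term. As far as I understand, the values findAssocs reports are Pearson correlations between term rows (rounded, and filtered by the lower limit), so one possible alternative is to compute all pairwise correlations with a single cor() call. The sketch below uses a small made-up count matrix `m` in place of ap.m; the names `m` and `assoc` are illustrative only.

```r
# Toy term-document matrix (terms in rows, documents in columns).
# The counts are invented for illustration; in the question this role is
# played by ap.m.
set.seed(1)
m <- matrix(rpois(6 * 20, lambda = 2), nrow = 6,
            dimnames = list(paste0("term", 1:6), paste0("doc", 1:20)))

# All pairwise term correlations in one vectorised call.
# cor() correlates columns, so transpose to put the terms in columns.
assoc <- cor(t(m))

# Correlation is a similarity; convert it to a dissimilarity before clustering.
fd <- as.dist(1 - assoc)
hc <- hclust(fd, method = "ward.D")
plot(hc)
```

With the real data it would probably help to shrink the matrix first, for example by keeping only the rows named by findFreqTerms(ap.tdm, lowfreq=40), since the correlation matrix grows with the square of the number of terms. Note that unlike findAssocs with a lower limit of 0, cor() keeps negative correlations as well.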
What does inspect(ap.tdm) return? Where is the code that creates the dendrogram?
At the bottom, next to the comment "plot dendrogram".
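On the second question, a dendrogram with hundreds of leaves is hard to read however it is plotted. Two common base R options are to cut the tree into a fixed number of groups with cutree, or to plot only the upper part of the tree with cut on a dendrogram object. The following is a minimal sketch on random data; the 200 items named w1 to w200 and the choice of 10 groups are assumptions made for illustration, not values taken from the question.

```r
# Random data standing in for a large set of words (hypothetical labels w1..w200).
set.seed(42)
x <- matrix(rnorm(200 * 5), nrow = 200,
            dimnames = list(paste0("w", 1:200), NULL))
hc <- hclust(dist(x), method = "ward.D")

# Option 1: collapse the tree into k groups and inspect the group sizes.
groups <- cutree(hc, k = 10)
print(table(groups))

# Option 2: plot only the top of the tree.
# Cutting between the 9th and 10th highest merges leaves 10 branches.
hs <- sort(hc$height, decreasing = TRUE)
h <- mean(hs[9:10])
top <- cut(as.dendrogram(hc), h = h)$upper
plot(top)
```

The words that fall into a given group can then be listed with names(groups)[groups == 1], and each group can be examined or plotted separately.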