R: How to plot a dendrogram of word associations for a large number of words

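I want to plot word associations for a text file. Part of the problem seems to be the number of words and the processing time, so I tried replacing the nested loops with lapply to speed things up. However, I am not sure the lapply replacement is correct. On top of that, the dendrogram may be too dense to be useful. The questions are: 1) how to speed up the nested for loop, and 2) how to display the dendrogram.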

library(RXKCD)
library(tm)
library(wordcloud)
library(RColorBrewer)
require(gdata)

path <- system.file("xkcd", package = "RXKCD")
datafiles <- list.files(path)
xlsdf <- read.csv(file.path(path, datafiles))


ap.corpus <- Corpus(DataframeSource(data.frame(as.character(xlsdf[,'transcript'])))) 
ap.corpus <- tm_map(ap.corpus, removePunctuation) 
ap.corpus <- tm_map(ap.corpus, tolower) 
ap.corpus <- tm_map(ap.corpus, removeNumbers)
ap.corpus <- tm_map(ap.corpus, function(x) removeWords(x, stopwords("english"))) 
# additional stopwords can be used as shown below  
#ap.corpus <- tm_map(ap.corpus, function(x) removeWords(x, c("ukoer","oer"))) 
ap.corpus <- tm_map(ap.corpus, PlainTextDocument)
ap.tdm <- TermDocumentMatrix(ap.corpus) 
findFreqTerms(ap.tdm, lowfreq=40)
ap.m <- as.matrix(ap.tdm) 
ap.v <- sort(rowSums(ap.m),decreasing=TRUE) 
ap.d <- data.frame(word = names(ap.v),freq=ap.v) 
print(table(ap.d$freq) )
pal2 <- brewer.pal(8,"Dark2") 

# png("wordcloud_packages.png", width=1280,height=800) 
#print(wordcloud(ap.d$word,ap.d$freq, scale=c(8,.2),min.freq=40, 
#          max.words=Inf, random.order=FALSE, rot.per=.05, colors=pal2))
# dev.off()

f <- matrix (0, ncol=nrow(ap.tdm), nrow=nrow(ap.tdm))  
colnames (f) <- rownames(ap.tdm)
rownames (f) <- rownames(ap.tdm)

# This is the nested loop to replace
#for (i in rownames (ap.tdm)) { 
#  ff <- findAssocs (ap.tdm,i,0)
#  for  (j in rownames (ff)) {
#    f[j,i]=ff[j,]
#  }
#}

fcn2 <- function(j,ff) { ff[j]; }
fcn1 <- function(i) {ff<-findAssocs(ap.tdm,i,0); 
                     f[rownames(ff),i]<-lapply(rownames(ff), fcn2, ff);}
lapply(rownames(ap.tdm), fcn1)

fd <- as.dist(f) # calc distance matrix
plot(hclust(fd, method="ward"))  # plot dendrogram

# very simple dendrogram
hc = hclust(dist(f))
plot(hc)
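
One thing worth noting about the lapply version above: the assignment to f inside fcn1 modifies a copy local to that function, so the global f stays all zeros. As far as I know, findAssocs reports (rounded) Pearson correlations between one term and all the others, so a possible way to avoid both the loop and the lapply is to compute every pairwise correlation in a single cor() call on the transposed matrix. The sketch below uses a small made-up matrix m in place of as.matrix(ap.tdm); the data, the "average" linkage, and the use of 1 - correlation as a dissimilarity are assumptions chosen for illustration, not the original method.

```r
# Toy term-document matrix (rows = terms, columns = documents).
# The counts are made up and stand in for as.matrix(ap.tdm).
set.seed(1)
m <- matrix(rpois(8 * 30, lambda = 2), nrow = 8,
            dimnames = list(paste0("term", 1:8), NULL))

# All pairwise term-term correlations in one call,
# instead of one findAssocs() call per term.
f <- cor(t(m))

# Correlation is a similarity, so turn it into a dissimilarity
# before handing it to hclust().
d <- as.dist(1 - f)
hc <- hclust(d, method = "average")
plot(hc)
```

Terms whose counts are constant across all documents have zero variance and would produce NA correlations, so they would need to be dropped before this step.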

What does inspect(ap.tdm) return? Where is the code that creates the dendrogram?

At the bottom, next to the comment "plot dendrogram".
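
On the density problem, one option is to cluster only the most frequent terms and then outline a handful of clusters on the plot. The following is a minimal sketch on made-up data; top_n and k are arbitrary illustrative values, and m again stands in for as.matrix(ap.tdm). With a real term-document matrix, removeSparseTerms from tm is another way to shrink the set of terms before clustering.

```r
# Hypothetical toy data: 200 terms x 50 documents of made-up counts.
set.seed(42)
m <- matrix(rpois(200 * 50, lambda = 1), nrow = 200,
            dimnames = list(paste0("term", 1:200), NULL))

# Keep only the most frequent terms (top_n is an arbitrary cutoff).
top_n <- 30
freq  <- rowSums(m)
keep  <- names(sort(freq, decreasing = TRUE))[seq_len(top_n)]
m_top <- m[keep, , drop = FALSE]

# Cluster on 1 - correlation and shrink the labels so they stay legible.
hc <- hclust(as.dist(1 - cor(t(m_top))), method = "average")
plot(hc, cex = 0.7)

# Outline k clusters on the existing plot and extract the memberships.
k <- 5
rect.hclust(hc, k = k)
groups <- cutree(hc, k = k)
```

Writing the plot to a wide device (for example png() with a large width) also tends to help when more terms have to be shown.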