Document loadings by topic using the textmineR package by passing a term co-occurrence matrix


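I am using the textmineR package to find the documents most similar to a given list of documents. I use the following code to generate a TCM (term co-occurrence matrix) instead of a DTM (document-term matrix):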

tcm <- CreateTcm(doc_vec = text_df$Description,
                 skipgram_window = 20,
                 verbose = FALSE,
                 cpus = 2)

This question is 11 months old, but I'll give it a try anyway.

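Technically, with LDA embeddings, theta gives you P(topic | word), because the rows of a TCM are words rather than documents, while phi still gives you P(word | topic). If I understand correctly, you want to embed whole documents under this model? If so, this is how you can do it: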

library(textmineR)

# create a tcm
tcm <- CreateTcm(nih_sample$ABSTRACT_TEXT, skipgram_window = 10)

# fit an LDA model
m <- FitLdaModel(dtm = tcm, k = 100, iterations = 100, burnin = 75)

# pull your documents into a dtm
d <- nih_sample_dtm

# get them predicted under the model
# I recommend using the "dot" method for prediction with embeddings as sparsity may
# result in underflow and throw an error using the default "gibbs" method
p <- predict(object = m, newdata = d, method = "dot")
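Since the original goal was to find the documents most similar to a given set, one possible next step is to compare the rows of the predicted matrix with cosine similarity. The sketch below is only an illustration in base R: p_toy, cosine_sim and most_similar are hypothetical names that are not part of textmineR, and the toy values stand in for p, which is assumed here to be a numeric matrix with one row per document and one column per topic.

```r
# Toy stand-in for `p`: rows are documents, columns are topics.
# (Hypothetical values; in practice, pass the matrix returned by predict().)
p_toy <- matrix(
  c(0.7, 0.2, 0.1,
    0.6, 0.3, 0.1,
    0.1, 0.1, 0.8),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc_a", "doc_b", "doc_c"), paste0("t_", 1:3))
)

# Cosine similarity between every pair of rows of a numeric matrix.
cosine_sim <- function(x) {
  norms <- sqrt(rowSums(x^2))
  norms[norms == 0] <- 1   # leave all-zero rows as zeros instead of NaN
  unit <- x / norms        # recycling divides each row by its own norm
  unit %*% t(unit)
}

# For each document, return the name of the most similar *other* document.
most_similar <- function(sims) {
  diag(sims) <- -Inf       # exclude self-matches
  setNames(colnames(sims)[apply(sims, 1, which.max)], rownames(sims))
}

sims <- cosine_sim(p_toy)
nearest <- most_similar(sims)
print(round(sims, 3))
print(nearest)
```

With real data, cosine_sim(p) would replace cosine_sim(p_toy). Other measures, such as Hellinger distance between topic distributions, may suit some corpora better; cosine similarity is used here only because it is simple to write down.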