Mapping topics back to the original data frame in R


I have read data from Excel into R; it consists of 459 rows and 3 columns:

library(openxlsx)
datamg <- read.xlsx("GC1.xlsx", sheet = 1, startRow = 1, colNames = TRUE,
                    skipEmptyRows = TRUE)
head(datamg, 3)

                  Q                                   Themes1     Themes2
1 yes I believe it . Because the risk limits       Nature of risk    <NA>
2 Yes but a very low risk                                   Other    <NA>
3 worried about potential regulations         Regulatory concerns    <NA>
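To map the topics back to the original data set, you have to attach a unique identifier to every document in the corpus and in the document-term matrix. Since your data has no row id (or other unique key), I create one from the row number and add it to the original data set: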

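library(dplyr)
library(tm)
library(topicmodels)
library(tidytext)

# Row-number id, used later to match each document's topic back to its row
datamg$doc_id <- 1:nrow(datamg)

# tm's DataframeSource() expects exactly these column names: doc_id and text
datamg <- datamg %>% 
  select(doc_id, Q) %>%
  rename(text = Q)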
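I keep only these two columns and name them doc_id and text, because the tm package (the DataframeSource function) requires those names when attaching the ids to the corpus: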

myCorpus1 <- Corpus(DataframeSource(datamg))
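The lda object used further down has to be fitted on a document-term matrix built from this corpus. A minimal sketch of that step, assuming a frame with the same doc_id/text layout: it shows that the doc_id values become the row names (Docs) of the document-term matrix, which is what later allows the topics to be matched back to rows. The toy sentences, k = 2 and the seed are hypothetical placeholders, not values from the question; empty documents are dropped first because LDA() rejects all-zero rows.

```r
library(tm)
library(topicmodels)

# Toy stand-in for datamg after select()/rename() (hypothetical text)
docs <- data.frame(
  doc_id = c("1", "2", "3", "4"),
  text = c("risk limits are low risk",
           "very low risk overall",
           "worried about new regulations",
           "regulations and regulatory concerns"),
  stringsAsFactors = FALSE
)

# DataframeSource() takes the document ids from the doc_id column
corpus <- Corpus(DataframeSource(docs))
dtm <- DocumentTermMatrix(corpus)

# The document names of the DTM are the doc_id values
print(Docs(dtm))

# LDA() fails on documents with no remaining terms, so drop them first
dtm <- dtm[slam::row_sums(dtm) > 0, ]

# k = 2 is an arbitrary choice for this sketch; the seed makes it reproducible
lda <- LDA(dtm, k = 2, control = list(seed = 1234))
```

Because the ids travel with the rows of the matrix, dropping empty documents does not break the later join; those rows simply have no topic.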


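What should be in datamg$TopicMapped? There are 3 rows in the data frame and 2 rows in the output. Are you not showing the third row of output?

@KenS. Thanks for your reply. datamg$TopicMapped means assigning topic 1, topic 2, etc. to the corresponding rows. Please ignore Themes1 and Themes2; those are themes I tried to assign by hand based on my understanding of the content of column 1 (Q).

@CPak Thanks for your reply. As per my previous comment, please ignore Themes1 and Themes2. datamg$TopicMapped should hold the topic, identified by the topic model, that best fits each row.

Desired output:

                  Q                                   Themes1     Themes2   Topic Mapped
1 yes I believe it . Because the risk limits       Nature of risk    <NA>
2 Yes but a very low risk                                   Other    <NA>
3 worried about potential regulations         Regulatory concerns    <NA>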
After fitting an LDA model (called lda below) on a document-term matrix built from myCorpus1, take the per-document topic probabilities (gamma), keep the most probable topic for each document, and join the result back to datamg:

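# Per-document topic probabilities: one row per (document, topic) pair
document_topic <- as.data.frame(tidy(lda, matrix = "gamma"))
# Document names come from doc_id, so convert them back to integers
document_topic$document <- as.integer(document_topic$document)

# Keep only the most probable topic for each document
document_topic <- document_topic %>%
  group_by(document) %>%
  top_n(1, gamma) %>%
  ungroup()

# Join on the row id: datamg has no Q column any more after rename()
df_join <- inner_join(datamg, document_topic, by = c("doc_id" = "document"))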
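The join can also keep the original columns (Q, Themes1, Themes2) if the rows are numbered on the original data frame and the corpus input is built as a separate object instead of overwriting datamg. A minimal sketch with dplyr (slice_max() needs dplyr 1.0.0 or later), assuming a gamma table shaped like the output of tidy(lda, matrix = "gamma") with columns document, topic and gamma; the gamma values below are made-up toy numbers, and the TopicMapped column name is taken from the comments. left_join() is used instead of inner_join() so that rows dropped before fitting (for example empty answers) stay in the result with NA.

```r
library(dplyr)

# Toy version of the original sheet (the three rows shown by head(datamg, 3))
datamg <- data.frame(
  Q = c("yes I believe it . Because the risk limits",
        "Yes but a very low risk",
        "worried about potential regulations"),
  Themes1 = c("Nature of risk", "Other", "Regulatory concerns"),
  Themes2 = NA_character_,
  stringsAsFactors = FALSE
)

# Number the rows once, on the original data frame
datamg$doc_id <- seq_len(nrow(datamg))

# Hypothetical gamma table in the long format of tidy(lda, matrix = "gamma");
# document 3 is left out on purpose, as if it had been dropped as empty
document_topic <- data.frame(
  document = c("1", "1", "2", "2"),
  topic = c(1L, 2L, 1L, 2L),
  gamma = c(0.8, 0.2, 0.3, 0.7),
  stringsAsFactors = FALSE
)
document_topic$document <- as.integer(document_topic$document)

# Most probable topic per document; with_ties = FALSE keeps exactly one row
top_topic <- document_topic %>%
  group_by(document) %>%
  slice_max(gamma, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(doc_id = document, TopicMapped = topic)

# Attach the topic to every original row; rows without a topic get NA
result <- left_join(datamg, top_topic, by = "doc_id")
print(result)
```

Breaking ties with with_ties = FALSE avoids duplicated rows in the joined result, which top_n() can produce when two topics share the same gamma.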