Mapping topics back to the original data frame in R


I have read data from Excel into R; the data consists of 459 rows and 3 columns.

library(openxlsx)
datamg <- read.xlsx("GC1.xlsx", sheet = 1, startRow = 1, colNames = TRUE, skipEmptyRows = TRUE)
head(datamg,3)

                  Q                                   Themes1     Themes2
1 yes I believe it . Because the risk limits       Nature of risk    <NA>
2 Yes but a very low risk                                   Other    <NA>
3 worried about potential regulations         Regulatory concerns    <NA>

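To map the topics back to the original data set, you have to add a unique identifier to each document in the corpus and in the document-term matrix. Since you don't have a row id (or some other unique key), I create one based on the row number and add it to the original data set: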
library(dplyr)
library(tm)
library(topicmodels)
library(tidytext)

datamg$doc_id <- 1:nrow(datamg)

datamg <- datamg %>% 
  select(doc_id, Q) %>%
  rename('text' = Q)


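I keep only these two columns and name them "doc_id" and "text", because the tm package (the DataframeSource function) requires those names in order to attach the id to the corpus: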
myCorpus1 <- Corpus(DataframeSource(datamg))
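The step from the corpus to a fitted model is not shown in the answer. The following is a minimal sketch of how the id could carry through, assuming the three rows printed by head() as stand-in data. The names datamg_demo, corpus_demo and lda_demo, the choice k = 2 and the seed are illustration values and do not come from the original post.

```r
library(tm)
library(topicmodels)

# Stand-in for datamg after the select/rename step: the three rows shown by head()
datamg_demo <- data.frame(
  doc_id = 1:3,
  text = c("yes I believe it . Because the risk limits",
           "Yes but a very low risk",
           "worried about potential regulations"),
  stringsAsFactors = FALSE
)

corpus_demo <- Corpus(DataframeSource(datamg_demo))

# The rows of the document-term matrix are labelled with doc_id
dtm <- DocumentTermMatrix(corpus_demo)

# LDA() rejects documents without any terms, so drop empty rows;
# the remaining row names still identify the original rows
dtm <- dtm[slam::row_sums(dtm) > 0, ]

# k = 2 and the seed are arbitrary values for illustration only
lda_demo <- LDA(dtm, k = 2, control = list(seed = 1234))

# Most likely topic per document
doc_topics <- topics(lda_demo)
```

Because empty documents may have to be dropped before fitting, the row position in the model output need not match the row position in the data frame, which is why an explicit id is safer than relying on order.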


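What should datamg$TopicMapped contain? There are 3 rows in the data frame and 2 rows in the output. Did you leave out the third output row?

@KenS. Thanks for your reply. datamg$TopicMapped means assigning Topic 1, Topic 2, etc. to the corresponding row. Please ignore Themes1 and Themes2; I was trying to assign themes manually based on my understanding of the content in column 1 (Q).

@CPak Thanks for your reply. As per my previous comment, please ignore Themes1 and Themes2. datamg$TopicMapped should hold the best-fitting topic identified by the topic modeling.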
                  Q                                   Themes1     Themes2   Topic Mapped
1 yes I believe it . Because the risk limits       Nature of risk    <NA>
2 Yes but a very low risk                                   Other    <NA>
3 worried about potential regulations         Regulatory concerns    <NA>

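To map the topics back, take the per-document topic probabilities (gamma) from the fitted model, keep the most probable topic for each document, and join the result to the data frame on the id:

# `lda` is the fitted LDA model; the fitting step is not shown here
document_topic <- as.data.frame(tidy(lda, matrix = "gamma"))
# tidy() returns the document id as character; convert it to match doc_id
document_topic$document <- as.integer(document_topic$document)

# keep the highest-gamma topic per document (ties keep more than one row)
document_topic <- document_topic %>%
  group_by(document) %>%
  top_n(1, gamma) %>%
  ungroup()

# Q was renamed to text above, so join on the doc_id key instead
df_join <- inner_join(datamg, document_topic, by = c("doc_id" = "document"))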
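The join step can be tried in isolation. Here is a small sketch with made-up values; gamma_demo, docs_demo, best_topic and mapped are hypothetical names, and the numbers are not results from the real data. It shows why the character document column is converted before joining, and how left_join (a substitute for the inner_join used in the answer) would keep rows that have no topic.

```r
library(dplyr)

# Hypothetical gamma table in the shape returned by tidy(lda, matrix = "gamma"):
# one row per document/topic pair, with `document` stored as character
gamma_demo <- data.frame(
  document = c("1", "1", "2", "2"),
  topic    = c(1L, 2L, 1L, 2L),
  gamma    = c(0.9, 0.1, 0.3, 0.7),
  stringsAsFactors = FALSE
)

# Hypothetical source rows; doc_id 3 has no entry in gamma_demo
docs_demo <- data.frame(doc_id = 1:3, text = c("a", "b", "c"),
                        stringsAsFactors = FALSE)

best_topic <- gamma_demo %>%
  mutate(document = as.integer(document)) %>%   # match the integer doc_id
  group_by(document) %>%
  slice_max(gamma, n = 1, with_ties = FALSE) %>% # exactly one topic per document
  ungroup()

# left_join keeps doc_id 3 with an NA topic instead of dropping the row
mapped <- left_join(docs_demo, best_topic, by = c("doc_id" = "document"))
```

With inner_join, any document missing from the model output would silently disappear from the result, so the row count of the joined data frame is worth checking against the original.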