R STM: How to preserve metadata when converting from a tm to an STM document-term matrix?

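I am trying to run a structural topic model (using the stm package) on a document-term matrix prepared with the tm package.

I built a corpus in the tm package that contains the following metadata: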

library(tm)

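# map the "text", "id" and "sentiment" columns of bin_stm_df to document
# content and metadata (readTabular() only exists in tm versions before 0.7)
myReader2 <- readTabular(mapping = list(content = "text", id = "id", sentiment = "sentiment"))
text_corpus2 <- VCorpus(DataframeSource(bin_stm_df), readerControl = list(reader = myReader2))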

meta(text_corpus2[[1]])
  id       : 11
  sentiment: negative
  language : en

So far, so good. However, once I convert to stm-compatible data and try to access the metadata, it has disappeared:

docsTM <- DTM_st$documents # works fine
vocabTM <- DTM_st$vocab # works fine
metaTM <- DTM_st$meta # returns NULL

> metaTM
NULL
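If you would rather stay with tm, one option is to pull the metadata out of the corpus yourself and hand it to stm next to the documents. The question does not show how DTM_st was built; if it came from stm::readCorpus(), that function (as far as I know) returns only documents and vocab, which would explain the NULL. The sketch below is illustrative only: the toy texts and sentiment labels are hypothetical, sentiment is stored as per-document metadata the way readTabular() did, and empty documents are dropped from the DTM and the metadata together so the rows stay aligned.

```r
library(tm)
library(stm)

# Hypothetical toy data standing in for the question's text_corpus2
texts <- c("the service was slow and the food was cold",
           "great food and friendly staff",
           "terrible service and cold food again",
           "friendly service and great prices")
sentiment <- c("negative", "positive", "negative", "positive")

# Store sentiment as local (per-document) metadata, as readTabular() did
text_corpus2 <- VCorpus(VectorSource(texts))
for (i in seq_along(text_corpus2)) {
  meta(text_corpus2[[i]], "sentiment") <- sentiment[i]
}

# tm document-term matrix (a slam simple_triplet_matrix underneath)
DTM <- DocumentTermMatrix(text_corpus2)

# Build the metadata table ourselves: one row per document, in corpus order
meta_df <- data.frame(
  doc_id = sapply(text_corpus2, meta, "id"),
  sentiment = sapply(text_corpus2, meta, "sentiment"),
  stringsAsFactors = FALSE
)

# Drop empty documents from the DTM and the metadata in the same step
keep <- slam::row_sums(DTM) > 0
DTM <- DTM[keep, ]
meta_df <- meta_df[keep, , drop = FALSE]

# readCorpus() yields documents and vocab only; meta travels separately
DTM_st <- readCorpus(DTM, type = "slam")
stopifnot(length(DTM_st$documents) == nrow(meta_df))

# prepDocuments() removes rare words (and any documents left empty)
# and subsets meta to match
prepped <- prepDocuments(DTM_st$documents, DTM_st$vocab, meta_df,
                         verbose = FALSE)
metaTM <- prepped$meta
```

Dropping empty rows before readCorpus() is a defensive choice: it keeps the alignment check valid whether or not readCorpus() silently skips empty documents in your stm version.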

How about trying the quanteda package?
Without access to your objects I can't guarantee it will work verbatim, but it should:
library("quanteda")

# create the corpus; every column except "text" becomes a document variable
text_corpus3 <- corpus(bin_stm_df, text_field = "text")

# tokenize, then build the document-feature matrix;
# cleaning options can be added to tokens(), see ?tokens
chat_DTM3 <- dfm(tokens(text_corpus3))

# similar to tm::removeSparseTerms()
DTM3 <- dfm_trim(chat_DTM3, sparsity = 0.990)

# convert to STM format, passing the document variables explicitly as metadata
DTM_st <- convert(DTM3, to = "stm", docvars = docvars(DTM3))

# then it's all there
docsTM <- DTM_st$documents 
vocabTM <- DTM_st$vocab    
metaTM <- DTM_st$meta      # should return the data.frame of document variables
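The converted object can then go straight into stm(), with the metadata supplied as the data argument so sentiment can act as a topic-prevalence covariate. A minimal sketch, assuming a small hypothetical bin_stm_df with the id, text and sentiment columns named in the question; K = 2, init.type = "Random" and max.em.its = 5 are picked only so the toy example runs quickly, not as modelling advice.

```r
library("quanteda")
library("stm")

# Hypothetical stand-in for bin_stm_df (the real data is not shown)
bin_stm_df <- data.frame(
  id = as.character(1:6),
  text = c("the service was slow and the food was cold",
           "terrible service and cold food again",
           "slow delivery and rude staff",
           "great food and friendly staff",
           "friendly service and great prices",
           "fast delivery and tasty food"),
  sentiment = c("negative", "negative", "negative",
                "positive", "positive", "positive"),
  stringsAsFactors = FALSE
)

text_corpus3 <- corpus(bin_stm_df, text_field = "text")
DTM3 <- dfm(tokens(text_corpus3, remove_punct = TRUE))
DTM_st <- convert(DTM3, to = "stm", docvars = docvars(DTM3))

# Make the covariate an explicit factor before using it in a formula
DTM_st$meta$sentiment <- factor(DTM_st$meta$sentiment)

# Fit a small STM with sentiment as a prevalence covariate
set.seed(1)
fit <- stm(documents = DTM_st$documents,
           vocab = DTM_st$vocab,
           K = 2,
           prevalence = ~ sentiment,
           data = DTM_st$meta,
           init.type = "Random",
           max.em.its = 5,
           verbose = FALSE)
```

Because convert() drops empty documents together with their document variables, the rows of DTM_st$meta stay aligned with DTM_st$documents, which is what stm() needs for the data argument.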

Hi everyone, I figured it out in the end, but thanks for posting such a great answer here!