R STM: How to preserve metadata when converting from tm to an STM document-term matrix?
I am trying to run a structural topic model (using the stm package) on a document-term matrix prepared with the tm package.
I built a corpus in tm that contains the following metadata:
library(tm)
myReader2 <- readTabular(mapping=list(content="text", id="id", sentiment = "sentiment"))
text_corpus2 <- VCorpus(DataframeSource(bin_stm_df), readerControl = list(reader = myReader2))
meta(text_corpus2[[1]])
id : 11
sentiment: negative
language : en
So far, so good. However, when I try to pull the metadata out of the stm-compatible data, it is gone:
docsTM <- DTM_st$documents # works fine
vocabTM <- DTM_st$vocab # works fine
metaTM <- DTM_st$meta # returns NULL
> metaTM
NULL
How about trying the quanteda package?
Without access to your objects I can't guarantee this works verbatim, but it should:
library("quanteda")
# creates the corpus with document variables except for the "text"
text_corpus3 <- corpus(bin_stm_df, text_field = "text")
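# tokenize, then convert to a document-feature matrix - cleaning options can be added
# see ?tokens (recent quanteda versions no longer accept a corpus in dfm() directly)
chat_DTM3 <- dfm(tokens(text_corpus3))
# similar to tm::removeSparseTerms(x, 0.990): keep terms found in at least 1% of documents
DTM3 <- dfm_trim(chat_DTM3, min_docfreq = 0.01, docfreq_type = "prop")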
# convert to STM format
DTM_st <- convert(DTM3, to = "stm")
# then it's all there
docsTM <- DTM_st$documents
vocabTM <- DTM_st$vocab
metaTM <- DTM_st$meta # should return the data.frame of document variables
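To show where the metadata ends up, here is a minimal sketch of passing the converted object on to stm(). The toy_df data frame, the toy_* object names, and the settings K = 2, init.type = "Random" and max.em.its = 5 are my assumptions for illustration only; they stand in for bin_stm_df and your real model settings. The point is just that the rows of $meta line up with $documents, so sentiment can go straight into the prevalence formula.

```r
library("quanteda")
library("stm")

# Hypothetical stand-in for bin_stm_df (same column names as in the question)
toy_df <- data.frame(
  id = 11:16,
  text = c(
    "the service was slow and the food was cold",
    "great food and friendly service",
    "the room was dirty and the staff was rude",
    "lovely room with a great view",
    "cold food and rude staff again",
    "friendly staff and a lovely clean room"
  ),
  sentiment = c("negative", "positive", "negative",
                "positive", "negative", "positive"),
  stringsAsFactors = FALSE
)

# Every column except "text" becomes a document variable
toy_corpus <- corpus(toy_df, text_field = "text")
toy_dfm <- dfm(tokens(toy_corpus))
toy_stm <- convert(toy_dfm, to = "stm")

# One metadata row per document, so covariates stay aligned
stopifnot(nrow(toy_stm$meta) == length(toy_stm$documents))

# Use the preserved metadata as a prevalence covariate
# (tiny K and few EM iterations only to keep the sketch fast)
toy_fit <- stm(documents = toy_stm$documents, vocab = toy_stm$vocab, K = 2,
               prevalence = ~ sentiment, data = toy_stm$meta,
               init.type = "Random", max.em.its = 5, verbose = FALSE)
```

As far as I know, convert() drops documents that end up empty after trimming and removes the matching metadata rows as well, so it is worth checking the row count on real data as done above.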
Hi all, I figured it out in the end, but thanks for posting a great answer here!
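If you would rather stay in tm, one option is to carry the metadata alongside the stm object yourself. This is only a sketch under a few assumptions: it uses a recent tm, where DataframeSource() expects the first two columns to be named doc_id and text and keeps the remaining columns as document-level metadata, and toy_df plus all the *_tm object names are made up for illustration. As far as I can tell, stm::readCorpus(type = "slam") returns only documents and vocab, which would explain the NULL in the question.

```r
library(tm)
library(stm)

# Hypothetical toy data; doc_id and text must be the first two columns
toy_df <- data.frame(
  doc_id = as.character(11:16),
  text = c(
    "the service was slow and the food was cold",
    "great food and friendly service",
    "the room was dirty and the staff was rude",
    "lovely room with a great view",
    "cold food and rude staff again",
    "friendly staff and a lovely clean room"
  ),
  sentiment = c("negative", "positive", "negative",
                "positive", "negative", "positive"),
  stringsAsFactors = FALSE
)

corpus_tm <- VCorpus(DataframeSource(toy_df))
dtm_tm <- DocumentTermMatrix(corpus_tm)

# Convert the tm matrix to stm's format; no metadata comes along
out_tm <- readCorpus(dtm_tm, type = "slam")

# Pull the document-level metadata from the corpus as a data.frame
meta_tm <- meta(corpus_tm)

# The covariates are only usable if they still line up with the documents
stopifnot(nrow(meta_tm) == length(out_tm$documents))
```

On real data, the row-count check matters: if any cleaning step removes documents from the matrix, the same rows have to be removed from meta_tm before it is passed to stm() as data.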