Confusion about id2word / token2id usage in Python Gensim
For clarity, I would like your feedback on whether the following code and use of gensim are correct. Thank you in advance for your valuable time.
import gensim

train = ["John likes to watch movies Mary likes movies too",
         "John also likes to watch football games"]
test = ["Football is my dream"]

# Lowercase and split each document on whitespace.
train_texts = [document.lower().split() for document in train]
test_texts = [document.lower().split() for document in test]

# Build the token <-> id mapping from the training texts only.
dictionary = gensim.corpora.Dictionary(train_texts)

# doc2bow silently ignores tokens that are not in the dictionary,
# so only "football" from the test document is kept.
train_corpus = [dictionary.doc2bow(text) for text in train_texts]
test_corpus = [dictionary.doc2bow(text) for text in test_texts]

ldaModel = gensim.models.LdaModel(corpus=train_corpus,
                                  id2word=dictionary, num_topics=2)

# bound() returns the variational bound for test_corpus, not the perplexity itself.
bound_perplex = ldaModel.bound(test_corpus)
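The code is used correctly, but for larger document collections it is better to stream the corpus instead of holding it in memory. You can find more information about data streaming here -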
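To illustrate what corpus streaming could look like, here is a minimal sketch, not a definitive implementation. It assumes Python 3 and a plain-text file with one document per line. The names `iter_token_lists`, `StreamingCorpus` and `train_lda_streaming` are hypothetical helpers introduced for this example. Only `corpora.Dictionary`, `doc2bow` and `models.LdaModel` are gensim calls.

```python
def iter_token_lists(path):
    """Yield one tokenized document per line without loading the whole file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            tokens = line.lower().split()
            if tokens:  # skip blank lines
                yield tokens


class StreamingCorpus:
    """Re-iterable corpus that yields one bag-of-words vector at a time.

    Hypothetical helper: `dictionary` can be any object with a
    doc2bow(tokens) method, such as a gensim Dictionary.
    """

    def __init__(self, path, dictionary):
        self.path = path
        self.dictionary = dictionary

    def __iter__(self):
        # The file is reopened on every pass, so the corpus can be
        # iterated several times (a plain generator could not be).
        for tokens in iter_token_lists(self.path):
            yield self.dictionary.doc2bow(tokens)


def train_lda_streaming(path, num_topics=2):
    """Build the dictionary and train LDA with the corpus streamed from disk."""
    # Imported here so the helpers above stay usable without gensim installed.
    from gensim import corpora, models

    dictionary = corpora.Dictionary(iter_token_lists(path))
    corpus = StreamingCorpus(path, dictionary)
    lda = models.LdaModel(corpus=corpus, id2word=dictionary,
                          num_topics=num_topics)
    return dictionary, lda
```

Calling `train_lda_streaming("train.txt")` (the file name is a placeholder) would train the same kind of model as in the question, but only one document is held in memory at a time. The corpus is a class rather than a generator because training may need to pass over the data more than once.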
I looked into this with others as well. It works as it should. Thank you very much.