Python Gensim: how to save the topics generated by an LDA model in a readable format (csv, txt, etc.)?


The last part of the code:

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2)
print(lda)

bash output:

INFO : adding document #0 to Dictionary(0 unique tokens)
INFO : built Dictionary(18 unique tokens) from 5 documents (total  20 corpus positions)
INFO : using serial LDA version on this node
INFO : running online LDA training, 2 topics, 1 passes over the supplied corpus of 5 documents, updating model once every 5 documents
WARNING : too few updates, training might not converge; consider increasing the number of passes to improve accuracy
INFO : PROGRESS: iteration 0, at document #5/5
INFO : 2/5 documents converged within 50 iterations
INFO : topic #0: 0.079*cute + 0.076*broccoli + 0.070*adopted + 0.069*yesterday + 0.069*eat + 0.069*sister + 0.068*kitten + 0.068*kittens + 0.067*bananas + 0.067*chinchillas
INFO : topic #1: 0.082*broccoli + 0.079*cute + 0.071*piece + 0.070*munching + 0.069*spinach + 0.068*hamster + 0.068*ate + 0.067*banana + 0.066*breakfast + 0.066*smoothie
INFO : topic diff=0.470477, rho=1.000000
<gensim.models.ldamodel.LdaModel object at 0x10f1f4050>


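So I would like to know whether I can save the topics it generates in a readable format. I tried the .save() method, but it always outputs something unreadable.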

You can use the pickle module:

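import pickle

# your code
# pickle writes bytes, so the file must be opened in binary mode
with open(filename, 'wb') as f:
    pickle.dump(lda, f)
# you may load it back again
with open(filename, 'rb') as f:
    lda_copy = pickle.load(f)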

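You just need to use lda.show_topics(topics=-1), or whatever number of topics you want (topics=10, topics=15, topics=1000, ...). Recent gensim releases name this parameter num_topics. I usually just do: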

# older gensim releases named these parameters topics= and topn=
with open('.../yourfile.txt', 'a') as logfile:
    print(lda.show_topics(num_topics=-1, num_words=10), file=logfile)

All of these parameters, and others, are described in the gensim documentation.
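Since the question asks about CSV specifically, here is a minimal sketch of writing topics to a CSV file with the standard library. The helper name `write_topics_csv` and the output file name `topics.csv` are hypothetical. The input is assumed to be a list of topics, each a list of (word, probability) pairs, which is the shape recent gensim releases return from `lda.show_topic(i, topn)`. The sample values are copied from the log output in the question.

```python
import csv

def write_topics_csv(topics, path):
    """Write one row per (topic, word) pair: topic_id, rank, word, probability."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["topic_id", "rank", "word", "probability"])
        for topic_id, pairs in enumerate(topics):
            for rank, (word, prob) in enumerate(pairs, start=1):
                writer.writerow([topic_id, rank, word, prob])

# Stand-in data so the sketch runs without a trained model.
# With a real model this would (assumption) be built as:
#   topics = [lda.show_topic(i, topn=10) for i in range(lda.num_topics)]
topics = [
    [("cute", 0.079), ("broccoli", 0.076), ("adopted", 0.070)],
    [("broccoli", 0.082), ("cute", 0.079), ("piece", 0.071)],
]
write_topics_csv(topics, "topics.csv")
```

One row per word keeps the file easy to filter or pivot in a spreadsheet, at the cost of repeating the topic id on every row.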

Here is how to save a gensim LDA model:

from gensim import corpora, models, similarities

# create corpus and dictionary
corpus = ...
dictionary = ...

# train model, this might take time
# (models.LdaModel is the class itself, so it is called directly)
model = models.LdaModel(corpus=corpus, id2word=dictionary, num_topics=200, passes=5, alpha='auto')
# save model to disk (no need to use pickle module)
model.save('lda.model')

To print the topics, here are a few ways:

# later on, load trained model from file
model = models.LdaModel.load('lda.model')

# print all topics (older gensim releases named these parameters topics= and topn=)
print(model.show_topics(num_topics=200, num_words=20))

# print topic 109
print(model.print_topic(109, topn=20))

# another way
for i in range(model.num_topics):
    print(model.print_topic(i))

# and another way, only prints top words
# (recent gensim releases return (word, probability) pairs from show_topic)
for t in range(model.num_topics):
    print('topic {}: '.format(t) + ', '.join(word for word, _ in model.show_topic(t, 20)))

.save() gives you the model itself, not the topics (hence the unreadable output).

Use:

with open('topic_file', 'w') as topic_file:
    topics = lda_model.top_topics(corpus)
    topic_file.write('\n'.join('%s %s' % topic for topic in topics))

You will get a readable file of all the clustered topics and their associated probabilities.
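The raw tuples written by the snippet above can be made easier to read with a small formatter. This is only a sketch: `format_top_topics` is a hypothetical helper, and it assumes each element returned by `top_topics` is a pair of a list of (probability, word) tuples and a coherence score. The word probabilities in the sample are taken from the log output in the question; the coherence scores are placeholders, not real results.

```python
def format_top_topics(top_topics):
    """Return one readable line per topic: coherence, then 'word (prob)' terms."""
    lines = []
    for position, (terms, coherence) in enumerate(top_topics):
        words = ", ".join("%s (%.3f)" % (word, prob) for prob, word in terms)
        lines.append("topic %d\tcoherence=%.3f\t%s" % (position, coherence, words))
    return lines

# Stand-in for lda_model.top_topics(corpus); coherence values are placeholders.
sample = [
    ([(0.079, "cute"), (0.076, "broccoli")], -1.0),
    ([(0.082, "broccoli"), (0.079, "cute")], -2.0),
]
lines = format_top_topics(sample)
print("\n".join(lines))
```

Note that the number on each line is the position in the returned list, which is not necessarily the model's own topic id.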

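Note that pickle usually writes a text file that, although readable, may not be understandable.

Argh. Yes, I just saw the result. Do you know of any way to extract only the topics from the package, so that the resulting text file is easier to scrub?

Sorry, I don't know of any way.

pickle doesn't work, because it saves the whole model rather than the topic words...

Have you tried regex? I am facing the same thing and noticed that each item looks like a string.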
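Following the regex suggestion, a formatted topic string like the ones in the log output can be split back into (weight, word) pairs. This is a sketch under assumptions: `parse_topic_string` is a hypothetical helper, and the pattern assumes the `weight*word + weight*word` layout shown in the question, optionally with double quotes around each word, as some gensim releases print them.

```python
import re

# A weight, a literal '*', then a word that may be wrapped in double quotes.
TERM_RE = re.compile(r'(\d*\.?\d+)\*"?([^"+\s]+)"?')

def parse_topic_string(topic):
    """Turn '0.079*cute + 0.076*broccoli' into [(0.079, 'cute'), (0.076, 'broccoli')]."""
    return [(float(weight), word) for weight, word in TERM_RE.findall(topic)]

line = "0.079*cute + 0.076*broccoli + 0.070*adopted"
pairs = parse_topic_string(line)
print(pairs)
```

If the model object is still available, reading the pairs directly from it avoids the parsing step; the regex is mainly useful when only the printed strings or log lines remain.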