Python 3.x LDA model: why are the topic "words" numbers?


I have a set of trigrams (see). The column names are the trigrams; each row represents a document; the cell entries mark occurrence (binary).

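I then preprocess the trigrams and train an LDA model with the code below. However, being new to LDA Mallet, I am doing something wrong: the "words" printed by the word cloud are just numbers. I am lost and cannot work out where the link between the words and their numeric representation gets lost, or how to restore it.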

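import pickle

import gensim
import matplotlib.pyplot as plt
from gensim.models.wrappers import LdaMallet  # wrapper available in gensim < 4.0
from wordcloud import WordCloud

# load the document-by-trigram occurrence table (used below like a pandas DataFrame)
with open('small_trigrams.pkl', 'rb') as file:
    small_trigrams = pickle.load(file)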

small_mydict = gensim.corpora.Dictionary()    
small_trigrams_collection = []

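for col in small_trigrams.columns:
    # column names appear to be stringified tuples such as "('w1', 'w2', 'w3')";
    # strip the tuple punctuation and split into the individual tokens
    trigram = col.replace("(", "").replace("'", "").replace(" ", "").replace(")", "").strip().split(",", 3)
    value = small_trigrams[col].sum()  # trigram occurrences
    for i in range(int(value)):
        small_trigrams_collection.append(trigram)  # one entry per occurrence
# create corpus; allow_update=True also fills small_mydict with the tokens
small_mycorp = [small_mydict.doc2bow(trigram, allow_update=True) for trigram in small_trigrams_collection]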


# Train LDA on the trigrams features, assess topic coherence
small_topics_coherence = {} # dict with topics: coherence score
small_models = {} # collection of models

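# train LDA on trigram features
# path_to_mallet_binary must point to the local Mallet executable
num_topics = 10  # placeholder value; set the number of topics to fit
model = LdaMallet(path_to_mallet_binary, corpus=small_mycorp, num_topics=num_topics, id2word=small_mydict)  # train model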

# plot a word cloud for (at most) the first six topics of the model trained above
for t in range(min(model.num_topics, 6)):
    plt.figure()
    plt.imshow(WordCloud().fit_words(dict(model.show_topic(t, 200))))
    plt.axis("off")
    plt.title("Topic #" + str(t))
    plt.show()

Can anyone point out my mistake?
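
One possible cause, offered as a guess rather than a confirmed diagnosis: as far as I know, when a gensim topic model is created without an id2word mapping (or is not the model that was trained with small_mydict), its topic terms are labelled with the integer token ids as strings. The sketch below shows how such numeric labels could be mapped back to tokens before they reach the word cloud. The helper restore_topic_words and the toy data are hypothetical; a plain dict stands in for the gensim Dictionary and a hand-written list stands in for the output of show_topic.

```python
def restore_topic_words(topic_terms, id2token):
    """Map numeric-string terms back to tokens; leave real words untouched.

    topic_terms: list of (term, weight) pairs, in the shape show_topic returns.
    id2token:    mapping from integer token id to token (hypothetical stand-in
                 for the id-to-token lookup of a gensim Dictionary).
    """
    restored = {}
    for term, weight in topic_terms:
        # a term made only of digits is treated as a token id that lost its word
        if isinstance(term, str) and term.isdigit() and int(term) in id2token:
            term = id2token[int(term)]
        restored[term] = weight
    return restored


# toy data (assumption): ids 0..2 and made-up weights
id2token = {0: "new", 1: "york", 2: "city"}
topic_terms = [("2", 0.5), ("0", 0.3), ("1", 0.2)]

frequencies = restore_topic_words(topic_terms, id2token)
print(frequencies)  # {'city': 0.5, 'new': 0.3, 'york': 0.2}
```

The resulting dict has the same shape as the argument passed to WordCloud().fit_words in the plotting loop. Note that this check would also rewrite tokens that genuinely consist of digits, so it is meant as a diagnostic; if the labels really are ids, training and plotting the same model with id2word set to the dictionary is the cleaner fix.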