Python: How can I see all the documents for each topic in LDA?

python python-3.x scikit-learn

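I am using LDA to find the topics in a large body of text. I managed to print the topics, but I would also like to print each text together with its topic.

Data: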

it's very hot outside summer
there are not many flowers in winter
in the winter we eat hot food
in the summer we go to the sea
in winter we used many clothes
in summer we are on vacation
winter and summer are two seasons of the year

Topic 1
it's very hot outside summer
in the summer we go to the sea
in summer we are on vacation
winter and summer are two seasons of the year

Topic 2
there are not many flowers in winter
in the winter we eat hot food
in winter we used many clothes
winter and summer are two seasons of the year

I tried using sklearn. I can print the topics, but I want to print all the phrases that belong to each topic.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
import numpy as np
import pandas

dataset = pandas.read_csv('data.csv', encoding = 'utf-8')
comments = dataset['comments']
comments_list = comments.values.tolist()

vect = CountVectorizer()
X = vect.fit_transform(comments_list)

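# n_topics was renamed to n_components in newer scikit-learn releases
lda = LatentDirichletAllocation(n_components = 2, learning_method = "batch", max_iter = 25, random_state = 0)

# document_topics has one row per document and one column per topic
document_topics = lda.fit_transform(X)

sorting = np.argsort(lda.components_, axis = 1)[:, ::-1]
# get_feature_names was replaced by get_feature_names_out in newer scikit-learn releases
feature_names = np.array(vect.get_feature_names_out())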

docs = np.argsort(comments_list[:, 1])[::-1]
for i in docs[:4]:
    print(' '.join(i) + '\n')

Desired output:

it's very hot outside summer
there are not many flowers in winter
in the winter we eat hot food
in the summer we go to the sea
in winter we used many clothes
in summer we are on vacation
winter and summer are two seasons of the year

Topic 1
it's very hot outside summer
in the summer we go to the sea
in summer we are on vacation
winter and summer are two seasons of the year

Topic 2
there are not many flowers in winter
in the winter we eat hot food
in winter we used many clothes
winter and summer are two seasons of the year

If you want to print the documents, you need to index into them:

print(" ".join(comments_list[i].split(",")[:2]) + "\n")
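To make the indexing step concrete, here is a minimal sketch that ranks documents by their weight for one topic and returns the texts rather than the indices. It assumes the intent was to sort by a column of document_topics. The helper name top_documents and the sample weights are hypothetical, and the weights are made up rather than real LDA output.

```python
def top_documents(texts, doc_topic, topic, n=4):
    """Return the n texts with the highest weight for the given topic index."""
    # Sort document indices by their weight for `topic`, largest first.
    order = sorted(range(len(texts)),
                   key=lambda i: doc_topic[i][topic],
                   reverse=True)
    # Map the indices back to the original texts.
    return [texts[i] for i in order[:n]]


# Made-up weights for illustration only; they are NOT real LDA output.
sample_texts = [
    "it's very hot outside summer",
    "there are not many flowers in winter",
    "winter and summer are two seasons of the year",
]
sample_weights = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.6, 0.4],
]

for doc in top_documents(sample_texts, sample_weights, topic=1, n=2):
    print(doc)
```

With the code from the question, the call would presumably look like top_documents(comments_list, document_topics, topic=1), since the function only needs row and column indexing on the weights.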

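You have the documents, and each document has its document topics. So just iterate over your document_topics variable and use a dictionary to store the topics and the indices, for example.

Thanks @Norhther, so I should do this: for i in document_topics?

document_topics has a topic for each of your documents, so you can use a for loop to store the indices. A dictionary of lists will do the job: the lists store the indices and the keys are the topics.

Sorry, if I understood correctly, I can't do that. I want the output to be the documents in text form together with their topics. If I do what you say, I will get the documents in numeric form with their topics :(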
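The dictionary-of-lists idea from the comments can be sketched as follows, storing the texts directly instead of their indices so that the output is readable. This is only one possible reading of the suggestion. The function name group_documents_by_topic and the threshold parameter are hypothetical additions, and the weights are made up rather than real LDA output. The threshold is there because the desired output lists "winter and summer are two seasons of the year" under both topics, which a single best-topic assignment cannot produce.

```python
from collections import defaultdict


def group_documents_by_topic(texts, doc_topic, threshold=None):
    """Map each topic index to the list of texts assigned to it.

    doc_topic holds one row of topic weights per document, such as the
    array returned by lda.fit_transform(X). With threshold=None each text
    goes to its single highest-weight topic; otherwise it goes to every
    topic whose weight is at least `threshold`.
    """
    groups = defaultdict(list)
    for text, weights in zip(texts, doc_topic):
        weights = list(weights)
        if threshold is None:
            # Index of the largest weight (first one wins on ties).
            best = max(range(len(weights)), key=weights.__getitem__)
            groups[best].append(text)
        else:
            for topic, weight in enumerate(weights):
                if weight >= threshold:
                    groups[topic].append(text)
    return dict(groups)


# Made-up weights for illustration only; they are NOT real LDA output.
texts = [
    "it's very hot outside summer",
    "there are not many flowers in winter",
    "winter and summer are two seasons of the year",
]
doc_topic = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.5],
]

grouped = group_documents_by_topic(texts, doc_topic, threshold=0.4)
for topic, docs in sorted(grouped.items()):
    print(f"Topic {topic + 1}")
    for doc in docs:
        print(doc)
    print()
```

With the variables from the question, the call would presumably be group_documents_by_topic(comments_list, document_topics). Which topic ends up as "Topic 1" depends on the fitted model, and a suitable threshold would need to be chosen by inspecting the actual weights.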