Python: use only one column in the script, but also print the other column at the same index
I have a pandas DataFrame with two columns. I run the LDA algorithm on the data in the second column and print the documents that belong to each topic. Everything works: I get my topics and their contents as output (second column only). But I would like the output to show, for each topic, the first column as well as the second.
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

n_components = 2
n_top_words = 5

def print_top_words(model, feature_names, n_top_words):
    # Build one token list per topic: the topic index followed by its top words.
    out_list = []
    for topic_idx, topic in enumerate(model.components_):
        message = "%d " % topic_idx  # this is where it has to change to fix the output
        message += " ".join([feature_names[i] for i in topic.argsort()[:-n_top_words - 1:-1]])
        out_list.append(message.split())
    return out_list

text = pd.read_csv('listes.csv', encoding='utf-8')
text_liste2 = text['liste2']
text_liste1 = text['liste1']
text_liste1_list = text_liste1.values.tolist()
text_liste2_list = text_liste2.values.tolist()

tf_vectorizer = CountVectorizer()
tf = tf_vectorizer.fit_transform(text_liste2_list)
tf_feature_names = tf_vectorizer.get_feature_names_out()  # get_feature_names() on scikit-learn < 1.0
lda = LatentDirichletAllocation(n_components=n_components, max_iter=5, learning_method='online', learning_offset=50., random_state=0)

# print docs per topic - works
document_topics = lda.fit_transform(tf)  # fits the model and returns the document-topic matrix
topicos = print_top_words(lda, tf_feature_names, n_top_words)
for i in range(len(topicos)):
    print("Topic {}:".format(i))
    docs = np.argsort(document_topics[:, i])[::-1]  # document indices, highest topic weight first
    for j in docs[:3]:
        print(" ".join(text_liste2_list[j].split(",")[:2]))
Data:
liste1,liste2
'hello, how are you','hello'
'I am super intelligent','super intelligent'
'He is a great friend','great friend'
'THE book is on the table','book table'
'the EARTH is in danger','earth danger'
'I just can say goodbye','just goodbye'
'she eats bananas','eats bananas'
'you say goodbye','say goodbye'
My output:
Topic 0:
book table
earth danger
just goodbye
eats bananas
Topic 1:
hello
super intelligent
great friend
say goodbye
Desired output:
Topic 0:
'THE book is on the table','book table'
'the EARTH is in danger','earth danger'
'I just can say goodbye','just goodbye'
'she eats bananas','eats bananas'
Topic 1:
'hello, how are you','hello'
'I am super intelligent','super intelligent'
'He is a great friend','great friend'
'you say goodbye','say goodbye'
First, remove the comma in the first data row, in 'hello, how are you'.
Second, just print text_liste1_list[j] as well in the last print statement :-)
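The two steps above can be sketched as follows. This is a minimal illustration, not the asker's exact script: the CSV is inlined with io.StringIO, the helper name docs_per_topic is hypothetical, and the document_topics matrix is a hand-written stand-in for the output of lda.fit_transform(tf), with made-up weights chosen only so the grouping matches the question. As an alternative to deleting the comma by hand, the sketch assumes it is acceptable to pass quotechar="'" to read_csv so that the single-quoted fields are parsed as whole values.

```python
import io

import numpy as np
import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
CSV_TEXT = """liste1,liste2
'hello, how are you','hello'
'I am super intelligent','super intelligent'
'He is a great friend','great friend'
'THE book is on the table','book table'
'the EARTH is in danger','earth danger'
'I just can say goodbye','just goodbye'
'she eats bananas','eats bananas'
'you say goodbye','say goodbye'
"""

# quotechar="'" tells pandas that single quotes wrap a field, so the comma
# inside 'hello, how are you' does not split that row into three fields.
text = pd.read_csv(io.StringIO(CSV_TEXT), quotechar="'")


def docs_per_topic(document_topics, frame, top_n=3):
    """Return, for each topic, the top_n rows of frame ranked by topic weight.

    Hypothetical helper: row j of document_topics is assumed to describe
    row j of frame, so one positional index selects both columns.
    """
    result = {}
    for topic_idx in range(document_topics.shape[1]):
        order = np.argsort(document_topics[:, topic_idx])[::-1][:top_n]
        result[topic_idx] = frame.iloc[order]
    return result


# ASSUMPTION: illustrative weights only, standing in for lda.fit_transform(tf).
# One row per document, one column per topic.
document_topics = np.array([
    [0.1, 0.9],
    [0.2, 0.8],
    [0.3, 0.7],
    [0.9, 0.1],
    [0.8, 0.2],
    [0.7, 0.3],
    [0.6, 0.4],
    [0.4, 0.6],
])

for topic_idx, rows in docs_per_topic(document_topics, text, top_n=4).items():
    print("Topic {}:".format(topic_idx))
    # Both columns come from the same rows, so they stay aligned.
    for liste1, liste2 in zip(rows["liste1"], rows["liste2"]):
        print("'{}','{}'".format(liste1, liste2))
```

Because the LDA model is fitted on liste2 in row order, the indices returned by np.argsort are valid positions in liste1 too, which is why a single index j is enough to print both columns.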