
How to apply an LDA model to a new set of documents
Tags: lda, topic-modeling

I used the LDA model below to get the topics for 1,000 documents:


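```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Create the count vectorizer that feeds the LDA model.
# data_lemmatized is the list of preprocessed document strings built earlier.
vectorizer = CountVectorizer(analyzer='word',
                             min_df=7,                         # ignore words found in fewer than 7 documents
                             max_df=80,                        # ignore words found in more than 80 documents
                             stop_words='english',             # remove stop words
                             lowercase=True,                   # convert all words to lowercase
                             token_pattern='[a-zA-Z0-9]{3,}',  # tokens of at least 3 characters
                             # max_features=50000,             # max number of unique words
                             )

data_vectorized = vectorizer.fit_transform(data_lemmatized)
# Materialize the sparse data (optional: LDA accepts the sparse matrix directly)
data_dense = data_vectorized.todense()

# Build LDA Model
lda_model = LatentDirichletAllocation(n_components=14,     # Number of topics
                                      max_iter=10,         # Max learning iterations
                                      random_state=100,    # Random state
                                      batch_size=128,      # docs per EM step (only used when learning_method='online')
                                      evaluate_every=-1,   # do not compute perplexity during training
                                      n_jobs=-1,           # Use all available CPUs
                                      )
# Fit the model and get the document-topic matrix (one row per document)
lda_output = lda_model.fit_transform(data_vectorized)
# Redundant here: this re-infers the topics for the same training matrix
lda_output = lda_model.transform(data_vectorized)
```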
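
To apply an already fitted model to new documents, the usual scikit-learn pattern is to call `transform()` (not `fit_transform()`) on both the fitted vectorizer and the fitted LDA model, so that the vocabulary and topics learned from the training corpus are reused. The following is a minimal sketch of that pattern, not a drop-in solution: the toy corpora `train_docs` and `new_docs`, the names `new_counts`, `new_doc_topics` and `dominant_topic`, and the small parameter values (2 topics, no `min_df`/`max_df` filtering) are assumptions chosen so the example runs on its own.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-ins (assumptions) for the training corpus and the new documents.
train_docs = [
    "cats and dogs are popular pets",
    "dogs chase cats around the garden",
    "stocks and bonds are financial assets",
    "investors trade stocks and bonds daily",
]
new_docs = ["my dogs love the garden", "bonds and stocks fell today"]

# Fit the vocabulary and the topic model on the training corpus only.
vectorizer = CountVectorizer(stop_words="english", token_pattern=r"[a-zA-Z0-9]{3,}")
train_counts = vectorizer.fit_transform(train_docs)
lda_model = LatentDirichletAllocation(n_components=2, max_iter=10, random_state=100)
lda_model.fit(train_counts)

# For new documents use transform(), not fit_transform(): this reuses the
# learned vocabulary and topics; words unseen during training are ignored.
new_counts = vectorizer.transform(new_docs)
new_doc_topics = lda_model.transform(new_counts)  # shape: (n_new_docs, n_topics)

# Each row is a topic distribution; argmax gives the dominant topic per document.
dominant_topic = np.argmax(new_doc_topics, axis=1)
print(new_doc_topics.round(3))
print(dominant_topic)
```

The new documents should go through the same preprocessing (for example the same lemmatization) as the training data before being vectorized. If they arrive in a later session, both the fitted vectorizer and the fitted model need to be saved and reloaded together, for instance with `joblib`.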