Scikit learn: How to get the frequency of NMF topics in sklearn (Scikit Learn, Nmf) - Fatal编程技术网



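I am currently using NMF to generate topics; my code is shown below. However, I don't know how to get the frequency of each topic. Can anyone help? Thanks, everyone!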

import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# NGRAM_RANGE, MAX_FEATURES, VECTORIZER, N_TOPIC_WORDS and fit_bow are
# defined elsewhere in the original code (not shown in the question).

def fit_tfidf(documents):
    # documents is expected to be a pandas Series of raw text
    tfidf = TfidfVectorizer(input='content', stop_words='english', use_idf=True,
                            ngram_range=NGRAM_RANGE, lowercase=True,
                            max_features=MAX_FEATURES, min_df=1)
    tfidf_matrix = tfidf.fit_transform(documents.values).toarray()
    # get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out()
    # replaces it and already returns a NumPy array
    tfidf_feature_names = tfidf.get_feature_names_out()
    tfidf_reverse_lookup = {word: idx for idx, word in enumerate(tfidf_feature_names)}
    return tfidf_matrix, tfidf_reverse_lookup, tfidf_feature_names

def vectorization(documents):  # parameter was misspelled "documments"
    # Each branch returns (vec_matrix, vec_reverse_lookup, vec_feature_names)
    if VECTORIZER == 'tfidf':
        return fit_tfidf(documents)
    if VECTORIZER == 'bow':
        return fit_bow(documents)
    raise ValueError(f"Unknown VECTORIZER: {VECTORIZER!r}")

def nmf_model(vec_matrix, vec_reverse_lookup, vec_feature_names, NUM_TOPICS):
    topic_words = []
    nmf = NMF(n_components=NUM_TOPICS, random_state=3).fit(vec_matrix)
    # components_ has one row per topic; keep the N_TOPIC_WORDS highest-weighted terms
    for topic in nmf.components_:
        word_idx = np.argsort(topic)[::-1][:N_TOPIC_WORDS]
        topic_words.append([vec_feature_names[i] for i in word_idx])
    return topic_words

If you mean the frequency of each topic within each document, then:

H = nmf.fit_transform(vec_matrix)
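H is a matrix of shape (n_documents, n_topics). Each row is a document's vector in topic space, and its entries are the weights of the individual topics in that document (i.e., how important each topic is to it). Note that scikit-learn's own documentation calls this matrix W (X ≈ WH), while `nmf.components_` holds H, the topic-term matrix. If `nmf` has already been fitted, `nmf.transform(vec_matrix)` returns this matrix without refitting.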
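If "frequency of each topic" is meant at the level of the whole corpus rather than per document, the document-topic matrix has to be aggregated somehow. The following is a sketch under two common but not exclusive definitions, which are assumptions rather than anything the answer specifies: counting how many documents have each topic as their largest weight, and taking each topic's share of the total weight. The helper `topic_frequencies` and the toy matrix are hypothetical.

```python
import numpy as np


def topic_frequencies(doc_topic):
    """Hypothetical helper summarising a (n_documents, n_topics) weight matrix.

    Returns per-document topic proportions, the number of documents whose
    dominant topic is each topic, and each topic's share of the total weight.
    """
    doc_topic = np.asarray(doc_topic, dtype=float)
    n_topics = doc_topic.shape[1]

    # Per-document proportions: each row rescaled to sum to 1 (all-zero rows stay zero)
    row_sums = doc_topic.sum(axis=1, keepdims=True)
    proportions = np.divide(doc_topic, row_sums,
                            out=np.zeros_like(doc_topic), where=row_sums > 0)

    # Corpus-level count of documents whose largest weight falls on topic k
    # (an all-zero row counts toward topic 0, because argmax returns the first index)
    dominant_counts = np.bincount(doc_topic.argmax(axis=1), minlength=n_topics)

    # Corpus-level share of the total topic weight assigned to topic k
    weight_share = doc_topic.sum(axis=0) / doc_topic.sum()

    return proportions, dominant_counts, weight_share


# Toy document-topic matrix: 4 documents, 3 topics (made-up values)
W = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.6],
    [0.8, 0.0, 0.2],
])
proportions, counts, share = topic_frequencies(W)
print(counts)              # [2 1 1]
print(np.round(share, 3))  # [0.487 0.282 0.231]
```

The two corpus-level numbers can disagree: the dominant-topic count ignores secondary topics entirely, while the weight share credits every topic in proportion to its weight, so which one counts as "frequency" depends on the question being asked.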