Machine learning: In Naive Bayes, how do I indicate that some features (words) and some documents are more important than others?

machine-learning scikit-learn

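I am building a binary document classifier with sklearn. I want to indicate that some features (words) are more (or less) important for learning, and that some documents are more (or less) important for learning.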

You can get the number of occurrences (frequency) of each word in the text corpus, for example with CountVectorizer:

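from sklearn.feature_extraction.text import CountVectorizer

# corpus is assumed to be a list of document strings
vec = CountVectorizer().fit(corpus)
bag_of_words = vec.transform(corpus)
# Sum each column to get the total count of every word
sum_words = bag_of_words.sum(axis=0)
words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]
# Sort by count, most frequent first
words_freq = sorted(words_freq, key=lambda x: x[1], reverse=True)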

You can also obtain the importance of each feature (word) from univariate statistical tests.
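The answer does not say which univariate test is meant. As one possible illustration, the sketch below scores each word with the chi-squared test from `sklearn.feature_selection.chi2`. The choice of `chi2`, the toy corpus, and the labels are all assumptions made for the example, not part of the original answer.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import chi2

# Toy corpus and binary labels, made up purely for illustration.
corpus = [
    "cheap pills buy now",
    "buy cheap watches now",
    "meeting agenda for monday",
    "monday project meeting notes",
]
labels = [1, 1, 0, 0]

vec = CountVectorizer()
X = vec.fit_transform(corpus)

# chi2 returns one score and one p-value per feature (column of X).
scores, p_values = chi2(X, labels)

# Recover the words in column order from the fitted vocabulary.
words = sorted(vec.vocabulary_, key=vec.vocabulary_.get)

# Pair each word with its score and list the highest-scoring words first.
word_scores = sorted(zip(words, scores), key=lambda pair: pair[1], reverse=True)
for word, score in word_scores[:5]:
    print(f"{word}: {score:.2f}")
```

A higher score suggests the word's counts depend more strongly on the class label. With so few documents the numbers are only illustrative.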

Or by training a random forest classifier:

from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier()

model = clf.fit(X, y)

# Calculate feature importances
importances = model.feature_importances_
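To feed such importances back into a Naive Bayes model, one option is sketched below. `MultinomialNB.fit` accepts a `sample_weight` argument, which covers the per-document part of the question. For per-word importance there is no dedicated parameter, so the sketch scales the count columns before fitting. That scaling is a heuristic assumed here, and the corpus, `word_weights`, and `doc_weights` are hypothetical values chosen for illustration.

```python
import numpy as np
from scipy.sparse import diags
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus and binary labels, made up purely for illustration.
corpus = [
    "cheap pills buy now",
    "buy cheap watches now",
    "meeting agenda for monday",
    "monday project meeting notes",
]
labels = np.array([1, 1, 0, 0])

vec = CountVectorizer()
X = vec.fit_transform(corpus)

# Hypothetical per-word weights: >1 boosts a word, <1 damps it, default is 1.
word_weights = {"cheap": 2.0, "monday": 0.5}
feature_weights = np.ones(X.shape[1])
for word, weight in word_weights.items():
    feature_weights[vec.vocabulary_[word]] = weight

# Multiplying by a diagonal matrix scales each column (word) of the counts.
scale = diags(feature_weights)
X_weighted = X @ scale

# Hypothetical per-document weights: the first document counts double.
doc_weights = np.array([2.0, 1.0, 1.0, 1.0])

clf = MultinomialNB()
clf.fit(X_weighted, labels, sample_weight=doc_weights)

# New documents must go through the same vectorizer and the same scaling.
X_new = vec.transform(["cheap pills"]) @ scale
print(clf.predict(X_new))
```

Scaled counts are no longer integers. MultinomialNB generally tolerates fractional counts, but whether the weighting helps should be checked on held-out data.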

It would be great if you could show what you have done so far.
