Scikit-learn: how to get the most representative features from the following tfidf model?

Hello, I have the following list:

listComments = ["comment1","comment2","comment3",...,"commentN"]
I created a tfidf vectorizer to build a model from my comments, as follows:

tfidf_vectorizer = TfidfVectorizer(min_df=10,ngram_range=(1,3),analyzer='word')
tfidf = tfidf_vectorizer.fit_transform(listComments)
Now, to understand my model better, I would like to get its most representative features. I tried:

print("these are the features :",tfidf_vectorizer.get_feature_names())
print("the vocabulary :",tfidf_vectorizer.vocabulary_)
This gives me a list of words that I believe my model uses for vectorization:

these are the features : ['10', '10 days', 'red', 'car',...]

the vocabulary : {'edge': 86, 'local': 96, 'machine': 2,...}

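However, I would like a way to get the 30 most representative features, by which I mean the words that reach the highest values in my tfidf model, the words with the highest inverse document frequency. I have been reading the documentation but could not find a way to do this. I would really appreciate any help with this. Thanks in advance.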

If you want the vocabulary ordered by idf score, you can use the idf_ attribute and argsort it:

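import numpy as np

# create an array of feature names
# (get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() exists from 1.0 on)
feature_names = np.asarray(tfidf_vectorizer.get_feature_names_out())

# indices that sort the idf weights from highest to lowest
idf_order = tfidf_vectorizer.idf_.argsort()[::-1]

# feature names sorted by decreasing idf; slice with [:30] to keep the top 30
feature_names[idf_order]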
If you want a sorted list of tfidf scores for each document, you can do something similar:

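# for each document (row), feature indices sorted by decreasing tfidf score;
# the reversal must be applied along the columns, [::-1] alone would only reverse the document order
# (toarray() builds a dense matrix, so this only suits a corpus that fits in memory)
tfidf_order = tfidf.toarray().argsort(axis=1)[:, ::-1]

# words for each document, highest tfidf score first
feature_names[tfidf_order]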
