Python Sklearn: find mean centroid location for a cluster?

python scikit-learn

Printing topic after running the following code leaves me with one row of three values per element of descs:

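import pandas as pd, numpy as np, scipy
import sklearn.feature_extraction.text as text
from sklearn import decomposition

descs = ["You should not go there", "We may go home later", "Why should we do your chores", "What should we do"]

# Build the document-term matrix: one row per sentence, one column per word.
vectorizer = text.CountVectorizer()
dtm = vectorizer.fit_transform(descs).toarray()

# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out().
vocab = np.array(vectorizer.get_feature_names_out())

# Factorize dtm into 3 non-negative components.
nmf = decomposition.NMF(3, random_state = 1)

# topic has one row per sentence and one column per component.
topic = nmf.fit_transform(dtm)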

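They are vectors of the likelihood that each element in descs belongs to a certain cluster. How can I get the coordinates of the centroid of each cluster? Ultimately, I want to write a function that calculates the distance between each element in descs and the centroid of the cluster it was assigned to.

Would it be best to just calculate the average of each descs element's topic values for each cluster?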
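The documentation for sklearn.decomposition.NMF explains how to get the coordinates of the centroid of each cluster:

Attributes: components_ : array, [n_components, n_features]

Non-negative components of the data.

The basis vectors are arranged row-wise in nmf.components_; the interactive session further down shows them. For reference, this is the topic array the question refers to: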
>>> print(topic)
[0.       , 1.403    , 0.     ],
[0.       , 0.       , 1.637  ],
[1.257    , 0.       , 0.     ],
[0.874    , 0.056    , 0.065  ]
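
One way to read the row-wise arrangement: fit_transform returns the per-document weights, components_ holds the basis vectors, and their product approximates the document-term matrix. The following is a minimal sketch, assuming a recent scikit-learn; the names centroids and approx and the max_iter value are illustrative choices that do not appear in the original post, and the fitted numbers may differ from those quoted in this thread depending on the library version.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer

descs = ["You should not go there", "We may go home later",
         "Why should we do your chores", "What should we do"]

# Document-term matrix: 4 sentences x 14 distinct words.
dtm = CountVectorizer().fit_transform(descs).toarray()

# max_iter is raised only to make convergence warnings less likely (assumption).
nmf = NMF(n_components=3, random_state=1, max_iter=1000)
topic = nmf.fit_transform(dtm)   # weights, shape (n_documents, n_components)
centroids = nmf.components_      # basis vectors, shape (n_components, n_features)

# Each document is approximated by a weighted sum of the basis vectors.
approx = topic @ centroids
print(topic.shape, centroids.shape, approx.shape)
print(np.round(approx, 2))
```

Each row of centroids lives in the same 14-dimensional word space as the rows of dtm, which is why it can play the role of a cluster centroid.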

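As for your second question, I don't see the point of "calculating the average of each descs element's topic values for each cluster". In my opinion, classifying by the computed likelihood makes more sense.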
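
Classifying by the computed likelihood can be sketched as picking, for each row of topic, the column with the largest weight. The array below copies the values from the print(topic) output quoted in this thread; the names labels and shares are illustrative.

```python
import numpy as np

# Values copied from the print(topic) output quoted in this thread.
topic = np.array([
    [0.0,   1.403, 0.0],
    [0.0,   0.0,   1.637],
    [1.257, 0.0,   0.0],
    [0.874, 0.056, 0.065],
])

# Assign every sentence to the component with the largest weight.
labels = topic.argmax(axis=1)
print(labels)  # [1 2 0 0]

# Optional: rescale each row to sum to 1 so the weights read as shares.
shares = topic / topic.sum(axis=1, keepdims=True)
print(np.round(shares, 3))
```

With these numbers the third and fourth sentences land in the same cluster, and the first two sentences each get a cluster of their own.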
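I assume you created three centroids. How does each element of nmf.components_ represent the coordinates of each centroid? The number of nonzero elements in that array seems to indicate high dimensionality.

nmf.components_ has 3 rows and 14 columns, corresponding to the 3 clusters and the 14 distinct words. In other words, the vectors representing the cluster centroids are linear combinations of the vocabulary basis.

So how can I find the x-y coordinates of the centroids themselves? Or is this a misguided question?

The centroids are 14-dimensional vectors; they do not live in a two-dimensional space.

Ah, I see. The values in each vector are that centroid's coordinates along each dimension. Thanks.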
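
To make the 14 coordinates concrete, each column can be paired with the word it counts. This is a rough sketch using the rounded nmf.components_ values printed in this thread; the vocabulary list assumes the alphabetical column order that CountVectorizer produces, and the 0.5 cutoff and the name top_words are arbitrary illustrative choices.

```python
import numpy as np

# Assumption: columns follow CountVectorizer's alphabetical vocabulary order.
vocab = ["chores", "do", "go", "home", "later", "may", "not",
         "should", "there", "we", "what", "why", "you", "your"]

# Rounded nmf.components_ values as printed in this thread.
centroids = np.array([
    [0.54, 0.91, 0.0, 0.0, 0.0, 0.0, 0.0, 0.89, 0.0, 0.89, 0.37, 0.54, 0.0, 0.54],
    [0.0, 0.01, 0.71, 0.0, 0.0, 0.0, 0.71, 0.72, 0.71, 0.01, 0.02, 0.0, 0.71, 0.0],
    [0.0, 0.01, 0.61, 0.61, 0.61, 0.61, 0.0, 0.0, 0.0, 0.62, 0.02, 0.0, 0.0, 0.0],
])

# For each centroid, keep the words whose coordinate exceeds the cutoff.
top_words = {}
for k, row in enumerate(centroids):
    top_words[k] = [word for word, weight in zip(vocab, row) if weight > 0.5]
    print(k, top_words[k])
```

The second centroid, for example, has large coordinates only along go, not, should, there, and you, which are exactly the words of the first sentence.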
In [995]: np.set_printoptions(precision=2)

In [996]: nmf.components_
Out[996]: 
array([[ 0.54,  0.91,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.89,  0.  ,  0.89,  0.37,  0.54,  0.  ,  0.54],
       [ 0.  ,  0.01,  0.71,  0.  ,  0.  ,  0.  ,  0.71,  0.72,  0.71,  0.01,  0.02,  0.  ,  0.71,  0.  ],
       [ 0.  ,  0.01,  0.61,  0.61,  0.61,  0.61,  0.  ,  0.  ,  0.  ,  0.62,  0.02,  0.  ,  0.  ,  0.  ]])
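
The distance function asked about in the question could then be sketched as follows. This is only one possible reading, not the answerer's method: it assigns each sentence to its largest-weight component and measures both Euclidean and cosine distance to that row of components_. The function name distances_to_assigned_centroid is hypothetical, the arrays copy the rounded values quoted in this thread, and the word counts are rebuilt by hand so the sketch runs with NumPy alone.

```python
import numpy as np


def distances_to_assigned_centroid(dtm, topic, centroids):
    """Measure how far each row of dtm is from its assigned centroid.

    Returns (labels, euclidean, cosine), each of length n_documents.
    """
    labels = topic.argmax(axis=1)    # cluster with the largest weight
    assigned = centroids[labels]     # centroid row for every document
    euclidean = np.linalg.norm(dtm - assigned, axis=1)
    # Cosine distance ignores the arbitrary scale of NMF basis vectors.
    norms = np.linalg.norm(dtm, axis=1) * np.linalg.norm(assigned, axis=1)
    cosine = 1.0 - (dtm * assigned).sum(axis=1) / np.maximum(norms, 1e-12)
    return labels, euclidean, cosine


# Assumption: columns follow CountVectorizer's alphabetical vocabulary order.
vocab = ["chores", "do", "go", "home", "later", "may", "not",
         "should", "there", "we", "what", "why", "you", "your"]
descs = ["You should not go there", "We may go home later",
         "Why should we do your chores", "What should we do"]
dtm = np.array([[d.lower().split().count(w) for w in vocab] for d in descs],
               dtype=float)

# Rounded values copied from the outputs quoted in this thread.
topic = np.array([[0.0, 1.403, 0.0], [0.0, 0.0, 1.637],
                  [1.257, 0.0, 0.0], [0.874, 0.056, 0.065]])
centroids = np.array([
    [0.54, 0.91, 0.0, 0.0, 0.0, 0.0, 0.0, 0.89, 0.0, 0.89, 0.37, 0.54, 0.0, 0.54],
    [0.0, 0.01, 0.71, 0.0, 0.0, 0.0, 0.71, 0.72, 0.71, 0.01, 0.02, 0.0, 0.71, 0.0],
    [0.0, 0.01, 0.61, 0.61, 0.61, 0.61, 0.0, 0.0, 0.0, 0.62, 0.02, 0.0, 0.0, 0.0],
])

labels, euclidean, cosine = distances_to_assigned_centroid(dtm, topic, centroids)
for desc, k, e, c in zip(descs, labels, euclidean, cosine):
    print(f"{desc!r}: cluster {k}, euclidean {e:.2f}, cosine {c:.3f}")
```

Because an NMF factorization fixes the basis vectors only up to a scale factor, the cosine figure is probably the more meaningful of the two; the Euclidean figure depends on how the weights and basis vectors happened to be scaled.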