Python: Getting the inertia for nltk k-means clustering using cosine_similarity


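I am using nltk for k-means clustering because I want to change the distance metric. Does nltk's KMeansClusterer have an inertia attribute similar to sklearn's? I couldn't find one in their documentation or online.

The code below is how people usually find the inertia with sklearn's KMeans: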

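import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# features: array of shape (n_samples, n_features), defined elsewhere
inertia = []
for n_clusters in range(2, 26):
    clusterer = KMeans(n_clusters=n_clusters)
    preds = clusterer.fit_predict(features)
    centers = clusterer.cluster_centers_
    inertia.append(clusterer.inertia_)  # sum of squared distances to the closest center

# Plot inertia against k and look for the "elbow"
plt.plot(list(range(2, 26)), inertia, 'bx-')
plt.xlabel('k')
plt.ylabel('Sum_of_squared_distances')
plt.title('Elbow Method For Optimal k')
plt.show()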

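You can write your own function to get the inertia for nltk's KMeansClusterer.

Going by the question you posted, I am using the same dummy data, after forming 2 clusters.

According to the scikit-learn documentation, inertia is the sum of squared distances of samples to their closest cluster center.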
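Written out as a formula, that definition can be read as follows. The symbols are my own notation rather than anything from the nltk or scikit-learn docs: x_i is the i-th sample, mu_j is the j-th cluster center, n is the number of samples, and k is the number of clusters.

```latex
\text{inertia} = \sum_{i=1}^{n} \min_{1 \le j \le k} \lVert x_i - \mu_j \rVert^2
```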

import numpy as np

# df is assumed to have a 'centroid' column holding, for each row,
# the mean of the cluster that nltk assigned the row to
feature_matrix = df[['feature1', 'feature2', 'feature3']].to_numpy()
centroid = df['centroid'].to_numpy()

def nltk_inertia(feature_matrix, centroid):
    # Inertia as defined in the scikit-learn docs: the sum of squared
    # distances from each sample to its cluster center
    sum_ = []
    for i in range(feature_matrix.shape[0]):
        sum_.append(np.sum((feature_matrix[i] - centroid[i]) ** 2))
    return sum(sum_)

nltk_inertia(feature_matrix, centroid)
# output: 27.495250000000002

# Now run scikit-learn's KMeans on feature1, feature2 and feature3 with the same number of clusters (2)
from sklearn.cluster import KMeans

scikit_kmeans = KMeans(n_clusters=2)
scikit_kmeans.fit(vectors)  # vectors = [np.array(f) for f in df.values], which contains feature1, feature2, feature3
scikit_kmeans.inertia_
# output: 27.495250000000006
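Since the question is about cosine distance, here is a minimal sketch of how the same idea might look end to end with nltk. It assumes that inertia under a custom metric is taken as the sum of squared distances (under that metric) from each vector to the mean of its assigned cluster, by analogy with scikit-learn's definition. The helper name `nltk_kmeans_inertia`, the toy vectors, and the seed are made up for illustration.

```python
import random

import numpy as np
from nltk.cluster import KMeansClusterer
from nltk.cluster.util import cosine_distance


def nltk_kmeans_inertia(vectors, n_clusters, distance=cosine_distance, seed=0):
    """Cluster with nltk, then sum the squared distances to the assigned means.

    Assumption: "inertia" here means squared distance under `distance`,
    which only matches sklearn's inertia_ when the metric is Euclidean.
    """
    clusterer = KMeansClusterer(
        n_clusters,
        distance=distance,
        rng=random.Random(seed),      # fixed seed so runs are repeatable
        avoid_empty_clusters=True,    # keeps nltk from failing on an empty cluster
    )
    labels = clusterer.cluster(vectors, assign_clusters=True)
    means = clusterer.means()
    return sum(distance(v, means[label]) ** 2 for v, label in zip(vectors, labels))


# Toy data (hypothetical): two groups pointing in clearly different directions
vectors = [
    np.array(v, dtype=float)
    for v in [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0], [0.1, 1.0], [0.2, 0.9], [0.0, 1.0]]
]

# Inertia per k, ready to be plotted for the elbow method
inertias = {k: nltk_kmeans_inertia(vectors, k) for k in range(1, 4)}
for k, value in inertias.items():
    print(k, value)
```

The resulting values can be plotted against k in the same way as the sklearn elbow plot in the question.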

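Hi @qaiser, do you know how I can change the distance metric of k-means to the Canberra distance? Thanks!

@atjw94 you can post a new question about this with a short explanation...

Here you go, thanks!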
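On the Canberra distance question in the comments: nltk's KMeansClusterer takes the distance as a plain function of two vectors, so one possible approach is to write that function yourself. The sketch below is only an illustration. The name `canberra_distance` is my own, and treating 0/0 terms as zero is an assumption (it follows the usual convention for this distance).

```python
import numpy as np


def canberra_distance(u, v):
    """Canberra distance: sum of |u_i - v_i| / (|u_i| + |v_i|), skipping 0/0 terms."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    numerator = np.abs(u - v)
    denominator = np.abs(u) + np.abs(v)
    nonzero = denominator > 0  # coordinates where both values are 0 contribute nothing
    return float(np.sum(numerator[nonzero] / denominator[nonzero]))


print(canberra_distance([1.0, 2.0, 0.0], [3.0, 2.0, 0.0]))  # 0.5
```

It could then be passed as the distance argument, for example `KMeansClusterer(2, canberra_distance)`. Keep in mind that k-means still updates each center as an arithmetic mean, which is not guaranteed to minimise Canberra distance, so the clustering may behave differently than with Euclidean distance.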