Python 2.7: How do I use the silhouette score with k-means clustering from the sklearn library?


I want to use the silhouette score in my script to automatically determine the number of clusters for k-means clustering from sklearn.

import numpy as np
import pandas as pd
import csv
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

filename = "CSV_BIG.csv"

# Read the CSV file with the Pandas lib.
path_dir = ".\\"
dataframe = pd.read_csv(path_dir + filename, encoding = "utf-8", sep = ';' ) # "ISO-8859-1")
df = dataframe.copy(deep=True)

#Use silhouette score
range_n_clusters = list (range(2,10))
print ("Number of clusters from 2 to 9: \n", range_n_clusters)

for n_clusters in range_n_clusters:
    clusterer = KMeans (n_clusters=n_clusters).fit(?)
    preds = clusterer.predict(?)
    centers = clusterer.cluster_centers_

    score = silhouette_score (?, preds, metric='euclidean')
    print ("For n_clusters = {}, silhouette score is {}".format(n_clusters, score))

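Can someone help me fill in the question marks? I don't understand what should replace them. I took the code from an example.

The commented-out part is the previous version, where I ran k-means with a fixed number of clusters set to 4. That version of the code works, but in my project I need to choose the number of clusters automatically.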

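I assume you want to use the silhouette score to find the optimal number of clusters.

First declare a separate KMeans object, then call its fit_predict function on your data df, as shown below: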

for n_clusters in range_n_clusters:
    clusterer = KMeans(n_clusters=n_clusters)
    preds = clusterer.fit_predict(df)
    centers = clusterer.cluster_centers_

    score = silhouette_score(df, preds)
    print("For n_clusters = {}, silhouette score is {}".format(n_clusters, score))
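
The loop above only prints the scores; to pick the number of clusters automatically, you still have to keep the score for each candidate and take the maximum. Here is a minimal sketch of that step. The helper name best_k_by_silhouette and the synthetic make_blobs data are assumptions for illustration (they stand in for the numeric columns of your df), and n_init and random_state are set explicitly only to make the run reproducible.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(X, k_values, random_state=0):
    """Hypothetical helper: return (best_k, scores), where scores maps k to its silhouette score."""
    scores = {}
    for k in k_values:
        # Fit k-means and get one cluster label per sample in a single call
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        # Silhouette score lies in [-1, 1]; higher means better-separated clusters
        scores[k] = silhouette_score(X, labels, metric="euclidean")
    # The candidate with the highest score is taken as the number of clusters
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in for the data: four well-separated blobs (an assumption, not the CSV)
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
    cluster_std=0.5,
    random_state=0,
)

best_k, scores = best_k_by_silhouette(X, range(2, 10))
for k in sorted(scores):
    print("For n_clusters = {}, silhouette score is {:.3f}".format(k, scores[k]))
print("Best number of clusters: {}".format(best_k))
```

With your own data, pass the numeric feature columns of df instead of X. KMeans cannot handle string columns, so those have to be dropped or encoded first.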

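See the referenced example for a clearer explanation.

Unfortunately, this approach has a serious limitation with datasets that form a single cluster: the silhouette score is only defined when there are at least two clusters, so it can never tell you that the best number of clusters is 1. If your problem persists, you can try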
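
To illustrate the single-cluster limitation: silhouette_score raises a ValueError when all samples share one label, because it needs between 2 and n_samples - 1 distinct labels. The sketch below shows one possible guard. The helper safe_silhouette and its fallback value are hypothetical and are not necessarily what the answer above was about to suggest.

```python
import numpy as np
from sklearn.metrics import silhouette_score


def safe_silhouette(X, labels, fallback=-1.0):
    """Hypothetical helper: return the silhouette score, or `fallback` when it is undefined."""
    n_labels = len(np.unique(labels))
    n_samples = len(labels)
    # silhouette_score requires 2 <= n_labels <= n_samples - 1
    if n_labels < 2 or n_labels > n_samples - 1:
        return fallback
    return silhouette_score(X, labels)


# Tiny toy data: two tight pairs of points far apart from each other
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])

single = [0, 0, 0, 0]   # every point in the same cluster
double = [0, 0, 1, 1]   # two clusters

try:
    silhouette_score(X, single)
except ValueError as err:
    print("silhouette_score failed: {}".format(err))

print("Guarded score, one cluster:  {}".format(safe_silhouette(X, single)))
print("Guarded score, two clusters: {:.3f}".format(safe_silhouette(X, double)))
```

Returning the lowest possible score as the fallback means a single-cluster labeling is never selected; whether that is the right choice depends on whether "no cluster structure" is a valid outcome for your data.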