Elbow method for k-means in Python


python machine-learning


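I am working on a clustering task. I used the elbow method to find the optimal number of clusters (k), but the resulting plot is almost linear, so I cannot determine k from it.

Thanks, everyone.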

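I suggest using the silhouette score to determine the number of clusters. It does not require you to look at a plot and can be fully automated: just try different values of k and pick the one with the highest silhouette score.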
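As a rough sketch of that automated search, the snippet below fits k-means for a range of k values and keeps the k with the highest silhouette score. The helper name `best_k_by_silhouette` and the synthetic four-blob dataset are assumptions made for illustration; they are not part of the original question.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(X, k_values):
    """Return (best_k, scores); scores maps each k to its silhouette score."""
    scores = {}
    for k in k_values:
        # The silhouette score needs at least 2 clusters, so k must be >= 2.
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    # Higher silhouette means tighter, better separated clusters.
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic data (an assumption for illustration): four well-separated blobs.
X_demo, _ = make_blobs(
    n_samples=400,
    centers=[[0, 0], [10, 10], [-10, 10], [10, -10]],
    cluster_std=1.0,
    random_state=0,
)

best_k, scores = best_k_by_silhouette(X_demo, range(2, 9))
for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print("best k:", best_k)
```

On your own data, pass your feature matrix instead of `X_demo`. If all scores are low and close to each other, that is a hint that no value of k fits well.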

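However, in this particular case it does not look like that will solve your problem. If the data points are spread fairly uniformly in space, meaning they do not actually form any clusters, then there is no optimal value of k. See the last row here for an example: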

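K-means will technically still produce distinct clusters, but they will not be as well separated from each other as you would like. In that situation the silhouette score has no clear peak, and the elbow method will not work either. That may be what is happening in your case: there are no real clusters in the data.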
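To make that point concrete, here is a small sketch that compares silhouette scores on uniformly scattered points with scores on clearly clustered points. Both datasets and the helper `silhouette_by_k` are made up for illustration; the exact numbers depend on the data and the random seed.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, k_values):
    """Map each k to the silhouette score of a k-means fit on X."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores


# Uniformly scattered points: no real cluster structure (illustrative data).
rng = np.random.default_rng(0)
X_uniform = rng.uniform(0.0, 1.0, size=(400, 2))

# Three well-separated blobs for comparison (illustrative data).
X_blobs, _ = make_blobs(
    n_samples=400,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)

uniform_scores = silhouette_by_k(X_uniform, range(2, 9))
blob_scores = silhouette_by_k(X_blobs, range(2, 9))

print("k  uniform  blobs")
for k in uniform_scores:
    print(f"{k}  {uniform_scores[k]:.3f}    {blob_scores[k]:.3f}")
```

The clustered data should show one k that stands out, while the uniform data should give mediocre scores for every k. If your own scores look like the uniform column, the data probably has no natural clusters.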

There are several ways to do this. One option is to let Yellowbrick do the work.


import pandas as pd
import matplotlib as mpl 
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn import datasets

from yellowbrick.cluster import KElbowVisualizer, SilhouetteVisualizer

mpl.rcParams["figure.figsize"] = (9,6)

# Load iris flower dataset
iris = datasets.load_iris()

# Clustering is unsupervised, so we use only the features (iris.data), not the labels (iris.target)
X = iris.data
# Converting the data into dataframe
feature_names = iris.feature_names
iris_dataframe = pd.DataFrame(X, columns=feature_names)
iris_dataframe.head(10)

# Fit a k-means model with 3 clusters (the Iris dataset is known to contain 3 classes)
# n_init and random_state are set explicitly so that results are reproducible
k_means = KMeans(n_clusters=3, n_init=10, random_state=0)
k_means.fit(X)

# Plotting a 3d plot using matplotlib to visualize the data points
fig = plt.figure(figsize=(7,7))
ax = fig.add_subplot(111, projection='3d')

# Setting the colors to match cluster results
colors = ['red' if label == 0 else 'purple' if label==1 else 'green' for label in k_means.labels_]

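# Plot petal width, sepal length, and petal length on the three axes
ax.scatter(X[:, 3], X[:, 0], X[:, 2], c=colors)
ax.set_xlabel(feature_names[3])
ax.set_ylabel(feature_names[0])
ax.set_zlabel(feature_names[2])
plt.show()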

See the link below for more information.

So useful! Thank you very much!

# Instantiate the clustering model and visualizer
model = KMeans()
visualizer = KElbowVisualizer(model, k=(2,11))

visualizer.fit(X)    # Fit the data to the visualizer
visualizer.show()    # Render the elbow plot
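If you would rather not depend on Yellowbrick, the numbers behind an elbow curve can be collected by hand from the `inertia_` attribute of a fitted `KMeans` model (the within-cluster sum of squared distances). The following is a minimal sketch on the same Iris data; the helper name `inertia_by_k` is an assumption introduced here.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris


def inertia_by_k(X, k_values):
    """Map each k to the inertia (within-cluster sum of squares) of a k-means fit."""
    inertias = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        inertias[k] = model.inertia_
    return inertias


X_iris = load_iris().data
inertias = inertia_by_k(X_iris, range(1, 11))

# The "elbow" is the k after which the inertia stops dropping sharply.
for k, value in inertias.items():
    print(f"k={k}: inertia={value:.2f}")
```

Plotting these values against k, for example with `plt.plot(list(inertias), list(inertias.values()), marker="o")`, gives the usual elbow curve. A curve that falls almost linearly with no visible bend is the situation described in the question.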