Python: plotting KMeans clusters and classifying 1D data

python matplotlib machine-learning scikit-learn k-means

I am using KMeans to cluster three time-series datasets with different characteristics. For reproducibility, I am sharing the data.

Here is my code:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

protocols = {}

types = {"data1": "data1.csv", "data2": "data2.csv", "data3": "data3.csv"}

for protname, fname in types.items():
    col_time,col_window = np.loadtxt(fname,delimiter=',').T
    trailing_window = col_window[:-1] # "past" values at a given index
    leading_window  = col_window[1:]  # "current" values at a given index
    decreasing_inds = np.where(leading_window < trailing_window)[0]
    quotient = leading_window[decreasing_inds]/trailing_window[decreasing_inds]
    quotient_times = col_time[decreasing_inds]

    protocols[protname] = {
        "col_time": col_time,
        "col_window": col_window,
        "quotient_times": quotient_times,
        "quotient": quotient,
    }



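# n_jobs, precompute_distances and algorithm='auto' were removed in later
# scikit-learn releases; the remaining arguments are unchanged.
k_means = KMeans(n_clusters=3, init='k-means++', n_init=10, max_iter=300,
    tol=0.0001, random_state=0, copy_x=True, verbose=0)
# Note: after the loop, `quotient` holds the values from the last file loaded.
k_means.fit(quotient.reshape(-1,1))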


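This way, given a new data point (with quotient and quotient_times), I want to know which cluster it belongs to, by building each dataset with these two transformed features, quotient and quotient_times, stacked, and clustering it with KMeans.

k_means.labels_ gives this output: array([1, 1, 0, 1, 2, 1, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0], dtype=int32)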

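Finally, I want to visualize the clusters using plt.plot(k_means, ".", color="blue"), but I get this error: TypeError: float() argument must be a string or a number, not 'KMeans'. How do I plot the KMeans clusters?

If I understand correctly, what you want to plot is the decision boundary of the KMeans result. You can find an example of how to do this on the scikit-learn website. That example also applies PCA so that the data can be visualized in 2D (if the data has more than two dimensions), which is not relevant for you. You can easily plot a scatter coloured by the KMeans decisions to get a better idea of where the clustering goes wrong.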
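
For one-dimensional data the decision boundary mentioned above reduces to a set of thresholds: a point is assigned to its nearest centroid, so each boundary lies halfway between two neighbouring centroids. The following is a minimal sketch of that idea, not the code from the scikit-learn example. The synthetic `values` array is an assumption standing in for `quotient`, and `threshold_bin` is a hypothetical helper name.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for `quotient` (assumption: three well-separated groups).
rng = np.random.default_rng(0)
values = np.concatenate([
    rng.normal(0.2, 0.02, 30),
    rng.normal(0.5, 0.02, 30),
    rng.normal(0.9, 0.02, 30),
])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(values.reshape(-1, 1))

# Sort the 1D centroids; each threshold is the midpoint of two neighbours.
centers = np.sort(km.cluster_centers_.ravel())
thresholds = (centers[:-1] + centers[1:]) / 2


def threshold_bin(x):
    """Return, for each value, the index of the interval it falls in.

    Intervals are ordered by centroid value, so index 0 is the cluster with
    the smallest centroid. This ordering may differ from KMeans' own labels.
    """
    return np.searchsorted(thresholds, x)


print("centroids :", centers)
print("thresholds:", thresholds)
```

Each threshold can then be drawn on a scatter plot as a horizontal line, for instance with plt.axhline(t, linestyle="--") for every t in thresholds.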
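
What you are effectively looking for is a range of values between which points are considered to be in a given class. It is quite unusual to use KMeans to classify 1D data in this way, although it certainly works. As you have noticed, you need to convert your input data to a 2D array in order to use the method.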
# n_jobs, precompute_distances and algorithm='auto' were removed in later
# scikit-learn releases; the remaining arguments are unchanged.
k_means = KMeans(n_clusters=3, init='k-means++', n_init=10, max_iter=300,
    tol=0.0001, random_state=0, copy_x=True, verbose=0)

quotient_2d = quotient.reshape(-1,1)
k_means.fit(quotient_2d)

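You will need quotient_2d again for the classification (prediction) step later.

First we can plot the centroids. Since the data is 1D, the x-axis position is arbitrary.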
colors = ['r','g','b']
centroids = k_means.cluster_centers_
for n, y in enumerate(centroids):
    plt.plot(1, y, marker='x', color=colors[n], ms=10)
plt.title('Kmeans cluster centroids')

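This produces a plot with one marker per centroid.

To get the cluster membership of the points, pass quotient_2d to .predict. This returns an array of class-membership numbers, for example: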
>>> Z = k_means.predict(quotient_2d)
>>> Z
array([1, 1, 0, 1, 2, 1, 0, 0, 2, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 0], dtype=int32)
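
The same .predict call also covers the case raised in the question, where a new value arrives after fitting. A minimal sketch follows. The `quotient` values here are made up for illustration and the name `new_values` is hypothetical; the only requirement is that new data is reshaped to (n_samples, 1), as in the fit step.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for `quotient`; the real values come from the CSV files.
quotient = np.array([0.10, 0.12, 0.15, 0.50, 0.52, 0.55, 0.90, 0.92, 0.95])
k_means = KMeans(n_clusters=3, n_init=10, random_state=0)
k_means.fit(quotient.reshape(-1, 1))

# predict expects shape (n_samples, n_features), so reshape new values too.
new_values = np.array([0.11, 0.93])
new_labels = k_means.predict(new_values.reshape(-1, 1))

# Look up the centroid each new value was assigned to.
nearest = k_means.cluster_centers_[new_labels].ravel()
print(new_labels, nearest)
```

The label numbers themselves are arbitrary, so it is safer to compare a new point's label against the labels of known points, or against the centroid values, than to rely on a fixed numbering.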

We can use this to filter the original data and plot each class in a separate colour.
# Plot each class as a separate colour
n_clusters = 3 
for n in range(n_clusters):
    # Filter data points to plot each in turn.
    ys = quotient[ Z==n ]
    xs = quotient_times[ Z==n ]

    plt.scatter(xs, ys, color=colors[n])

plt.title("Points by cluster")

This produces a plot of the original data, with each point coloured by its cluster membership.
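
The question also mentions stacking quotient and quotient_times as two features, which the steps above do not do, since they cluster on quotient alone. One possible way to try that variant is sketched below, with made-up arrays standing in for the real data. The names `X`, `k_means_2d` and `new_point` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-ins for `quotient_times` and `quotient`.
quotient_times = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0])
quotient = np.array([0.10, 0.12, 0.15, 0.50, 0.52, 0.55, 0.90, 0.92, 0.95])

# Stack the two 1D arrays into an (n_samples, 2) feature matrix.
X = np.column_stack((quotient_times, quotient))

k_means_2d = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# A new point must supply both features, in the same column order as X.
new_point = np.array([[11.5, 0.51]])
new_label = k_means_2d.predict(new_point)[0]
print(new_label, k_means_2d.cluster_centers_)
```

KMeans uses Euclidean distance, so the feature with the larger numeric range (here the times) tends to dominate the clustering. Whether to rescale the columns first, for example with sklearn.preprocessing.StandardScaler, depends on the data and is left open here.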

You don't want to plot the KMeans class, right? Rather some numbers. But which numbers do you want to plot? The predictions? The cluster centres?

I want two plots: 1) the predictions and 2) the KMeans classes.

I tried it but got the same error. Does it work for you?