Python: plotting k-means clusters on data with more than two dimensions

python python-3.x pandas matplotlib plot

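I have a dataset with 6 columns, and after running KMeans on it I need to visualize the clustering result. I have six clusters. How can I do this? Here is my KMeans clustering code: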

from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaled_features = scaler.fit_transform(psnr_bitrate)
kmeans = KMeans(init="random",n_clusters=6,n_init=10,max_iter=300,random_state=42)
kmeans.fit(scaled_features)
y_kmeans = kmeans.predict(scaled_features)

I found another post about this, but I could not understand the solution, because I don't know what cluster refers to in that code.

I used the following code:

from sklearn.preprocessing import StandardScaler
from sklearn import cluster

scaler = StandardScaler()
scaled_features = scaler.fit_transform(psnr_bitrate)
kmeans = KMeans(init="random",n_clusters=6,n_init=10,max_iter=300,random_state=42)
kmeans.fit(scaled_features)
y_kmeans = kmeans.predict(scaled_features)
scaled_features['cluster'] = y_kmeans
pd.tools.plotting.parallel_coordinates(scaled_features, 'cluster')

It produces this error:

Traceback (most recent call last):

  File "<ipython-input-77-2e66d8a57100>", line 7, in <module>
    scaled_features['cluster'] = y_kmeans

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

The data has 6 columns and 1301 rows, but my columns have no names.

scaled_features is a NumPy array, and you cannot index an array with a string. You first need to convert it to a DataFrame:

scaled_features = pd.DataFrame(scaled_features)
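The difference between the two containers can be reproduced in isolation. The following is a minimal sketch, assuming random stand-in data with the same shape as in the question (1301 rows, 6 unnamed columns); `scaled_array`, `labels`, and `scaled_df` are hypothetical names, and the labels are random placeholders rather than real KMeans output.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the scaled data: 1301 rows, 6 unnamed columns.
rng = np.random.default_rng(0)
scaled_array = rng.normal(size=(1301, 6))
labels = rng.integers(0, 6, size=1301)  # placeholder for kmeans.predict(...)

# A plain NumPy array only accepts integer, slice, or boolean indices,
# so assigning through a string key raises IndexError.
try:
    scaled_array["cluster"] = labels
    string_index_failed = False
except IndexError:
    string_index_failed = True

# A DataFrame accepts string column labels. Because no names were given,
# the six feature columns are labelled 0..5 by default.
scaled_df = pd.DataFrame(scaled_array)
scaled_df["cluster"] = labels
```

After this, `scaled_df` has the six feature columns plus a `cluster` column, which is the layout `parallel_coordinates` expects.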

A few points: in newer versions of pandas the function is pd.plotting.parallel_coordinates, and it is easier if you make the predictors a DataFrame, for example:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn import datasets
from sklearn.decomposition import PCA

# import some data to play with
iris = datasets.load_iris()
X = iris.data
y = iris.target

scaler = StandardScaler()
scaled_features = pd.DataFrame(scaler.fit_transform(X))

If you can, provide column names:
scaled_features.columns = iris.feature_names

Run KMeans and assign the clusters:
kmeans = KMeans(init="random",n_clusters=6,n_init=10,max_iter=300,random_state=42)
kmeans.fit(scaled_features)

scaled_features['cluster'] = kmeans.predict(scaled_features)

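Plot:
pd.plotting.parallel_coordinates(scaled_features, 'cluster')

Or apply some dimensionality reduction to the features and plot the result: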
from sklearn.manifold import MDS
import seaborn as sns

embedding = MDS(n_components=2)
mds = pd.DataFrame(embedding.fit_transform(scaled_features.drop('cluster',axis=1)),
             columns = ['component1','component2'])
mds['cluster'] = kmeans.predict(scaled_features.drop('cluster',axis=1))

sns.scatterplot(data=mds,x = "component1",y="component2",hue="cluster")
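The answer imports PCA without using it. As a possible alternative to MDS, here is a sketch that uses PCA instead; this is a swapped-in technique, not the answer's method. One practical difference is that scikit-learn's MDS has no `transform` method for new points, whereas a fitted PCA can also project the KMeans centers into the same 2-D space. The names `points_2d` and `centers_2d` are illustrative, and plain matplotlib is used in place of seaborn.

```python
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Same example data and KMeans settings as in the answer.
iris = datasets.load_iris()
scaled = StandardScaler().fit_transform(iris.data)

kmeans = KMeans(init="random", n_clusters=6, n_init=10, max_iter=300, random_state=42)
labels = kmeans.fit_predict(scaled)

# Fit PCA on the scaled features, then reuse the fitted projection
# for the cluster centers so both live in the same 2-D space.
pca = PCA(n_components=2)
points_2d = pca.fit_transform(scaled)
centers_2d = pca.transform(kmeans.cluster_centers_)

fig, ax = plt.subplots()
ax.scatter(points_2d[:, 0], points_2d[:, 1], c=labels, cmap="tab10", s=15)
ax.scatter(centers_2d[:, 0], centers_2d[:, 1], c="black", marker="x", s=100,
           label="cluster centers")
ax.set_xlabel("component1")
ax.set_ylabel("component2")
ax.legend()
```

Call `plt.show()` or `fig.savefig(...)` to view the figure. Note that two principal components may not capture all the variance, so clusters that look overlapping in this view can still be separated in the full feature space.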

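The cluster in that code corresponds to the one in from sklearn import cluster.

No, I don't think that is true. In the answer's code we have: from sklearn.preprocessing import StandardScaler scaler = StandardScaler() scaled_features = scaler.fit_transform(psnr_bitrate) kmeans = KMeans(init="random", n_clusters=6, n_init=10, max_iter=300, random_state=42) kmeans.fit(scaled_features) y_kmeans = kmeans.predict(scaled_features) scaled_features['cluster'] = y_kmeans pd.tools.plotting.parallel_coordinates(scaled_features, 'cluster'), and cluster is used as a column, I think.

Yes, the string "cluster" is used as the name of a DataFrame column in the solution. I still don't understand what it is that you don't understand...

I used the code above, and using cluster produced an error. Please see the new code I added above.

Great, thanks. Is there a way to show the cluster centers on these parallel-coordinates plots?

Can you tell me whether there is a way to show the cluster centers on the plot you drew above? Since you plot with pd.plotting.parallel_coordinates(scaled_features, 'cluster'), I don't know how to show the cluster centers on that plot.

I can try later; I'm busy with work right now.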
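The thread does not include an answer to the question about cluster centers. One possible approach, sketched here under the assumption that the data was prepared as in the answer, is to put `kmeans.cluster_centers_` into a second DataFrame with its own `cluster` column and draw it onto the same axes with a thicker line. The name `centers`, the `tab10` colormap, and the styling values are illustrative choices.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Same setup as in the answer: scaled iris features in a named DataFrame.
iris = datasets.load_iris()
scaled_features = pd.DataFrame(
    StandardScaler().fit_transform(iris.data), columns=iris.feature_names
)
feature_cols = list(scaled_features.columns)

kmeans = KMeans(init="random", n_clusters=6, n_init=10, max_iter=300, random_state=42)
scaled_features["cluster"] = kmeans.fit_predict(scaled_features[feature_cols])

# One row per cluster center, in the same scaled feature space as the data.
centers = pd.DataFrame(kmeans.cluster_centers_, columns=feature_cols)
centers["cluster"] = range(kmeans.n_clusters)

# sort_labels=True plus a shared colormap makes both calls assign colors
# to the cluster ids in the same order, so each center matches its cluster.
ax = pd.plotting.parallel_coordinates(
    scaled_features, "cluster", colormap="tab10", sort_labels=True, alpha=0.3
)
pd.plotting.parallel_coordinates(
    centers, "cluster", colormap="tab10", sort_labels=True, linewidth=3, ax=ax
)

# Both calls register legend entries; keep only the center lines (added last).
handles, legend_labels = ax.get_legend_handles_labels()
ax.legend(handles[-kmeans.n_clusters:], legend_labels[-kmeans.n_clusters:],
          title="cluster", loc="upper right")
```

The faint lines are the individual rows and the thick lines are the centers. Call `plt.show()` to display the figure.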