Python: plotting KMeans clusters on data with more than two dimensions
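I have a dataset with 6 columns, and after running KMeans I need to visualize the resulting clusters. I have six clusters. How can I do this? Here is my KMeans clustering code: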

from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# standardize the features, then fit KMeans with six clusters
scaler = StandardScaler()
scaled_features = scaler.fit_transform(psnr_bitrate)
kmeans = KMeans(init="random",n_clusters=6,n_init=10,max_iter=300,random_state=42)
kmeans.fit(scaled_features)
y_kmeans = kmeans.predict(scaled_features)
I found another post on this, but I could not understand the solution because I don't know what 'cluster' refers to in that code.

I used the following code:

from sklearn.preprocessing import StandardScaler
from sklearn import cluster

scaler = StandardScaler()
scaled_features = scaler.fit_transform(psnr_bitrate)
kmeans = KMeans(init="random",n_clusters=6,n_init=10,max_iter=300,random_state=42)
kmeans.fit(scaled_features)
y_kmeans = kmeans.predict(scaled_features)
scaled_features['cluster'] = y_kmeans
pd.tools.plotting.parallel_coordinates(scaled_features, 'cluster')
It produces this error:

Traceback (most recent call last):

  File "<ipython-input-77-2e66d8a57100>", line 7, in <module>
    scaled_features['cluster'] = y_kmeans

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

The data has 6 columns and 1301 rows, but my columns have no names.

scaled_features is a NumPy array, and you cannot index an array with a string. You first need to convert it to a DataFrame:

scaled_features = pd.DataFrame(scaled_features)
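To make the cause of the IndexError concrete, here is a minimal sketch that reproduces it on a small made-up array and then applies the DataFrame conversion. The array contents and the labels are hypothetical stand-ins, not the asker's data.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the scaled feature matrix: 5 rows, 6 unnamed columns.
scaled_features = np.arange(30, dtype=float).reshape(5, 6)
labels = np.array([0, 1, 0, 2, 1])  # hypothetical cluster labels

# A plain NumPy array only accepts integer, slice, or boolean indices,
# so assigning through a string key raises IndexError.
try:
    scaled_features["cluster"] = labels
    raised = False
except IndexError:
    raised = True

# Wrapping the array in a DataFrame makes string column labels valid.
df = pd.DataFrame(scaled_features)
df["cluster"] = labels
print(raised, df.shape)
```

Because the original array had no column names, the DataFrame columns are the integers 0 to 5, followed by the new 'cluster' column.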

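In newer versions of pandas the function is pd.plotting.parallel_coordinates rather than pd.tools.plotting.parallel_coordinates. It is also easier if you make the scaled features a DataFrame, for example: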

import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn import datasets
from sklearn.decomposition import PCA

# load some example data to play with
iris = datasets.load_iris()
X = iris.data
y = iris.target

scaler = StandardScaler()
scaled_features = pd.DataFrame(scaler.fit_transform(X))
If you can, provide column names:

scaled_features.columns = iris.feature_names
Run KMeans and assign the clusters:

kmeans = KMeans(init="random",n_clusters=6,n_init=10,max_iter=300,random_state=42)
kmeans.fit(scaled_features)

scaled_features['cluster'] = kmeans.predict(scaled_features)
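Plot:

pd.plotting.parallel_coordinates(scaled_features, 'cluster')

Or apply some dimensionality reduction to the features and plot: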

from sklearn.manifold import MDS
import seaborn as sns

embedding = MDS(n_components=2)
mds = pd.DataFrame(embedding.fit_transform(scaled_features.drop('cluster',axis=1)),
             columns = ['component1','component2'])
mds['cluster'] = kmeans.predict(scaled_features.drop('cluster',axis=1))

sns.scatterplot(data=mds,x = "component1",y="component2",hue="cluster")
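The answer imports PCA but uses MDS for the projection. As an alternative sketch (not the answerer's method), PCA can be swapped in when you also want to place the cluster centers on the 2-D plot: scikit-learn's MDS has no transform method for new points, while a fitted PCA can project the centers with the same linear map. The variable names below and the Agg backend line are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
scaled = StandardScaler().fit_transform(iris.data)

kmeans = KMeans(init="random", n_clusters=6, n_init=10, max_iter=300, random_state=42)
labels = kmeans.fit_predict(scaled)

# Fit PCA on the scaled features, then reuse it to project the centers.
pca = PCA(n_components=2)
names = ["component1", "component2"]
coords = pd.DataFrame(pca.fit_transform(scaled), columns=names)
coords["cluster"] = labels
centers_2d = pd.DataFrame(pca.transform(kmeans.cluster_centers_), columns=names)

fig, ax = plt.subplots()
ax.scatter(coords["component1"], coords["component2"],
           c=coords["cluster"], cmap="tab10", s=15)
ax.scatter(centers_2d["component1"], centers_2d["component2"],
           marker="X", s=200, c="black", label="cluster centers")
ax.set_xlabel("component1")
ax.set_ylabel("component2")
ax.legend()
```

Call plt.show() or fig.savefig(...) to view the result. Two components only capture part of the variance, so clusters that overlap in this view may still be separated in the full feature space.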

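Comments:

The cluster in that code corresponds to the one in from sklearn import cluster.

No, I don't think that is true. The code in the answer is: from sklearn.preprocessing import StandardScaler; scaler = StandardScaler(); scaled_features = scaler.fit_transform(psnr_bitrate); kmeans = KMeans(init="random",n_clusters=6,n_init=10,max_iter=300,random_state=42); kmeans.fit(scaled_features); y_kmeans = kmeans.predict(scaled_features); scaled_features['cluster'] = y_kmeans; pd.tools.plotting.parallel_coordinates(scaled_features, 'cluster'). I think cluster is used there as a column.

Yes, the string "cluster" is used as the column name of the DataFrame in the solution. I still don't understand what it is you don't understand...

I used the code above, and using cluster produced an error. Please see the new code I added above.

Great, thank you. Is there a way to show the cluster centers on these parallel coordinates plots?

Can you tell me whether there is a way to show the cluster centers on the plot you drew above? Since you plot with pd.plotting.parallel_coordinates(scaled_features, 'cluster'), I don't know how to show the cluster centers on that plot.

I can try later; I'm busy with work right now.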
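The thread leaves the question about cluster centers unanswered. One possible approach, offered as a sketch rather than the answerer's solution, is to draw the data rows faintly and then call pd.plotting.parallel_coordinates a second time on the same axes with the rows of kmeans.cluster_centers_ drawn as thick lines. The color list, the "center k" labels, and the Agg backend line are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
features = pd.DataFrame(StandardScaler().fit_transform(iris.data),
                        columns=iris.feature_names)

kmeans = KMeans(init="random", n_clusters=6, n_init=10, max_iter=300, random_state=42)
labels = kmeans.fit_predict(features)

points = features.copy()
points["cluster"] = labels

# One row per cluster center, in the same feature space as the points.
centers = pd.DataFrame(kmeans.cluster_centers_, columns=iris.feature_names)
centers["cluster"] = [f"center {k}" for k in range(kmeans.n_clusters)]

# Colors are assigned in order of first appearance of each class, so the
# points are sorted by cluster to keep both layers on the same mapping.
colors = ["tab:blue", "tab:orange", "tab:green", "tab:red", "tab:purple", "tab:brown"]

fig, ax = plt.subplots(figsize=(9, 5))
pd.plotting.parallel_coordinates(points.sort_values("cluster"), "cluster",
                                 ax=ax, color=colors, alpha=0.25)
pd.plotting.parallel_coordinates(centers, "cluster",
                                 ax=ax, color=colors, linewidth=3)
ax.set_title("KMeans clusters with cluster centers (thick lines)")
```

Call plt.show() or fig.savefig(...) to view the figure. The centers are in scaled units because KMeans was fitted on the scaled features, so they line up with the scaled data rows without any further conversion.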