Python: I'm trying to get KMeans to plot 5 clusters, but I'm only getting 1 cluster


I found some code that seems to work well.

The code below produces the plot below.

from sklearn import datasets
from sklearn import cluster
import plotly
plotly.offline.init_notebook_mode()


iris = datasets.load_iris()

kmeans = cluster.KMeans(n_clusters=5, random_state=42).fit(iris.data[:,0:1])
data = [plotly.graph_objs.Scatter(x=iris.data[:,0], 
                                  y=iris.data[:,1], 
                                  mode='markers',     
                                  marker=dict(color=kmeans.labels_)
                                  )]
plotly.offline.iplot(data)

Now I made a simple substitution in the code to point it at my own data, as shown below.

from sklearn import datasets
from sklearn import cluster
import plotly
plotly.offline.init_notebook_mode()

x = df[['Spend']]
y = df[['Revenue']]

kmeans = cluster.KMeans(n_clusters=5, random_state=42).fit(x,y)
data = [plotly.graph_objs.Scatter(x=df[['Spend']], 
                                  y=df[['Revenue']], 
                                  mode='markers',     
                                  marker=dict(color=kmeans.labels_))]
plotly.offline.iplot(data)

That gives me this plot.

Here is my DataFrame:

# Import pandas library
import pandas as pd
  
# initialize list of lists
data = [[110,'CHASE CENTER',53901,8904,44997,4], [541,'METS STADIUM',57999,4921,53078,1], [538,'DEN BRONCOS',91015,9945,81070,1], [640,'LAMBEAU WI',76214,5773,70441,3], [619,'SAL AIRPORT',93000,8278,84722,5]]
  
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Location', 'Location_Description', 'Revenue','Spend','Profit_Or_Loss','cluster_number'])
  
# print dataframe.
df

I must be missing something silly, but I can't see what it is.

The problem is with the dimensions:

# In the iris dataset
>>> iris.data[:,0].shape
(150,)
# Your data
>>> x.shape
(5, 1)

# You need to flatten your array
>>> x.values.flatten().shape
(5,)
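
To make the shape difference concrete, here is a minimal sketch. It assumes a small stand-in frame that reuses only the Spend and Revenue values from the question; the names two_d and one_d are made up for illustration.

```python
import pandas as pd

# Stand-in frame with the question's Spend and Revenue values (illustrative only).
df = pd.DataFrame({
    "Spend": [8904, 4921, 9945, 5773, 8278],
    "Revenue": [53901, 57999, 91015, 76214, 93000],
})

two_d = df[["Spend"]]  # list of column names -> DataFrame with shape (5, 1)
one_d = df["Spend"]    # single column name   -> Series with shape (5,)

print(two_d.shape)
print(one_d.shape)

# Either of these yields a flat 1-D array suitable for a plot axis.
flat_from_frame = two_d.to_numpy().ravel()
flat_from_series = one_d.to_numpy()
print(flat_from_frame.shape, flat_from_series.shape)
```

Selecting with single brackets avoids the need to flatten at all, which may be the simpler choice when the values are only going to a plotting call.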

For example:

from sklearn import datasets
from sklearn import cluster
import plotly
plotly.offline.init_notebook_mode()

x = df[['Spend']]
y = df[['Revenue']]

x_flat = x.values.flatten()
y_flat = y.values.flatten()

kmeans = cluster.KMeans(n_clusters=5, random_state=42).fit(x)
data = [plotly.graph_objs.Scatter(x=x_flat, 
                                  y=y_flat, 
                                  mode='markers',     
                                  marker=dict(color=kmeans.labels_))]
plotly.offline.iplot(data)

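On the other hand, KMeans.fit takes a single array (not two, as you are passing); the second argument, y, is ignored. You have to combine them into one array of shape (n_samples, n_features):

import numpy as np

X = np.zeros((x_flat.shape[0], 2))
X[:, 0] = x_flat
X[:, 1] = y_flat
# X.shape -> (5, 2)

kmeans = cluster.KMeans(n_clusters=5, random_state=42).fit(X)

Ah. Now it works. I didn't realize I had to flatten it. Thanks for pointing that out!!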
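
Putting the two points of the answer together, the sketch below builds the two-feature matrix directly from the DataFrame and also checks that passing a second array to fit changes nothing. It is an illustration under assumptions, not the answerer's code: the frame is a stand-in holding only the question's Spend and Revenue values, n_init=10 is set explicitly just to keep the behavior the same across scikit-learn versions, and plotting is left out so the snippet runs without plotly.

```python
import numpy as np
import pandas as pd
from sklearn import cluster

# Stand-in frame with the question's Spend and Revenue values (illustrative only).
df = pd.DataFrame({
    "Spend": [8904, 4921, 9945, 5773, 8278],
    "Revenue": [53901, 57999, 91015, 76214, 93000],
})

# One array of shape (n_samples, n_features) that holds both columns.
X = df[["Spend", "Revenue"]].to_numpy()
print(X.shape)

kmeans = cluster.KMeans(n_clusters=5, random_state=42, n_init=10).fit(X)
labels = kmeans.labels_  # one integer label per row, usable as marker colors

# fit(x, y) ignores y, so these two models both cluster on Spend alone.
with_y = cluster.KMeans(n_clusters=5, random_state=42, n_init=10).fit(
    df[["Spend"]], df[["Revenue"]]
)
without_y = cluster.KMeans(n_clusters=5, random_state=42, n_init=10).fit(
    df[["Spend"]]
)
print(np.array_equal(with_y.labels_, without_y.labels_))
```

With only five rows and n_clusters=5, every point ends up in its own cluster, so the coloring is mostly a check that the labels reach the plot. For plotting, flat columns such as df["Spend"] and df["Revenue"] can go to x and y, with labels as the marker color.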