Python: I'm trying to get kmeans to plot 5 clusters, but I'm only getting 1 cluster
I found some code that seems to work well. The code below produces the plot below.
from sklearn import datasets
from sklearn import cluster
import plotly
plotly.offline.init_notebook_mode()
iris = datasets.load_iris()
kmeans = cluster.KMeans(n_clusters=5, random_state=42).fit(iris.data[:,0:1])
data = [plotly.graph_objs.Scatter(x=iris.data[:,0],
                                  y=iris.data[:,1],
                                  mode='markers',
                                  marker=dict(color=kmeans.labels_)
                                  )]
plotly.offline.iplot(data)
Now I made a simple substitution in the code to point it at my own data, as follows:
from sklearn import datasets
from sklearn import cluster
import plotly
plotly.offline.init_notebook_mode()
x = df[['Spend']]
y = df[['Revenue']]
kmeans = cluster.KMeans(n_clusters=5, random_state=42).fit(x,y)
data = [plotly.graph_objs.Scatter(x=df[['Spend']],
                                  y=df[['Revenue']],
                                  mode='markers',
                                  marker=dict(color=kmeans.labels_))]
plotly.offline.iplot(data)
This gives me this plot.
Here is my data frame:
# Import pandas library
import pandas as pd
# initialize list of lists
data = [[110,'CHASE CENTER',53901,8904,44997,4], [541,'METS STADIUM',57999,4921,53078,1], [538,'DEN BRONCOS',91015,9945,81070,1], [640,'LAMBEAU WI',76214,5773,70441,3], [619,'SAL AIRPORT',93000,8278,84722,5]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Location', 'Location_Description', 'Revenue','Spend','Profit_Or_Loss','cluster_number'])
# print dataframe.
df
I must be missing something silly, but I can't see what it is.
The problem is with the dimensions:
# In the iris dataset
>>> iris.data[:,0].shape
(150,)
# Your data
>>> x.shape
(5, 1)
# You need to flatten your array
>>> x.values.flatten().shape
(5,)
For example:
from sklearn import datasets
from sklearn import cluster
import plotly
plotly.offline.init_notebook_mode()
x = df[['Spend']]
y = df[['Revenue']]
x_flat = x.values.flatten()
y_flat = y.values.flatten()
kmeans = cluster.KMeans(n_clusters=5, random_state=42).fit(x)
data = [plotly.graph_objs.Scatter(x=x_flat,
                                  y=y_flat,
                                  mode='markers',
                                  marker=dict(color=kmeans.labels_))]
plotly.offline.iplot(data)
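As a side note, the flattening step can be avoided by how the column is selected. The sketch below is only an illustration: it rebuilds the Spend and Revenue columns from the question's data frame and compares the shapes that double-bracket and single-bracket selection return. The variable names two_d, one_d and flat are made up for this example.

```python
import pandas as pd

# Spend and Revenue values copied from the question's data frame
df = pd.DataFrame({
    "Spend": [8904, 4921, 9945, 5773, 8278],
    "Revenue": [53901, 57999, 91015, 76214, 93000],
})

# Double brackets return a DataFrame: 2-D, one column
two_d = df[["Spend"]]

# Single brackets return a Series: 1-D, comparable to iris.data[:, 0]
one_d = df["Spend"]

# Flattening the 2-D selection gives a 1-D NumPy array of the same values
flat = df[["Spend"]].to_numpy().ravel()

print(two_d.shape, one_d.shape, flat.shape)
```

Either the Series or the flattened array should work as the x or y argument of Scatter, since both are one-dimensional.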
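On the other hand, KMeans.fit accepts a single array (not two, as you are passing). You have to combine them into one array of shape (n_samples, n_features):
import numpy as np
X = np.zeros((x_flat.shape[0], 2))
X[:, 0] = x_flat
X[:, 1] = y_flat
# X.shape -> (5, 2)
kmeans = cluster.KMeans(n_clusters=5, random_state=42).fit(X)
Ah. Now it works. I didn't realize I had to flatten it. Thanks for pointing that out!!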
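The (n_samples, n_features) array can also be taken straight from the data frame by selecting both columns at once, which skips the flatten-and-copy step. This is a minimal sketch under that assumption, not the answer's own code. It rebuilds the two columns from the question's data, and n_init=10 is set explicitly only to keep the behaviour the same across scikit-learn versions. In scikit-learn the second argument of fit is ignored, so fit(x, y) in the question clusters on Spend alone.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Spend and Revenue values copied from the question's data frame
df = pd.DataFrame({
    "Spend": [8904, 4921, 9945, 5773, 8278],
    "Revenue": [53901, 57999, 91015, 76214, 93000],
})

# Selecting two columns gives a (n_samples, n_features) array directly
X = df[["Spend", "Revenue"]].to_numpy()

# Cluster on both features at once
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42).fit(X)
labels = kmeans.labels_

print(X.shape)
print(labels)
```

With only five rows and n_clusters=5, every point ends up in its own cluster, so the plot shows five colours but no real grouping. A larger data set or a smaller n_clusters would give a more meaningful result.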