Python: Assigning custom colors to clusters using numpy


Is there a way to use preferred colors (8 to 10 or more) for the different clusters plotted by the following code?

import numpy as np

existing_df_2d.plot(
    kind='scatter',
    x='PC2',y='PC1',
    c=existing_df_2d.cluster.astype(float),  # np.float was removed in NumPy 1.24; builtin float is equivalent
    figsize=(16,8))


Thanks.

I tried the following, but it did not work:

LABEL_COLOR_MAP = {0 : 'red',
               1 : 'blue',
               2 : 'green',
               3 : 'purple'}

label_color = [LABEL_COLOR_MAP[l] for l in range(len(np.unique(existing_df_2d.cluster)))]

existing_df_2d.plot(
    kind='scatter',
    x='PC2',y='PC1',
    c=label_color, 
    figsize=(16,8))
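A note on why this attempt fails, as a minimal sketch on a made-up six-row frame (the `toy` DataFrame and its values are assumptions for illustration, not the poster's data): the list comprehension iterates over the number of distinct clusters rather than over the rows, so it yields one color per cluster instead of one per point. With five clusters and only four dictionary entries it would also raise a `KeyError`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for existing_df_2d: six points in three clusters.
toy = pd.DataFrame({
    "PC1": [0.1, 0.4, 2.0, 2.2, 5.0, 5.1],
    "PC2": [1.0, 1.1, 0.2, 0.3, 3.0, 3.2],
    "cluster": [0, 0, 1, 1, 2, 2],
})

LABEL_COLOR_MAP = {0: "red", 1: "blue", 2: "green", 3: "purple"}

# The attempted approach: one entry per *distinct* cluster label.
per_cluster = [LABEL_COLOR_MAP[l] for l in range(len(np.unique(toy.cluster)))]

# What a scatter plot needs: one entry per *row*, looked up from that row's label.
per_row = [LABEL_COLOR_MAP[l] for l in toy.cluster]

print(len(per_cluster), len(toy))  # lengths differ: 3 vs 6
print(per_row)
```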

You need to add one more color and map the cluster labels through the dictionary LABEL_COLOR_MAP:

LABEL_COLOR_MAP = {0 : 'red',
                   1 : 'blue',
                   2 : 'green',
                   3 : 'purple',
                   4 : 'yellow'}

existing_df_2d.plot(
        kind='scatter',
        x='PC2',y='PC1',
        c=existing_df_2d.cluster.map(LABEL_COLOR_MAP), 
        figsize=(16,8))
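For 8 to 10 or more clusters, typing the dictionary by hand gets tedious. A possible sketch, assuming a hand-picked palette of color names that matplotlib recognizes (the `PALETTE` list, the `build_color_map` helper, and the sample labels are illustrative and not part of the original answer), builds the dictionary from whatever labels are present and fails early if the palette is too short:

```python
import pandas as pd

# Assumed palette: ten named colors that matplotlib recognizes.
PALETTE = ["red", "blue", "green", "purple", "yellow",
           "orange", "cyan", "magenta", "brown", "black"]


def build_color_map(labels, palette=PALETTE):
    """Pair each distinct label (in sorted order) with one palette color."""
    unique_labels = sorted(pd.Series(labels).unique().tolist())
    if len(unique_labels) > len(palette):
        raise ValueError("palette has fewer colors than there are clusters")
    return dict(zip(unique_labels, palette))


# Hypothetical cluster labels standing in for existing_df_2d.cluster.
cluster = pd.Series([2, 4, 2, 1, 1, 0, 3, 1])
color_map = build_color_map(cluster)
colors = cluster.map(color_map)  # one color per row, aligned on the index

print(color_map)
print(colors.tolist())
```

The resulting Series can be passed as `c=colors` to the scatter call in the same way as `existing_df_2d.cluster.map(LABEL_COLOR_MAP)`.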

This is needed because the cluster column holds five distinct labels:

print(np.unique(existing_df_2d.cluster))
[0 1 2 3 4]

Full code:

import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

tb_existing_url_csv = 'https://docs.google.com/spreadsheets/d/1X5Jp7Q8pTs3KLJ5JBWKhncVACGsg5v4xu6badNs4C7I/pub?gid=0&output=csv'

existing_df = pd.read_csv(
    tb_existing_url_csv, 
    index_col = 0, 
    thousands  = ',')
existing_df.index.names = ['country']
existing_df.columns.names = ['year']

pca = PCA(n_components=2)
pca.fit(existing_df)
# fit() echoed: PCA(copy=True, n_components=2, whiten=False)
existing_2d = pca.transform(existing_df)

existing_df_2d = pd.DataFrame(existing_2d)
existing_df_2d.index = existing_df.index
existing_df_2d.columns = ['PC1','PC2']
existing_df_2d.head()

kmeans = KMeans(n_clusters=5)
clusters = kmeans.fit(existing_df)
existing_df_2d['cluster'] = pd.Series(clusters.labels_, index=existing_df_2d.index)

Test:

Top 10 rows by column PC2:

print(existing_df_2d.loc[existing_df_2d['PC2'].nlargest(10).index, :])
                          PC1         PC2  cluster
country                                           
Kiribati         -2234.809790  864.494075        2
Djibouti         -3798.447446  578.975277        4
Bhutan           -1742.709249  569.448954        2
Solomon Islands   -809.277671  530.292939        1
Nepal             -986.570652  525.624757        1
Korea, Dem. Rep. -2146.623299  438.945977        2
Timor-Leste      -1618.364795  428.244340        2
Tuvalu           -1075.316806  366.666171        1
Mongolia          -686.839037  363.722971        1
India            -1146.809345  363.270389        1
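To confirm that colors and clusters stay linked, one option is to count the distinct colors within each cluster; every count should be 1. The sketch below uses a small made-up frame (the `sample` data is an assumption, with values rounded from a few of the rows above) rather than the full dataset:

```python
import pandas as pd

LABEL_COLOR_MAP = {0: "red", 1: "blue", 2: "green", 3: "purple", 4: "yellow"}

# Hypothetical sample with the same columns as existing_df_2d.
sample = pd.DataFrame(
    {"PC1": [-2234.8, -3798.4, -1742.7, -809.3, -986.6],
     "PC2": [864.5, 579.0, 569.4, 530.3, 525.6],
     "cluster": [2, 4, 2, 1, 1]},
    index=["Kiribati", "Djibouti", "Bhutan", "Solomon Islands", "Nepal"],
)

# Attach the color that .map() would hand to the scatter plot.
sample["color"] = sample["cluster"].map(LABEL_COLOR_MAP)

# Each cluster should resolve to exactly one color, and no row should be NaN.
colors_per_cluster = sample.groupby("cluster")["color"].nunique()
print(colors_per_cluster)
print(sample["color"].isna().any())
```

Note that the clusters in the full code were fitted on `existing_df` (all original columns), not on the two principal components, so some visual overlap between colors in the PC1/PC2 plane is possible even when the mapping itself is correct.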

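Thanks Max, but somehow I still cannot solve this. I have added a few more lines of code to my original post.

Thanks @jezrael. My concern is that the different colors seem to be scattered all over the place and do not show the distinct clusters seen on the website. I think the link between colors and clusters is still missing.

Yes, you are right. I edited the answer and added a test section.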
