Python problem with PyCluster

I have the following Python code:

   from Pycluster import *
   from numpy import *
   import matplotlib.pyplot as plt

   names = [ "A1", "A2", "A3", "A4", "A5", "A6", "A7", 
             "A8", "A9", "A10", "A11", "A12", "A13", "A14", "A15"]

   distances = array([
   [0.000, 0.840, 0.860, 0.115, 0.150, 0.055, 0.000, 0.070, 0.065, 0.000, 0.165, 0.000, 0.000, 0.000, 0.065],
   [0.840, 0.000, 0.710, 0.060, 0.125, 0.060, 0.000, 0.070, 0.065, 0.000, 0.165, 0.000, 0.000, 0.000, 0.070],
   [0.860, 0.710, 0.000, 0.055, 0.120, 0.055, 0.000, 0.070, 0.065, 0.000, 0.000, 0.000, 0.000, 0.000, 0.065],
   [0.115, 0.060, 0.055, 0.000, 0.885, 0.455, 0.415, 0.060, 0.150, 0.050, 0.240, 0.000, 0.000, 0.065, 0.140],
   [0.150, 0.125, 0.120, 0.885, 0.000, 0.510, 0.330, 0.125, 0.165, 0.050, 0.145, 0.000, 0.000, 0.000, 0.200],
   [0.055, 0.060, 0.055, 0.455, 0.510, 0.000, 0.335, 0.060, 0.215, 0.050, 0.140, 0.000, 0.000, 0.000, 0.085],
   [0.000, 0.000, 0.000, 0.415, 0.330, 0.335, 0.000, 0.000, 0.245, 0.060, 0.255, 0.125, 0.000, 0.075, 0.225],
   [0.070, 0.070, 0.070, 0.060, 0.125, 0.060, 0.000, 0.000, 0.195, 0.000, 0.000, 0.000, 0.000, 0.000, 0.140],
   [0.065, 0.065, 0.065, 0.150, 0.165, 0.215, 0.245, 0.195, 0.000, 0.045, 0.135, 0.000, 0.000, 0.000, 0.155],
   [0.000, 0.000, 0.000, 0.050, 0.050, 0.050, 0.060, 0.000, 0.045, 0.000, 0.000, 0.120, 0.000, 0.045, 0.080],
   [0.165, 0.165, 0.000, 0.240, 0.145, 0.140, 0.255, 0.000, 0.135, 0.000, 0.000, 0.000, 0.000, 0.150, 0.150],
   [0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.125, 0.000, 0.000, 0.120, 0.000, 0.000, 0.175, 0.090, 0.105],
   [0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.175, 0.000, 0.000, 0.000],
   [0.000, 0.000, 0.000, 0.065, 0.000, 0.000, 0.075, 0.000, 0.000, 0.045, 0.150, 0.090, 0.000, 0.000, 0.000],
   [0.065, 0.070, 0.065, 0.140, 0.200, 0.085, 0.225, 0.140, 0.155, 0.080, 0.150, 0.105, 0.000, 0.000, 0.000]
   ])

   clusterids, error, nfound = kmedoids(distances, 6)
   print "Cluster ids:", clusterids
   print "error:", error
   print "nfound:", nfound

   cities_in_cluster = {}
   for name, clusterid in zip(names, clusterids):
        cities_in_cluster.setdefault(clusterid, []).append(name)

   import textwrap
   for centroid_id, city_names in cities_in_cluster.items():
        print "Cluster around", names[centroid_id]
        text = ", ".join(city_names)
        for line in textwrap.wrap(text, 70):
             print "  ", line

   colors = ['red', 'green', 'blue', 'yellow', 'white', 'black']

   medoids = {}  
   for i in clusterids:
        medoids[i]= medoids.get(i,0) + 1    

   plt.scatter(distances[:,0],distances[:,1], c=colors)
   plt.show()

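There are two problems with this code:
- Each run produces a different clustering result. Is that expected?
- The plot shows only 11 points instead of 15.

Where is the error?

Thanks.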

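For the second part of your question: if you take only the first two values along the second dimension of distances (the two columns that the scatter call plots), there are only 11 unique points, i.e.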

[[ 0.     0.84 ]
 [ 0.84   0.   ]
 [ 0.86   0.71 ]
 [ 0.115  0.06 ]
 [ 0.15   0.125]
 [ 0.055  0.06 ]
 [ 0.     0.   ]
 [ 0.07   0.07 ]
 [ 0.065  0.065]
 [ 0.     0.   ] # duplicate
 [ 0.165  0.165]
 [ 0.     0.   ] # duplicate
 [ 0.     0.   ] # duplicate
 [ 0.     0.   ] # duplicate
 [ 0.065  0.07 ]]

I'm not sure whether this helps with your first question, but perhaps it shows that the contents of distances are not in the form you expect?
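
The count above can be checked in a few lines. This is a minimal sketch, assuming NumPy 1.13 or later (for the `axis` argument of `np.unique`); `points` is simply the two columns listed above, typed in by hand.

```python
import numpy as np

# First two columns of the matrix from the question (distances[:, :2]).
points = np.array([
    [0.000, 0.840],
    [0.840, 0.000],
    [0.860, 0.710],
    [0.115, 0.060],
    [0.150, 0.125],
    [0.055, 0.060],
    [0.000, 0.000],
    [0.070, 0.070],
    [0.065, 0.065],
    [0.000, 0.000],
    [0.165, 0.165],
    [0.000, 0.000],
    [0.000, 0.000],
    [0.000, 0.000],
    [0.065, 0.070],
])

# With axis=0, np.unique treats each row as one item and drops repeated rows.
unique_points, counts = np.unique(points, axis=0, return_counts=True)

print(len(points), "rows,", len(unique_points), "unique")
for point, count in zip(unique_points, counts):
    if count > 1:
        print(point, "appears", count, "times")
```

Points that share the same coordinates are drawn on top of each other, which is why the scatter plot appears to have fewer markers than there are items.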

kmedoids uses random initialization and may converge to a local minimum.

So if you run it several times, you can get different results.
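
To make the effect of random starts concrete, here is a rough sketch of a plain alternating k-medoids with restarts, written from scratch in NumPy. It is not Pycluster's implementation; the function names `kmedoids_once` and `kmedoids_best_of` and the toy data are made up for illustration. As far as I recall, Pycluster's own `kmedoids` accepts an `npass` argument that does this kind of restarting for you, but check its documentation before relying on that.

```python
import numpy as np

def kmedoids_once(dist, k, rng, max_iter=100):
    """One run of a simple alternating k-medoids from a random start."""
    n = len(dist)
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(max_iter):
        # Assign every point to its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if len(members) == 0:
                continue  # keep the old medoid for an empty cluster
            # New medoid: the member with the smallest total distance
            # to the other members of its cluster.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    error = dist[np.arange(n), medoids[labels]].sum()
    # Like Pycluster, label each point with the index of its medoid.
    return medoids[labels], error

def kmedoids_best_of(dist, k, npass, seed=0):
    """Repeat kmedoids_once npass times and keep the lowest-error result."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(npass):
        ids, error = kmedoids_once(dist, k, rng)
        if best is None or error < best[1]:
            best = (ids, error)
    return best

# Hypothetical toy data: three well-separated groups on a line.
xs = np.array([0, 1, 2, 10, 11, 12, 20, 21, 22], dtype=float)
dist = np.abs(xs[:, None] - xs[None, :])

single_ids, single_error = kmedoids_best_of(dist, k=3, npass=1, seed=0)
best_ids, best_error = kmedoids_best_of(dist, k=3, npass=20, seed=0)
print("one pass:   error =", single_error)
print("best of 20: error =", best_error)
```

With a fixed seed the result is reproducible, and keeping the best of several passes can never do worse than the first pass alone. Neither trick guarantees the global optimum.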

Is it possible that your distance matrix does not actually contain distances?

There are too many 0 values in it.

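The row

[0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.175, 0.000, 0.000, 0.000]

is an extreme example. Looking at the matrix, all the points are essentially identical, because from any point you can find a chain of distance-0 steps to any other point! So your matrix is not a distance matrix. Violating such a basic property of distances can break kmedoids and make it return essentially random results.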

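Also, don't scatter-plot a distance matrix. Scatter plots are for the input data, not for the first two rows of a distance matrix. If you want to reconstruct a scatter plot from a distance matrix, use multidimensional scaling.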
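
As an illustration of the multidimensional scaling suggestion, here is a minimal sketch of classical (Torgerson) MDS in NumPy. The helper name `classical_mds` and the rectangle example are made up for this answer, and the method assumes a proper, symmetric distance matrix, which the matrix in the question is not.

```python
import numpy as np

def classical_mds(dist, n_components=2):
    """Classical MDS: coordinates whose Euclidean distances approximate dist."""
    n = dist.shape[0]
    # Double-center the squared distances to get a Gram (inner product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # eigh returns eigenvalues in ascending order; pick the largest ones.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean input) are clipped to zero.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Hypothetical example: the four corners of a 3 x 4 rectangle.
true_points = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0]])
dist = np.linalg.norm(true_points[:, None, :] - true_points[None, :, :], axis=-1)

coords = classical_mds(dist)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
print(np.round(recovered, 3))
```

The recovered coordinates match the original layout only up to rotation, reflection and translation, but their pairwise distances reproduce the input. Passing `coords[:, 0]` and `coords[:, 1]` to `plt.scatter` would then give a meaningful plot.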
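
OK, the matrix does make sense as distances (although the zeros may be meant as "not connected"), in which case all the points are distinct. What doesn't make sense is using the distance from An to A1 as the x coordinate and the distance from An to A2 as the y coordinate in a scatter plot!

If 0 means "not connected", then it is not a distance. A distance of 0 means identical. But you are right, the plotting logic is flawed. Either way, it is not a proper distance matrix: too many zeros.

Yes, the documentation is quite clear about the form of the matrix. I don't know what would happen if you used inf, though.

But this is a similarity matrix, @anonymousse. 0 means completely different, 1 means an exact copy. My goal is to compare the similarity between source code files (plagiarism).

kmedoids expects distances, not similarities. To kmedoids, 0 means identical!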
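
Following up on the last comment: one simple way to turn similarities in the range 0 to 1 into distances is `1 - similarity`. This is only a sketch of one common convention, not something prescribed by Pycluster, and the 4 x 4 matrix below is invented for illustration.

```python
import numpy as np

# Hypothetical similarity matrix (1 = exact copy, 0 = nothing in common).
similarity = np.array([
    [1.00, 0.84, 0.10, 0.00],
    [0.84, 1.00, 0.05, 0.00],
    [0.10, 0.05, 1.00, 0.60],
    [0.00, 0.00, 0.60, 1.00],
])

# One common conversion (an assumption, not the only option):
# high similarity becomes a small distance.
distances = 1.0 - similarity

# Every item is at distance 0 from itself, whatever the input diagonal held.
# This matters for the matrix in the question, which stores 0 on the diagonal.
np.fill_diagonal(distances, 0.0)

print(distances)
```

After this conversion, unrelated items are far apart (distance 1) instead of looking identical (distance 0), which is what kmedoids needs.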