Regrouping elements based on distance/criterion in Python


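Suppose:
1. I have defined a function foo that returns a distance value.
2. I have a list of groups (which are also lists in my case), each containing some variables.
3. The variables in different groups are distinct, i.e. there are no duplicates.
4. The groups have length >= 1 (they are non-empty).

I implemented the same method in R and used the following from the SAS documentation:

"The iterative reassignment of variables to clusters proceeds in two phases. The first is a nearest component sorting (NCS) phase, similar in principle to the nearest centroid sorting algorithms described by Anderberg (1973). In each iteration, the cluster components are computed, and each variable is assigned to the component with which it has the highest squared correlation. The second phase involves a search algorithm in which each variable is tested to see if assigning it to a different cluster increases the amount of variance explained. If a variable is reassigned during the search phase, the components of the two clusters involved are recomputed before the next variable is tested. The NCS phase is much faster than the search phase but is more likely to be trapped by a local optimum."

Variable reassignment in varclus takes place in two steps: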

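Reassignment based on squared correlation: compute the squared correlation (squared loading) of each variable in cluster(i) with all the other clusters. The variable is reassigned to the cluster with which it has the largest squared correlation.

The squared correlation of variable(i) with cluster(j) can be defined as: ((vector of correlations of variable(i) with the variables in cluster(j)) %*% (first principal component of cluster(j)) / sqrt(first eigenvalue of cluster(j)))^2

Here, %*% denotes matrix multiplication.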
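As a rough illustration of the formula above, here is a minimal NumPy sketch. It assumes that the "first principal component of cluster(j)" means the first eigenvector of the cluster's correlation sub-matrix. The names squared_correlation and nearest_component and the example matrix are made up for illustration; they are not taken from SAS or from any library.

```python
import numpy as np


def squared_correlation(corr, variable, cluster):
    """Squared correlation of one variable with a cluster's first component.

    corr     -- full correlation matrix of all variables (2-D array)
    variable -- index of the variable being tested
    cluster  -- list of variable indices that make up the cluster
    """
    sub = corr[np.ix_(cluster, cluster)]      # correlations inside the cluster
    eigvals, eigvecs = np.linalg.eigh(sub)    # eigenvalues in ascending order
    first_eigenvalue = eigvals[-1]
    first_eigenvector = eigvecs[:, -1]        # assumed meaning of "first principal component"
    r = corr[variable, cluster]               # correlations of the variable with the cluster members
    return float((r @ first_eigenvector) ** 2 / first_eigenvalue)


def nearest_component(corr, variable, clusters):
    """Index of the cluster with the largest squared correlation (one NCS-style test)."""
    scores = [squared_correlation(corr, variable, c) for c in clusters]
    return int(np.argmax(scores))


# Hypothetical correlation matrix: variables 0 and 1 move together, as do 2 and 3.
corr = np.array([
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
])
print(squared_correlation(corr, 0, [0, 1]))
print(nearest_component(corr, 0, [[0, 1], [2, 3]]))
```

Because the result is squared, the arbitrary sign of the eigenvector returned by np.linalg.eigh does not matter.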

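Search-based reassignment: this step is expensive, because it searches whether a given variable can be reassigned to some other cluster in a way that increases the total variance explained.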
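A single pass of such a search could look roughly like the sketch below. It is only an approximation under stated assumptions: the variance explained by a cluster is taken to be the first eigenvalue of its correlation sub-matrix, a variable is never moved out of a one-element cluster, and the names first_eigenvalue and search_reassign are hypothetical. It is not the SAS implementation.

```python
import numpy as np


def first_eigenvalue(corr, cluster):
    """Variance explained by a cluster's first component (assumed: largest eigenvalue)."""
    if not cluster:
        return 0.0
    sub = corr[np.ix_(cluster, cluster)]
    return float(np.linalg.eigvalsh(sub)[-1])  # eigvalsh sorts ascending


def search_reassign(corr, clusters, tol=1e-12):
    """One pass over all variables; move a variable if total explained variance rises."""
    clusters = [list(c) for c in clusters]          # work on a copy
    variables = [v for c in clusters for v in c]
    for variable in variables:
        src = next(i for i, c in enumerate(clusters) if variable in c)
        if len(clusters[src]) == 1:
            continue                                # assumption: never empty a cluster
        src_without = [v for v in clusters[src] if v != variable]
        base_src = first_eigenvalue(corr, clusters[src])
        best_dst, best_gain = None, tol
        for dst, target in enumerate(clusters):
            if dst == src:
                continue
            # Only the two clusters involved change, so only they are recomputed.
            gain = (first_eigenvalue(corr, src_without)
                    + first_eigenvalue(corr, target + [variable])
                    - base_src
                    - first_eigenvalue(corr, target))
            if gain > best_gain:
                best_dst, best_gain = dst, gain
        if best_dst is not None:
            clusters[src] = src_without
            clusters[best_dst].append(variable)
    return clusters


# Hypothetical correlation matrix: variables 0 and 1 belong together, as do 2 and 3.
example_corr = np.array([
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
])
print(search_reassign(example_corr, [[0, 2], [1, 3]]))
```

Every candidate move costs several eigenvalue computations, which is why this phase is so much slower than the squared-correlation step.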

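Yes, I implemented this in Python a few months ago. Getting my results to match the SAS results was difficult, but the logic of the Python code should match the logic described in the SAS documentation you quoted. Thanks :)

@asheketchum Could I see the Python code? Is it on GitHub?

@joshlk I haven't used GitHub yet, sorry. Which part of the code are you looking for exactly? The clustering part?

@AsheKetchum The clustering part that replicates the SAS varclus method. There doesn't seem to be anything in Python that replicates it. You could use pastbin.com or a GitHub gist. Thanks

@joshlk I'm not sure whether I should share the code itself. Is there a way to message you directly? Maybe I can try to explain how I did it, and maybe you can come up with a better version.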

# In my example, the variables are grouped as follows:
[[1, 2], [5, 8], [3, 4], [7, 9]]
# Suppose that here we define the distance between an element and a list
# as the difference between the element and the list's average.
# Suppose we do not reassign if a number is equally close to its current
# group and another group,
# and assume an empty set is the same as a set that only contains 0.
# the first group does not experience change as its elements are closest to itself
[[1, 2], [8], [3, 4, 5], [7, 9]] 
# 5 is moved to the third group since it's closer to 3.5 than it is to 1.5, 8 and 8
# We then consider the first group again to see if its elements require reassignment
# after the group changes. Luckily they do not.
# This is an important step, as 1 is now equally close to [2] and [0].
[[1, 2], [], [3, 4, 5], [7, 9, 8]]
# 8 is moved to the last group since 8 is closest to 8
# We would now check whether the reassignment of 8
# would cause 1, 2, or 5 to be reassigned; the answer is no here.
[[1, 2], [], [3, 4, 5], [7, 9, 8]]
# We let 3 remain since it is equally close to (1+2)/2 and (4+5)/2
# 4 and 5 remain in their current group, and then 7, 9, 8 remain as well.
# The preferred final result would then look something like:
[[1, 2], [3, 4, 5], [7, 9, 8]]
# Since we require all groups be non-empty, we removed the empty list.
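The walkthrough above can be turned into a short Python sketch. This is one possible reading, not a definitive solution. It assumes that an element's distance to its own group is measured against the rest of the group without the element (which is what the "1.5, 8 and 8" comment suggests), and that the scan restarts from the first group after every move. The body of foo here is just the example distance from the walkthrough; regroup, find_first_move and max_moves are hypothetical names.

```python
from statistics import mean


def foo(element, group):
    """Example distance: |element - average of group|; an empty group counts as [0]."""
    center = mean(group) if group else 0
    return abs(element - center)


def find_first_move(groups, distance):
    """Return (source index, target index, element) for the first element that
    is strictly closer to another group than to the rest of its own group."""
    for src, group in enumerate(groups):
        for element in group:
            # Variables are distinct, so filtering by value removes only this element.
            rest = [x for x in group if x != element]
            best_dst, best_dist = src, distance(element, rest)
            for dst, other in enumerate(groups):
                if dst == src:
                    continue
                d = distance(element, other)
                if d < best_dist:          # strict: ties keep the current group
                    best_dst, best_dist = dst, d
            if best_dst != src:
                return src, best_dst, element
    return None


def regroup(groups, distance=foo, max_moves=1000):
    """Reassign elements until nothing moves, then drop empty groups."""
    groups = [list(g) for g in groups]     # do not mutate the caller's lists
    for _ in range(max_moves):             # guard against endless cycling
        move = find_first_move(groups, distance)
        if move is None:
            break
        src, dst, element = move
        groups[src].remove(element)
        groups[dst].append(element)        # then restart the scan from the first group
    return [g for g in groups if g]


print(regroup([[1, 2], [5, 8], [3, 4], [7, 9]]))
# [[1, 2], [3, 4, 5], [7, 9, 8]]
```

Restarting after each move keeps the logic close to the walkthrough, at the cost of rescanning groups that did not change. Any other distance function with the same (element, group) signature can be passed in place of foo.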