Regrouping elements in Python based on a distance/criterion
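Suppose:
1. I have defined a function foo that returns a distance value.
2. I have a list of groups (in my case the groups are also lists), and each group contains some variables.
3. Variables in different groups are distinct, i.e. there are no duplicates.
4. Every group has length >= 1 (non-empty).

I implemented the same method in R, following this passage from the SAS documentation:

"The iterative reassignment of variables to clusters proceeds in two phases. The first is a nearest component sorting (NCS) phase, similar in principle to the nearest centroid sorting algorithms described by Anderberg (1973). In each iteration, the cluster components are computed, and each variable is assigned to the component with which it has the highest squared correlation. The second phase involves a search algorithm in which each variable is tested to see if assigning it to a different cluster increases the amount of variance explained. If a variable is reassigned during the search phase, the components of the two clusters involved are recomputed before the next variable is tested. The NCS phase is much faster than the search phase but is more likely to be trapped by a local optimum."

So variable reassignment in varclus is done in two steps.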
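To make the search phase in the quoted passage more concrete, here is a minimal sketch in plain Python. It is not the SAS implementation: instead of cluster components and explained variance, it uses a toy cost on one-dimensional numbers (total within-group sum of squared deviations, lower is better), and the names within_ss, search_phase and tol are made up for illustration. What it keeps from the description is the shape of the loop: each element is tested against every other group, a move is accepted only if it improves the objective, and the objective is recomputed before the next element is tested.

```python
from statistics import mean


def within_ss(groups):
    """Toy cost (an assumption): total within-group sum of squared deviations."""
    return sum((x - mean(g)) ** 2 for g in groups if g for x in g)


def search_phase(groups, cost=within_ss, tol=1e-12):
    """Test each element in turn; move it only if the cost strictly improves."""
    groups = [list(g) for g in groups]  # work on a copy
    improved = True
    while improved:
        improved = False
        for i in range(len(groups)):
            for x in list(groups[i]):  # snapshot, since groups[i] may shrink
                if len(groups[i]) == 1:
                    continue  # never empty a group
                best_j, best_cost = i, cost(groups)
                idx = groups[i].index(x)
                groups[i].pop(idx)  # take x out temporarily
                for j in range(len(groups)):
                    if j == i:
                        continue
                    groups[j].append(x)   # trial assignment
                    trial = cost(groups)  # recomputed from scratch each time
                    groups[j].pop()       # undo the trial
                    if trial < best_cost - tol:
                        best_j, best_cost = j, trial
                if best_j == i:
                    groups[i].insert(idx, x)  # put x back where it was
                else:
                    groups[best_j].append(x)
                    improved = True
    return groups


print(search_phase([[1, 2, 9], [8, 10]]))  # [[1, 2], [8, 10, 9]]
```

Because every accepted move lowers the cost by more than tol, the loop cannot cycle. Like any greedy search it can still stop at a local optimum.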
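Yes, I implemented this in Python a few months ago. Getting my results to match the SAS results was difficult, but the logic of the Python code should match the logic described in the SAS documentation you quoted. Thanks :)
@asheketchum Could I see the Python? Is it on GitHub? If I could
@joshlk I haven't used GitHub yet, sorry. Which part of the code exactly are you looking for? The clustering part?
@AsheKetchum The clustering part that replicates the SAS varclus method. There doesn't seem to be anything in Python that replicates it. You could use pastbin.com or a GitHub gist. Thanks
@joshlk I'm not sure whether I should share the code itself. Is there a way to message you directly? Maybe I can try to explain how I did it, and maybe you can come up with a better version.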
# In my example, my variables are grouped like this:
[[1, 2], [5, 8], [3, 4], [7, 9]]
# Suppose that here we define distance between an element and a list
# by the difference between the element and the list's average
# Suppose we do not reassign a number if it is equally close to its current
# group and another group,
# and assume an empty set is the same as a set that only contains 0
# the first group does not experience change as its elements are closest to itself
[[1, 2], [8], [3, 4, 5], [7, 9]]
# 5 is moved to the third group since it's closer to 3.5 than it is to 1.5, 8 and 8
# We then consider the first group again to see if they require reassignment
# after the group changes. Luckily we do not.
# this is an important step as 1 is now equally close to [2] and [0].
[[1, 2], [], [3, 4, 5], [7, 9, 8]]
# 8 is moved to last group since 8 is closest to 8
# we would now check to see if the reassignment of 8
# would cause 1, 2, or 5 to be reassigned, the answer is no here.
[[1, 2], [], [3, 4, 5], [7, 9, 8]]
# We will let 3 remain since it is equally close to (1+2)/2 and (4+5)/2
# 4 and 5 remain in their current group, and then 7,9,8 remain as well.
# The preferred final result would then look something like:
[[1, 2], [3, 4, 5], [7, 9, 8]]
# Since we require all groups be non-empty, we removed the empty list.
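
The walkthrough above can be sketched as a small loop. This is only one possible reading of the example, under stated assumptions: foo is taken to be the absolute difference between an element and the group's average (an empty group counts as [0]), an element's distance to its own group is computed with the element left out, and ties keep the element where it is. The names regroup and max_sweeps are hypothetical. One simplification: rather than going back to the first group after every move, this version makes full passes over all groups and stops once a pass moves nothing.

```python
from statistics import mean


def foo(x, group):
    """Distance from x to a group: |x - mean(group)|.

    Assumption taken from the example: an empty group behaves like [0].
    """
    return abs(x - (mean(group) if group else 0))


def regroup(groups, dist=foo, max_sweeps=100):
    """Move each element to its closest group until a full sweep changes nothing."""
    groups = [list(g) for g in groups]  # work on a copy
    for _ in range(max_sweeps):  # hypothetical safety cap against cycling
        moved = False
        for i in range(len(groups)):
            for x in list(groups[i]):  # snapshot, since groups[i] may shrink
                # Distance to the element's own group, with the element left out.
                rest = [y for y in groups[i] if y != x]
                best_j, best_d = i, dist(x, rest)
                for j, other in enumerate(groups):
                    # Strict "<" means ties keep the element where it is.
                    if j != i and dist(x, other) < best_d:
                        best_j, best_d = j, dist(x, other)
                if best_j != i:
                    groups[i].remove(x)
                    groups[best_j].append(x)
                    moved = True
        if not moved:
            break
    return [g for g in groups if g]  # drop groups that ended up empty


print(regroup([[1, 2], [5, 8], [3, 4], [7, 9]]))  # [[1, 2], [3, 4, 5], [7, 9, 8]]
```

Swapping in a different foo only requires passing it as dist; the loop itself does not depend on how the distance is defined.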