Python: clustering after t-SNE dimensionality reduction

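The question is which should come first: a) clustering or b) the dimensionality reduction algorithm? In other words, can I apply a pseudo dimensionality reduction method such as t-SNE (pseudo, since it is not really one) and then use a clustering algorithm to extract the clusters, or should I cluster in the original high-dimensional space and use the result only to color the nodes? Is the code below a good way to start, or am I completely off?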

import numpy as np
from sklearn import cluster as clusterAlgs, manifold

# g is assumed to be an existing igraph Graph with a 'weight' edge attribute
adjMat = g.get_adjacency(attribute='weight')  # adjacency matrix of a really large graph
adjMat = np.array(adjMat.data)
adjMat = adjMat.T  # use the incoming interaction vectors
# set up the dimensionality reduction algorithm (t-SNE) and embed the nodes
tsne = manifold.TSNE()
manifoldCoords = tsne.fit_transform(adjMat)
# set up the clustering algorithm and extract clusters from the embedding
clusteralgorithm = clusterAlgs.KMeans()
linear_clusters = clusteralgorithm.fit_predict(manifoldCoords)

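It is common to reduce the dimensionality first and then cluster. That is simply because clustering high-dimensional data is hard, and dimensionality reduction makes it more "tractable".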

As long as you don't forget that clustering is inherently unreliable (so don't trust the results, but do study them), you should be fine.
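
One way to study the results rather than trust them is to run both orderings from the question and compare the labelings. The following is only a minimal sketch: the data comes from `make_blobs` as a synthetic stand-in for the real adjacency matrix, and the choices of 3 clusters, `perplexity=30` and the adjusted Rand index as the agreement measure are assumptions made for illustration, not part of the original code.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in for the real feature matrix: 300 points, 50 features, 3 groups.
X, _ = make_blobs(n_samples=300, n_features=50, centers=3, random_state=0)

# Reduce first, then cluster: embed with t-SNE and run k-means on the 2-D coordinates.
coords = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(X)
labels_embedded = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(coords)

# Cluster in the original space; coords would then only be used for plotting/coloring.
labels_original = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Agreement between the two labelings (1.0 means identical partitions).
agreement = adjusted_rand_score(labels_original, labels_embedded)
print(f"adjusted Rand index between the two labelings: {agreement:.3f}")
```

A low agreement score does not say which labeling is right; it only flags that the two approaches disagree and that the clusters deserve a closer look.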

It is better to do the dimensionality reduction first and then perform the clustering.

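The reason behind this is the way distances behave in high-dimensional spaces. Another interesting phenomenon is that the ratio between the distances to the nearest and the farthest points approaches 1.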

I suggest you read this article; although it asks about Euclidean distance, you can find a lot of interesting information there in general.
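
The nearest-to-farthest ratio mentioned above can be checked with a small experiment. This is only a sketch under stated assumptions: the points are drawn uniformly from the unit hypercube, and `nearest_to_farthest_ratio` is a hypothetical helper name introduced here, not something from the original answer.

```python
import math
import random


def nearest_to_farthest_ratio(n_points, n_dims, seed=0):
    """Ratio of the nearest to the farthest Euclidean distance from one query point.

    Points are drawn uniformly from the unit hypercube (an assumption made
    purely for illustration).
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    distances = [
        math.dist(query, [rng.random() for _ in range(n_dims)])
        for _ in range(n_points)
    ]
    return min(distances) / max(distances)


# The ratio moves toward 1 as the number of dimensions grows.
for n_dims in (2, 10, 100, 1000):
    ratio = nearest_to_farthest_ratio(n_points=500, n_dims=n_dims)
    print(f"{n_dims:>5} dimensions: nearest/farthest = {ratio:.3f}")
```

When the ratio is close to 1, the nearest and farthest neighbors are almost equally far away, which is why distance-based clustering such as k-means tends to struggle on raw high-dimensional data.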