Matlab: sort a vector according to the frequency of its unique values (tags: matlab, sorting, cluster-analysis)


I am using kmeans to cluster the rows of an NxM matrix:

clustIdx = kmeans(data, N_CLUST, 'EmptyAction', 'drop');

I then rearrange the rows of the matrix so that adjacent rows belong to the same cluster:

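[~, rowOrder] = sort(clustIdx);      % row indices grouped by cluster label
dataClustered = data(rowOrder,:);    % indexing with clustIdx directly would only repeat rows 1..N_CLUST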

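However, every time I run the clustering I get more or less the same clusters, but with different labels. As a result, the structure in dataClustered looks the same after every run, but the groups appear in a different order.

I would like to relabel my clusters so that low cluster IDs denote dense clusters and high IDs denote sparse clusters.

Is there a simple and/or intuitive way to do this?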

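For example, I want to transform

clustIdx = [1 2 3 2 3 2 4 4 4 4];

into a vector in which the largest cluster (currently label 4, with four members) gets label 1 and the smallest (currently label 1, with one member) gets label 4.

The labels themselves are arbitrary; the information is in the grouping.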

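If I understand correctly, you want to assign cluster label 1 to the cluster with the most points, label 2 to the cluster with the second most points, and so on.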

Suppose your label vector is called idx:

>> idx = [1 1 2 2 2 2 3 3 3]';

You can then relabel idx like this:

%# count the number of occurrences
cts = hist(idx,1:max(idx));

%# sort the counts - now we know that 1 should be last
[~,sortIdx] = sort(cts,'descend')
sortIdx =
     2     3     1

%# create a mapping vector (thanks @angainor)
map(sortIdx) = 1:length(sortIdx);
map =
     3     1     2

%# and remap indices
map(idx)
ans =
     3     3     1     1     1     1     2     2     2
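The same steps can be applied to the vector from the question. The following is only a sketch: it assumes the labels are positive integers, swaps hist for accumarray as the counting step (my substitution, not part of the answer above), and preallocates map so that a leftover variable of the same name cannot leak old values into the result. The names counts and newIdx are introduced here for illustration.

```matlab
% Input taken from the question; labels are assumed to be positive integers.
clustIdx = [1 2 3 2 3 2 4 4 4 4];

% Count the members of each label (accumarray expects a column of subscripts).
counts = accumarray(clustIdx(:), 1);        % counts = [1; 3; 2; 4]

% Order the old labels from most to least frequent.
[~, sortIdx] = sort(counts, 'descend');     % sortIdx = [4; 2; 3; 1]

% Invert the permutation so that map(oldLabel) = newLabel.
map = zeros(numel(counts), 1);
map(sortIdx) = 1:numel(sortIdx);            % map = [4; 2; 3; 1]

% Apply the mapping and restore the shape of the input vector.
newIdx = reshape(map(clustIdx), size(clustIdx));
% newIdx = [4 2 3 2 3 2 1 1 1 1]
```

A label that never occurs below the maximum label simply gets a count of zero and therefore ends up with one of the highest new labels.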

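It may not be the most efficient approach, but a simple way is to first determine the density of each cluster.

You can then pair each cluster's Density with its ClusterIdx.

After that, a simple sort gives you the ClusterIdx values in the right order.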
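A minimal sketch of this density-then-sort idea follows. It assumes that "density" simply means the number of points in each cluster, which the answer does not spell out; the names density, pairs, orderedClusterIdx and newIdx are made up for this example.

```matlab
% Example labels from the first answer.
idx = [1 1 2 2 2 2 3 3 3]';

% One row per cluster: [density, clusterIdx].
clusterIdx = (1:max(idx))';
density = accumarray(idx, 1);               % assumption: density = member count
pairs = sortrows([density clusterIdx], -1); % sort by column 1, descending

% Old labels listed from densest to sparsest.
orderedClusterIdx = pairs(:, 2);            % [2; 3; 1]

% The new label of each point is the position of its old label in that list.
[~, newIdx] = ismember(idx, orderedClusterIdx);
% newIdx' = [3 3 1 1 1 1 2 2 2]
```

A different density measure, for example points per unit of cluster spread, would only change how density is computed; the sorting and relabeling steps stay the same.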

+1. One suggestion: you can replace the second sort with map(sortIdx) = 1:numel(sortIdx), which may be a bit faster. This is essentially an inverse permutation.
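To illustrate the comment, here is a small check that both variants produce the same mapping for the sortIdx from the answer. The variable names mapBySort and mapByAssign are made up for this comparison.

```matlab
sortIdx = [2 3 1];

% Variant 1: a second sort returns the position of each label in sortIdx.
[~, mapBySort] = sort(sortIdx);             % [3 1 2]

% Variant 2: direct assignment, i.e. writing the inverse permutation.
mapByAssign = zeros(size(sortIdx));
mapByAssign(sortIdx) = 1:numel(sortIdx);    % [3 1 2]

sameResult = isequal(mapBySort, mapByAssign);
```

The direct assignment avoids a second sort, which is presumably where the small speed-up mentioned in the comment comes from.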