Indexing duplicates in a matrix: MATLAB

Consider a matrix

 X = [ 1 2 0 1; 
       1 0 1 2;                                          
       1 2 3 4;                                     
       2 4 6 8;
          .           
          .                          
       1 2 0 1                  
          .                 
          .    ]

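I want to create a new column that numbers each row by its occurrence: the i-th time a given row appears, it should get the value i. For example, the second time the row 1 2 0 1 appears it should be numbered 2.

Any ideas?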

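A solution with a for loop is easy to write and may already be fast enough. I am sure there is a faster solution, probably using cumsum, but you may not even need it. The basic idea: first find the indices of the unique rows, so that you can work with scalar indices instead of whole rows (vectors). Then loop over the indices and count how often each one has occurred so far: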

X = [ 1 2 0 1; 
   1 0 1 2;                                          
   1 2 3 4;                                     
   2 4 6 8;                        
   1 2 0 1;                 
   1 3 3 7;                 
   1 2 0 1];

[~,~,idx] = unique(X, 'rows'); %// find unique rows

%// loop over indices and count occurrences so far (current row included)
y = zeros(size(idx));
for i = 1:length(idx)
   y(i) = sum(idx(1:i) == idx(i)); %// O(n) per iteration, so this probably scales poorly with length of idx.
end

The result for the example is:

y = [1 1 1 1 2 1 3]'
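The answer suspects that a faster variant without the quadratic loop exists. One possible sketch, which is not from the original answer, sorts the labels and uses cumsum to find each group's starting position. It assumes that sort keeps equal elements in their original order (MATLAB's sort is stable); the names sidx, order, starts, first and grp are illustrative.

```matlab
X = [1 2 0 1;
     1 0 1 2;
     1 2 3 4;
     2 4 6 8;
     1 2 0 1;
     1 3 3 7;
     1 2 0 1];

[~,~,idx] = unique(X, 'rows');   % one integer label per row
idx = idx(:);                    % force a column vector
n = numel(idx);

[sidx, order] = sort(idx);       % stable sort: ties keep their original order
starts = [true; diff(sidx) ~= 0];% true where a new label group begins
pos = (1:n)';                    % position within the sorted list
first = pos(starts);             % starting position of each group
grp = cumsum(starts);            % group number of each sorted element

y = zeros(n, 1);
y(order) = pos - first(grp) + 1; % rank inside the group, mapped back to row order
```

The cost is dominated by the sort, so this avoids re-scanning idx(1:i) for every row. Whether it is actually faster for a given input size would need to be measured.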

Approach #1

%// unique rows
unqrows = unique(X,'rows'); 

%// matches for each row against the unique rows and their cumsum values
matches_perunqrow = squeeze(all(bsxfun(@eq,X,permute(unqrows,[3 2 1])),2));
cumsum_unqrows = cumsum(matches_perunqrow,1);

%// Go through a row-order and get the cumsum values for the final output
[row,col] = find(matches_perunqrow);
[sorted_row,ind] = sort(row);
y=cumsum_unqrows(sub2ind(size(cumsum_unqrows),[1:size(cumsum_unqrows,1)]',col(ind)));

Sample run:

X =
     1     2     0     1
     1     0     1     2
     1     2     3     4
     2     4     6     8
     1     2     0     1
     1     2     3     4
     1     2     3     4
     1     2     3     4
     1     2     3     4
     1     2     0     1
out =
     1
     1
     1
     1
     2
     2
     3
     4
     5
     3

Approach #2

%// unique rows
unqrows = unique(X,'rows');

%// matches for each row against the unique rows
matches_perunqrow = all(bsxfun(@eq,X,permute(unqrows,[3 2 1])),2);

%// Get the cumsum of matches and select only the matches for each row.
%// Since we need to go through a row-order, transpose the result
cumsum_perrow = squeeze(cumsum(matches_perunqrow,1).*matches_perunqrow)';

%// Select the non zero values for the final output
y = cumsum_perrow(cumsum_perrow~=0);

Approach #3

%// label each row based on their uniqueness
[~,~,v3] = unique(X,'rows');
matches_perunqrow = bsxfun(@eq,v3,1:size(X,1));

cumsum_unqrows = cumsum(matches_perunqrow,1);

%// Go through a row-order and get the cumsum values for the final output
[row,col] = find(matches_perunqrow);
[sorted_row,ind] = sort(row);
y=cumsum_unqrows(sub2ind(size(cumsum_unqrows),[1:size(cumsum_unqrows,1)]',col(ind)));

Approach #4

%// label each row based on their uniqueness
[~,~,match_row_id] = unique(X,'rows');

%// matches for each row against the unique rows and their cumsum values
matches_perunqrow = bsxfun(@eq,match_row_id',[1:size(X,1)]');
cumsum_unqrows = cumsum(matches_perunqrow,2);

%// Select the cumsum values for the output based on the unique matches for each row
y = cumsum_unqrows(matches_perunqrow);
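As a sanity check, the loop baseline and Approach #4 can be compared on the matrix from the sample run. This is only a sketch: the variable names id, y_loop, y_vec and expected are illustrative, and expected is copied from the out column of the sample run.

```matlab
X = [1 2 0 1; 1 0 1 2; 1 2 3 4; 2 4 6 8; 1 2 0 1;
     1 2 3 4; 1 2 3 4; 1 2 3 4; 1 2 3 4; 1 2 0 1];
expected = [1 1 1 1 2 2 3 4 5 3]';   % "out" from the sample run

[~,~,id] = unique(X, 'rows');        % label of the unique row each row matches
id = id(:);
n = size(X, 1);

% Loop baseline: count occurrences of the current label up to and including row i
y_loop = zeros(n, 1);
for i = 1:n
    y_loop(i) = sum(id(1:i) == id(i));
end

% Approach #4: M(k,j) is true when row j carries label k
M = bsxfun(@eq, id.', (1:n).');
C = cumsum(M, 2);                    % running count of each label along the rows of X
y_vec = C(M);                        % one true per column, so this picks one value per row

disp(isequal(y_loop, y_vec, expected));
```

Each column of M holds exactly one true entry, which is why logical indexing with M returns the counts in row order.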

How about this:

y = sum(triu(squareform(pdist(X))==0)).';

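This works by counting, for each row, how many identical rows precede it (the row itself included). Two rows are equal if the distance between them (computed with pdist and squareform) is 0. triu ensures that only the preceding rows are considered.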

To reduce computation time and avoid depending on the Statistics Toolbox, you can use @user1735003's suggestion:

y = sum(triu((bsxfun(@plus, sum(X.^2,2), sum(X.^2,2)') - 2*X*X.')==0));
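The one-liner relies on the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2*a*b', so a zero entry means two rows are identical. The sketch below unrolls it into steps on a small made-up matrix; the intermediate names sq, G, D2 and U are illustrative. It assumes integer-valued data: with non-integer values, rounding can leave the difference slightly off zero, so an exact == 0 test may miss matches.

```matlab
X = [1 2;
     3 4;
     1 2];

sq = sum(X.^2, 2);                   % squared norm of each row: [5; 25; 5]
G  = X * X.';                        % pairwise dot products
D2 = bsxfun(@plus, sq, sq.') - 2*G;  % pairwise squared distances
U  = triu(D2 == 0);                  % equal pairs (i,j) with i <= j only
y  = sum(U, 1).';                    % per column: equal rows at or before row j
```

Here D2 is [0 8 0; 8 0 8; 0 8 0], so y comes out as [1; 1; 2]. Note that the one-liner as written returns a row vector; the trailing .' here turns it into a column so it can be appended to X.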

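Mind adding some explanation? It is not very easy to read. Also, a few quick tests on your sample run showed no speedup over my loop version: for 20000 repetitions, my version took 3.3 s and your two versions took 4.7 s and 3.6 s. Maybe it pays off on other inputs. I usually like your solutions, but this time I see neither a gain in readability nor in speed... yet.

@Nras I was adding comments as we speak! :)

@Nras Edited with comments. There might still be some optimizations to remove the squeeze and the transpose, though. I will check the speed on my side too.

OK. I am pretty sure that if one is not too lazy to work out the sizes, squeeze() can always be replaced by a direct call to reshape(). If you are doing speed tests, I am happy to leave that to you. You did a great job on that in one of my answers the other day :-). Very thorough!

Thanks.

May I suggest

sum(triu((bsxfun(@plus, sum(X.^2,2), sum(X.^2,2)') - 2*X*X')==0))

which does not need pdist from the Statistics Toolbox?

@user1735003 I think an alternative to it would be

sum(sqrt(sum(bsxfun(@minus,X,permute(X,[3,2 1]))))==0)

to save one sum.

...or, possibly faster:

sum(triu(sqrt(sum(bsxfun(@ne,X,permute(X,[3,2,1])),2))==0))

@Divakar @LuisMendo Smart move! Yes, this should work for this special case too. Do you think this should be added? +1

@Divakar user1735003's approach seems faster. I wonder whether the distance computation could be improved.

It does seem really fast in most cases. I just believe in vectorization more. I guess that is a personal choice, since I think vectorized code has a good chance of benefiting from GPUs. As a special case, when I used X=randi(1040002000) and a gpuArray for match_row_id in my solution, the runtime was actually comparable to the loop code, and I have a decent GPU to test with. +1

@Divakar I also enjoy the challenge of vectorizing code. Have you tried rewriting the bsxfun part with an explicit for loop and then applying the rest of the approach? bsxfun gives concise code, but it may be slower than a loop version. As far as speed optimization goes, that could well be the bottleneck.

Writing a for loop to replace the bsxfun part did not yield any improvement. I think that, as loop code, your code is perfect.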
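The comments mention that squeeze() can be replaced by a direct reshape() once the sizes are known. A minimal sketch of that idea, applied to the matching step of Approach #1; the names M3, A and B are illustrative and not from the thread, and no speed difference is claimed here.

```matlab
X = [1 2 0 1;
     1 0 1 2;
     1 2 0 1];

unqrows = unique(X, 'rows');
N = size(X, 1);                      % number of rows in X
U = size(unqrows, 1);                % number of unique rows

% N-by-1-by-U logical array: does row n of X equal unique row u?
M3 = all(bsxfun(@eq, X, permute(unqrows, [3 2 1])), 2);

A = squeeze(M3);                     % drops the singleton middle dimension
B = reshape(M3, N, U);               % same result, with the target size stated explicitly

disp(isequal(A, B));
```

Stating the target size also keeps the result N-by-U in edge cases such as a single input row, where squeeze would return a column instead.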
