Algorithm 矢量化:朋友还是敌人?bsxfun/arrayfun避免循环、重复、置换、挤压等
这个问题与此相关,也可能与此相关 假设有两个矩阵A和B。A是M-by-N,B是N-by-K。我想得到一个M-by-K矩阵C,这样Algorithm 矢量化:朋友还是敌人?bsxfun/arrayfun避免循环、重复、置换、挤压等,algorithm,matlab,loops,vectorization,bsxfun,Algorithm,Matlab,Loops,Vectorization,Bsxfun,这个问题与此相关,也可能与此相关 假设有两个矩阵A和B。A是M-by-N,B是N-by-K。我想得到一个M-by-K矩阵C,这样C(I,j)=1-prod(1-A(I,:)'.*B(:,j))。我在Matlab中尝试了一些解决方案——我在这里比较它们的计算性能 % Size of matrices: M = 4e3; N = 5e2; K = 5e1; GG = 50; % GG instances rntm1 = zeros(GG, 1); % running time of f
C(I,j)=1-prod(1-A(I,:)'.*B(:,j))
。我在Matlab中尝试了一些解决方案——我在这里比较它们的计算性能
% Size of matrices:
M = 4e3;
N = 5e2;
K = 5e1;
GG = 50; % GG instances
rntm1 = zeros(GG, 1); % running time of first algorithm
rntm2 = zeros(GG, 1); % running time of second algorithm
rntm3 = zeros(GG, 1); % running time of third algorithm
rntm4 = zeros(GG, 1); % running time of fourth algorithm
rntm5 = zeros(GG, 1); % running time of fifth algorithm
for gg = 1:GG
A = rand(M, N); % M-by-N matrix of random numbers
A = A ./ repmat(sum(A, 2), 1, N); % M-by-N matrix of probabilities (?)
B = rand(N, K); % N-by-K matrix of random numbers
B = B ./ repmat(sum(B), N, 1); % N-by-K matrix of probabilities (?)
%% First solution
% One-liner solution:
tic
C = squeeze(1 - prod(1 - repmat(A, [1 1 K]) .* permute(repmat(B, [1 1 M]), [3 1 2]), 2));
rntm1(gg) = toc;
%% Second solution
% Full vectorization, using meshgrid, arrayfun and reshape (from Luis Mendo, second link above)
tic
[ii jj] = meshgrid(1:size(A, 1), 1:size(B, 2));
D = arrayfun(@(n) 1 - prod(1 - A(ii(n), :)' .* B(:, jj(n))), 1:numel(ii));
D = reshape(D, size(B, 2), size(A, 1)).';
rntm2(gg) = toc;
clear ii jj
%% Third solution
% Partial vectorization 1
tic
E = zeros(M, K);
for hh = 1:M
tmp = repmat(A(hh, :)', 1, K);
E(hh, :) = 1 - prod((1 - tmp .* B), 1);
end
rntm3(gg) = toc;
clear tmp hh
%% Fourth solution
% Partial vectorization 2
tic
F = zeros(M, K);
for hh = 1:M
for ii = 1:K
F(hh, ii) = 1 - prod(1 - A(hh, :)' .* B(:, ii));
end
end
rntm4(gg) = toc;
clear hh ii
%% Fifth solution
% No vectorization at all
tic
G = ones(M, K);
for hh = 1:M
for ii = 1:K
for jj = 1:N
G(hh, ii) = G(hh, ii) * prod(1 - A(hh, jj) .* B(jj, ii));
end
G(hh, ii) = 1 - G(hh, ii);
end
end
rntm5(gg) = toc;
clear hh ii jj C D E F G
end
prctile([rntm1 rntm2 rntm3 rntm4 rntm5], [2.5 25 50 75 97.5])
% 3.6519 3.5261 0.5912 1.9508 2.7576
% 5.3449 6.8688 1.1973 3.3744 3.9940
% 8.1094 8.7016 1.4116 4.9678 7.0312
% 8.8124 10.5170 1.9874 6.1656 8.8227
% 9.5881 12.0150 2.1529 6.6445 9.5115
mean([rntm1 rntm2 rntm3 rntm4 rntm5])
% 7.2420 8.3068 1.4522 4.5865 6.4423
std([rntm1 rntm2 rntm3 rntm4 rntm5])
% 2.1070 2.5868 0.5261 1.6122 2.4900
解决方案是等效的,但部分矢量化的算法在内存和执行时间方面更有效。即使是三环似乎也比arrayfun表现得更好!有什么方法比第三种,仅部分矢量化的解决方案更好吗
编辑:丹的解决方案是迄今为止最好的。让rntm6、rntm7和rntm8作为其第一个、第二个和第三个解决方案的运行时。然后:
prctile(rntm6, [2.5 25 50 75 97.5])
% 0.6337 0.6377 0.6480 0.7110 1.2932
mean(rntm6)
% 0.7440
std(rntm6)
% 0.1970
prctile(rntm7, [2.5 25 50 75 97.5])
% 0.6898 0.7130 0.9050 1.1505 1.4041
mean(rntm7)
% 0.9313
std(rntm7)
% 0.2276
prctile(rntm8, [2.5 25 50 75 97.5])
% 0.5949 0.6005 0.6036 0.6370 1.3529
mean(rntm8)
% 0.6753
std(rntm8)
% 0.1890
使用
bsxfun
,您可以获得较小的性能提升:
E = zeros(M, K);
for hh = 1:M
E(hh, :) = 1 - prod((1 - bsxfun(@times, A(hh,:)', B)), 1);
end
您可以通过以下方式压缩(双关语)一点点性能:
E = squeeze(1 - prod((1-bsxfun(@times, permute(B, [3 1 2]), A)),2));
或者你可以尝试为我的第一个建议预先计算转置:
E = zeros(M, K);
At = A';
for hh = 1:M
E(hh, :) = 1 - prod((1 - bsxfun(@times, At(:,hh), B)), 1);
end
使用
arrayfun
或bsxfun
绝对有益的一种情况是,您可以使用并行计算工具箱和兼容的NVIDIA GPU。在这种情况下,这两个函数的性能非常快,因为主体可以发送到GPU在那里执行。请参阅示例:A注意:我会厌倦调用arrayfun
一种完全矢量化的方法,在内部它也只是循环。现在Matlab中的循环实际上相当有效。通常arrayfun只会增加额外费用——事实上并不是很小——请参见上面的编辑:平均值减半,标准偏差减半以上!比第一个稍微慢一点,但仍然比其他的(基于50个实例)快。哦,在我的测试中它更快。好的,我将再添加一个小的调整,您可以试着注意,当不重新排序非单例维度时,简单的重塑
通常比shiftim
或permute
更快。否则,我认为这是正确的解决方案。@randomatlabuser我会猜像changingepermute(B[3 1 2])
到[x,y]=size(B);重塑(B[1,x,y])