Vectorization: friend or foe? Using bsxfun/arrayfun to avoid loops, repmat, permute, squeeze, etc.


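This question is related to this one, and possibly to this one as well.

Suppose there are two matrices A and B. A is M-by-N and B is N-by-K. I want to obtain an M-by-K matrix C such that

C(i, j) = 1 - prod(1 - A(i, :)' .* B(:, j))

I have tried several solutions in Matlab and compare their computational performance here: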
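Written out elementwise, the definition above can be restated as follows. This is only a restatement of the Matlab expression; the index name n is introduced here for illustration.

```latex
C_{ij} \;=\; 1 - \prod_{n=1}^{N} \bigl(1 - A_{in}\, B_{nj}\bigr),
\qquad i = 1,\dots,M, \quad j = 1,\dots,K .
```

Each entry of C therefore needs N factors, so a fully vectorized evaluation has to hold all M·N·K factors at once, while a loop over i only needs N·K of them at a time.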

% Size of matrices:
M = 4e3;
N = 5e2;
K = 5e1;

GG = 50;    % GG instances
rntm1 = zeros(GG, 1);    % running time of first algorithm
rntm2 = zeros(GG, 1);    % running time of second algorithm
rntm3 = zeros(GG, 1);    % running time of third algorithm
rntm4 = zeros(GG, 1);    % running time of fourth algorithm
rntm5 = zeros(GG, 1);    % running time of fifth algorithm
for gg = 1:GG

    A = rand(M, N);    % M-by-N matrix of random numbers
    A = A ./ repmat(sum(A, 2), 1, N);    % M-by-N matrix of probabilities (?)
    B = rand(N, K);    % N-by-K matrix of random numbers
    B = B ./ repmat(sum(B), N, 1);    % N-by-K matrix of probabilities (?)

    %% First solution
    % One-liner solution:
    tic
    C = squeeze(1 - prod(1 - repmat(A, [1 1 K]) .* permute(repmat(B, [1 1 M]), [3 1 2]), 2));
    rntm1(gg) = toc;


    %% Second solution
    % Full vectorization, using meshgrid, arrayfun and reshape (from Luis Mendo, second link above)
    tic
    [ii jj] = meshgrid(1:size(A, 1), 1:size(B, 2));
    D = arrayfun(@(n) 1 - prod(1 - A(ii(n), :)' .* B(:, jj(n))), 1:numel(ii));
    D = reshape(D, size(B, 2), size(A, 1)).';
    rntm2(gg) = toc;
    clear ii jj

    %% Third solution
    % Partial vectorization 1
    tic
    E = zeros(M, K);
    for hh = 1:M
      tmp = repmat(A(hh, :)', 1, K);
      E(hh, :) = 1 - prod((1 - tmp .* B), 1);
    end
    rntm3(gg) = toc;
    clear tmp hh

    %% Fourth solution
    % Partial vectorization 2
    tic
    F = zeros(M, K);
    for hh = 1:M
      for ii = 1:K
        F(hh, ii) = 1 - prod(1 - A(hh, :)' .* B(:, ii));
      end
    end
    rntm4(gg) = toc;
    clear hh ii

    %% Fifth solution
    % No vectorization at all
    tic
    G = ones(M, K);
    for hh = 1:M
      for ii = 1:K
        for jj = 1:N
          G(hh, ii) = G(hh, ii) * prod(1 - A(hh, jj) .* B(jj, ii));
        end
        G(hh, ii) = 1 - G(hh, ii);
      end
    end
    rntm5(gg) = toc;
    clear hh ii jj C D E F G

end

prctile([rntm1 rntm2 rntm3 rntm4 rntm5], [2.5 25 50 75 97.5])
%    3.6519    3.5261    0.5912    1.9508    2.7576
%    5.3449    6.8688    1.1973    3.3744    3.9940
%    8.1094    8.7016    1.4116    4.9678    7.0312
%    8.8124   10.5170    1.9874    6.1656    8.8227
%    9.5881   12.0150    2.1529    6.6445    9.5115

mean([rntm1 rntm2 rntm3 rntm4 rntm5])
%    7.2420    8.3068    1.4522    4.5865    6.4423

std([rntm1 rntm2 rntm3 rntm4 rntm5])
%    2.1070    2.5868    0.5261    1.6122    2.4900

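The solutions are equivalent, but the partially vectorized algorithms are more efficient in both memory and execution time. Even the triple loop seems to perform better than arrayfun! Is there any approach that does better than the third, only partially vectorized, solution?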
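One plausible reason the one-liner is slow is memory: at the benchmark sizes, each M-by-N-by-K temporary holds 1e8 doubles, roughly 800 MB. The equivalence claim itself can be checked on small inputs with a sketch like the one below. The small sizes and the names Cref, Cvec, Cpart, errVec, errPart and bytesFull are illustrative assumptions, not part of the original benchmark.

```matlab
% Small, illustrative sizes (assumed; the benchmark above uses much larger ones)
M = 6; N = 5; K = 4;
A = rand(M, N);
B = rand(N, K);

% Reference: plain double loop over rows of A and columns of B
Cref = zeros(M, K);
for hh = 1:M
  for kk = 1:K
    Cref(hh, kk) = 1 - prod(1 - A(hh, :)' .* B(:, kk));
  end
end

% Fully vectorized one-liner (first solution)
Cvec = squeeze(1 - prod(1 - repmat(A, [1 1 K]) .* permute(repmat(B, [1 1 M]), [3 1 2]), 2));

% Partially vectorized loop (third solution)
Cpart = zeros(M, K);
for hh = 1:M
  Cpart(hh, :) = 1 - prod(1 - repmat(A(hh, :)', 1, K) .* B, 1);
end

% Largest absolute deviation of each variant from the reference
errVec  = max(abs(Cvec(:)  - Cref(:)));
errPart = max(abs(Cpart(:) - Cref(:)));

% Bytes needed by one M-by-N-by-K double array at the benchmark sizes
bytesFull = 4e3 * 5e2 * 5e1 * 8;
```

Both deviations should be at the level of floating-point rounding, which is consistent with the statement that the solutions are equivalent.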

EDIT: Dan's solution is the best so far. Let rntm6, rntm7 and rntm8 be the running times of his first, second and third solutions. Then:

prctile(rntm6, [2.5 25 50 75 97.5])
%    0.6337    0.6377    0.6480    0.7110    1.2932
mean(rntm6)
%    0.7440
std(rntm6)
%    0.1970

prctile(rntm7, [2.5 25 50 75 97.5])
%    0.6898    0.7130    0.9050    1.1505    1.4041
mean(rntm7)
%    0.9313
std(rntm7)
%    0.2276

prctile(rntm8, [2.5 25 50 75 97.5])
%    0.5949    0.6005    0.6036    0.6370    1.3529
mean(rntm8)
%    0.6753
std(rntm8)
%    0.1890

Using bsxfun, you can get a small performance gain:

E = zeros(M, K);
for hh = 1:M
  E(hh, :) = 1 - prod((1 - bsxfun(@times, A(hh,:)', B)), 1);
end

You can squeeze (pun intended) out a little more performance with:

E = squeeze(1 - prod((1-bsxfun(@times, permute(B, [3 1 2]), A)),2));

Or you could try precomputing the transpose for my first suggestion:

E = zeros(M, K);
At = A';
for hh = 1:M
  E(hh, :) = 1 - prod((1 - bsxfun(@times, At(:,hh), B)), 1);
end
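In MATLAB R2016b and later (and in GNU Octave), elementwise operators expand singleton dimensions implicitly, so the bsxfun call can usually be dropped. The sketch below compares the two forms on small inputs; the sizes and the names E1, E2 and maxDiff are illustrative assumptions, and whether the implicit form is faster is not something this example measures.

```matlab
M = 6; N = 5; K = 4;           % small illustrative sizes (assumed)
A = rand(M, N);
B = rand(N, K);
At = A';                       % precompute the transpose, as above

% Explicit bsxfun version
E1 = zeros(M, K);
for hh = 1:M
  E1(hh, :) = 1 - prod(1 - bsxfun(@times, At(:, hh), B), 1);
end

% Implicit expansion: the N-by-1 column is expanded across the K columns of B
E2 = zeros(M, K);
for hh = 1:M
  E2(hh, :) = 1 - prod(1 - At(:, hh) .* B, 1);
end

% Largest absolute difference between the two variants
maxDiff = max(abs(E1(:) - E2(:)));
```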

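One case where using arrayfun or bsxfun is definitely beneficial is when you have the Parallel Computing Toolbox and a compatible NVIDIA GPU. In that case both functions are very fast, because the function body can be sent to the GPU and executed there. See the example: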
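As a rough sketch of what this could look like (not the example referred to above), the squeeze/bsxfun one-liner can be run on gpuArray inputs. The guard on gpuDeviceCount, the CPU fallback, the small sizes and the names useGpu, Ain and Bin are assumptions made for illustration. At the benchmark sizes the M-by-N-by-K temporary would also have to fit in GPU memory.

```matlab
M = 6; N = 5; K = 4;           % small illustrative sizes (assumed)
A = rand(M, N);
B = rand(N, K);

% Use the GPU only if the toolbox function exists and a device is present
useGpu = exist('gpuDeviceCount') > 0 && gpuDeviceCount > 0;
if useGpu
  Ain = gpuArray(A);           % copy the inputs to the GPU
  Bin = gpuArray(B);
else
  Ain = A;                     % fall back to ordinary CPU arrays
  Bin = B;
end

% Same expression as the squeeze/bsxfun one-liner above
E = squeeze(1 - prod(1 - bsxfun(@times, permute(Bin, [3 1 2]), Ain), 2));

if useGpu
  E = gather(E);               % copy the result back to host memory
end
```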

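A note: I would be wary of calling arrayfun a fully vectorized approach; internally it is also just a loop. Loops in Matlab are actually fairly efficient these days, and arrayfun often just adds overhead.

In fact the gain is not that small, see the edit above: the mean is halved and the standard deviation more than halved!

Slightly slower than the first one, but still faster than the others (based on 50 instances).

Oh, in my tests it was faster. OK, I will add one more small tweak you can try.

Note that when the non-singleton dimensions are not reordered, a simple reshape is usually faster than shiftdim or permute. Otherwise, I think this is the right solution.

@randomatlabuser I would guess something like changing permute(B, [3 1 2]) to [x, y] = size(B); reshape(B, [1, x, y])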
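The suggestion in the last comment relies on the fact that moving B into dimensions 2 and 3 does not reorder its elements in memory, so reshape yields the same array as permute. A small check of that claim, with illustrative sizes and the assumed names P, R, sameSize and sameData:

```matlab
N = 5; K = 4;                  % small illustrative sizes (assumed)
B = rand(N, K);

P = permute(B, [3 1 2]);       % 1-by-N-by-K via permute
[x, y] = size(B);
R = reshape(B, [1, x, y]);     % 1-by-N-by-K via reshape

sameSize = isequal(size(P), size(R));   % both are 1-by-N-by-K
sameData = isequal(P, R);               % elements agree position by position
```

With that substitution the one-liner would read E = squeeze(1 - prod(1 - bsxfun(@times, reshape(B, [1, x, y]), A), 2)); whether it is actually faster would need to be timed.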