Performance: Best practice when working with sparse matrices


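This question is based on an earlier discussion. I have worked with sparse matrices before, and I believe the way I have been handling them is efficient.

My question is twofold:

In the examples below, A = full(S), where S is a sparse matrix.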

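What is the "correct" way to access an element of a sparse matrix? That is, what is the sparse equivalent of var = A(row,col)?

My view on this topic: you would not do anything differently. var = S(row,col) is as efficient as it gets. I have been challenged on this, with the following explanation: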

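"Accessing the element in row 2 and column 2 the way you said, S(2,2), does the same as adding a new element: var = S(2,2) => A = full(S) => var = A(2,2) => S = sparse(A) => 4 operations."

Can this statement really be correct?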

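What is the "correct" way to add elements to a sparse matrix? That is, what is the sparse equivalent of A(row,col) = var? (Assuming A(row,col) == 0 to begin with.)

It is well known that simply doing A(row,col) = var is slow for large sparse matrices. The following quote makes the same point: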

"If you wanted to change a value in this matrix, you might be tempted to use the same indexing:

B(3,1) = 42; % This code does work, however, it is slow."

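My view on this topic: when working with sparse matrices, you often start with the vectors and use them to create the matrix this way: S = sparse(i,j,s,m,n). Of course, you could also have created it with S = sparse(A), sprand(m,n,density), or something similar.

If you started out the first way, you would simply do: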

i = [i; new_i];
j = [j; new_j];
s = [s; new_s];
S = sparse(i,j,s,m,n); 
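If you did not have the vectors to begin with, you would do the same thing, but first recover them from the matrix with find, that is, [i,j,s] = find(S).

Now you have the vectors, of course, and can reuse them if you perform this operation several times. It is better, however, to add all new elements at once rather than doing the above in a loop, because growing the vectors inside a loop is slow. In this case, new_i, new_j and new_s are vectors corresponding to the new elements.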
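
The workflow described above can be sketched as follows. This is only an illustration: the matrix size and all values are made up, and it relies on the question's assumption that the new positions are currently zero.

```matlab
% Start from an existing sparse matrix (example values are made up).
m = 4; n = 4;
S = sparse([1 2 4], [1 3 4], [10 20 30], m, n);

% Recover the triplets: row indices, column indices, values.
[i, j, s] = find(S);

% New elements, collected up front and appended in a single step.
new_i = [3; 1];
new_j = [2; 4];
new_s = [40; 50];

i = [i; new_i];
j = [j; new_j];
s = [s; new_s];

% Rebuild once. sparse() sums values that share the same (i,j), so this
% matches plain assignment only when the new positions were zero before.
S = sparse(i, j, s, m, n);
```

Because sparse() accumulates duplicate subscripts, appending a position that already holds a value adds to it instead of overwriting it.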

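MATLAB stores sparse matrices in compressed column format. This means that when you do something like A(2,2) (to get the element in row 2, column 2), MATLAB first accesses the second column and then looks for the element in row 2 (the row indices within each column are stored in ascending order). You can think of it like this: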

 A2 = A(:,2);
 A2(2)
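If you are only accessing a single element of the sparse matrix, doing var = S(r,c) is fine. If you are looping over the elements of a sparse matrix, however, you probably want to access one column at a time and then loop over the nonzero row indices via [i,~,x] = find(S(:,c)). Alternatively, use something like spfun.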
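
A minimal sketch of the column-at-a-time pattern, followed by spfun. The matrix and the accumulation inside the loop are illustrative assumptions only; the loop body stands in for whatever per-element work is needed.

```matlab
S = sparse([1 3 2 3], [1 1 2 3], [5 7 2 9], 3, 3);

% Visit the nonzeros one column at a time.
colSums = zeros(1, size(S, 2));
for c = 1:size(S, 2)
    [i, ~, x] = find(S(:, c));   % row indices and values of column c
    for k = 1:numel(i)
        % i(k) is the row index, x(k) the stored value; here we just sum.
        colSums(c) = colSums(c) + x(k);
    end
end

% Apply a function to the nonzero entries only; zeros are left untouched.
T = spfun(@sqrt, S);
```

For a plain column sum, sum(S,1) would of course be simpler; the loop is there only to show how the row indices and values of each column are obtained.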

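You should avoid constructing a dense matrix A first and then calling S = sparse(A), since that operation merely squeezes out the zeros. Instead, as you note, it is much more efficient to build the sparse matrix from scratch using the triplet form and a call to sparse(i,j,x,m,n). The MATLAB documentation describes how to construct sparse matrices efficiently.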
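
As a sketch of the triplet approach, the following builds a small tridiagonal matrix without any dense intermediate. The matrix itself (2 on the diagonal, -1 next to it) and the size n = 5 are arbitrary choices for illustration.

```matlab
n = 5;

% Preallocate the triplet vectors: n diagonal entries plus 2*(n-1)
% off-diagonal entries.
nz = n + 2*(n - 1);
i = zeros(nz, 1); j = zeros(nz, 1); x = zeros(nz, 1);

k = 0;
for r = 1:n
    k = k + 1; i(k) = r; j(k) = r; x(k) = 2;                 % diagonal
    if r < n
        k = k + 1; i(k) = r;     j(k) = r + 1; x(k) = -1;    % superdiagonal
        k = k + 1; i(k) = r + 1; j(k) = r;     x(k) = -1;    % subdiagonal
    end
end

% A single call to sparse() assembles the matrix from the triplets.
S = sparse(i, j, x, n, n);
```

The triplet vectors are preallocated so that filling them does not reallocate on every step.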


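The original description of how sparse matrices are implemented in MATLAB is a good read. It gives more background on how the sparse matrix algorithms were originally implemented.

EDIT: Answer modified according to Oleg's suggestions (see comments).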

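Here is my benchmark for the second part of your question. To test direct insertion, the matrices are initialized empty with varying values of nzmax. For the tests that rebuild from index vectors this is irrelevant, because the matrix is built from scratch on every call. Both methods were tested in two modes: performing a single insertion operation (of a varying number of elements), and performing incremental insertions, one value at a time (up to the same number of elements). Because of the computational load, I lowered the number of repetitions for each test case from 1000 to 100. I believe this is still statistically viable.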

Ssize = 10000;
NumIterations = 100;
NumInsertions = round(logspace(0, 4, 10));
NumInitialNZ = round(logspace(1, 4, 4));

NumTests = numel(NumInsertions) * numel(NumInitialNZ);
TimeDirect = zeros(numel(NumInsertions), numel(NumInitialNZ));
TimeIndices = zeros(numel(NumInsertions), 1);

%% Single insertion operation (non-incremental)
% Method A: Direct insertion
for iInitialNZ = 1:numel(NumInitialNZ)
    disp(['Running with initial nzmax = ' num2str(NumInitialNZ(iInitialNZ))]);

    for iInsertions = 1:numel(NumInsertions)
        tSum = 0;
        for jj = 1:NumIterations
            S = spalloc(Ssize, Ssize, NumInitialNZ(iInitialNZ));
            r = randi(Ssize, NumInsertions(iInsertions), 1);
            c = randi(Ssize, NumInsertions(iInsertions), 1);

            tic
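            % Caution: with vector subscripts, S(r,c) addresses every
            % combination of r and c (a submatrix), not only the pairs
            % (r(k),c(k)); S(sub2ind(size(S),r,c)) = 1 would set the pairs alone.
            S(r,c) = 1;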
            tSum = tSum + toc;
        end

        disp([num2str(NumInsertions(iInsertions)) ' direct insertions: ' num2str(tSum) ' seconds']);
        TimeDirect(iInsertions, iInitialNZ) = tSum;
    end
end

% Method B: Rebuilding from index vectors
for iInsertions = 1:numel(NumInsertions)
    tSum = 0;
    for jj = 1:NumIterations
        i = []; j = []; s = [];
        r = randi(Ssize, NumInsertions(iInsertions), 1);
        c = randi(Ssize, NumInsertions(iInsertions), 1);
        s_ones = ones(NumInsertions(iInsertions), 1);

        tic
        i_new = [i; r];
        j_new = [j; c];
        s_new = [s; s_ones];
        S = sparse(i_new, j_new ,s_new , Ssize, Ssize);
        tSum = tSum + toc;
    end

    disp([num2str(NumInsertions(iInsertions)) ' indexed insertions: ' num2str(tSum) ' seconds']);
    TimeIndices(iInsertions) = tSum;
end

SingleOperation.TimeDirect = TimeDirect;
SingleOperation.TimeIndices = TimeIndices;

%% Incremental insertion
for iInitialNZ = 1:numel(NumInitialNZ)
    disp(['Running with initial nzmax = ' num2str(NumInitialNZ(iInitialNZ))]);

    % Method A: Direct insertion
    for iInsertions = 1:numel(NumInsertions)
        tSum = 0;
        for jj = 1:NumIterations
            S = spalloc(Ssize, Ssize, NumInitialNZ(iInitialNZ));
            r = randi(Ssize, NumInsertions(iInsertions), 1);
            c = randi(Ssize, NumInsertions(iInsertions), 1);

            tic
            for ii = 1:NumInsertions(iInsertions)
                S(r(ii),c(ii)) = 1;
            end
            tSum = tSum + toc;
        end

        disp([num2str(NumInsertions(iInsertions)) ' direct insertions: ' num2str(tSum) ' seconds']);
        TimeDirect(iInsertions, iInitialNZ) = tSum;
    end
end

% Method B: Rebuilding from index vectors
for iInsertions = 1:numel(NumInsertions)
    tSum = 0;
    for jj = 1:NumIterations
        i = []; j = []; s = [];
        r = randi(Ssize, NumInsertions(iInsertions), 1);
        c = randi(Ssize, NumInsertions(iInsertions), 1);

        tic
        for ii = 1:NumInsertions(iInsertions)
            i = [i; r(ii)];
            j = [j; c(ii)];
            s = [s; 1];
            S = sparse(i, j ,s , Ssize, Ssize);
        end
        tSum = tSum + toc;
    end

    disp([num2str(NumInsertions(iInsertions)) ' indexed insertions: ' num2str(tSum) ' seconds']);
    TimeIndices(iInsertions) = tSum;
end

IncrementalInsertion.TimeDirect = TimeDirect;
IncrementalInsertion.TimeIndices = TimeIndices;

%% Plot results
% Single insertion
figure;
loglog(NumInsertions, SingleOperation.TimeIndices);
cellLegend = {'Using index vectors'};
hold all;
for iInitialNZ = 1:numel(NumInitialNZ)
    loglog(NumInsertions, SingleOperation.TimeDirect(:, iInitialNZ));
    cellLegend = [cellLegend; {['Direct insertion, initial nzmax = ' num2str(NumInitialNZ(iInitialNZ))]}];
end
hold off;
title('Benchmark for single insertion operation');
xlabel('Number of insertions'); ylabel('Runtime for 100 operations [sec]');
legend(cellLegend, 'Location', 'NorthWest');
grid on;

% Incremental insertions
figure;
loglog(NumInsertions, IncrementalInsertion.TimeIndices);
cellLegend = {'Using index vectors'};
hold all;
for iInitialNZ = 1:numel(NumInitialNZ)
    loglog(NumInsertions, IncrementalInsertion.TimeDirect(:, iInitialNZ));
    cellLegend = [cellLegend; {['Direct insertion, initial nzmax = ' num2str(NumInitialNZ(iInitialNZ))]}];
end
hold off;
title('Benchmark for incremental insertions');
xlabel('Number of insertions'); ylabel('Runtime for 100 operations [sec]');
legend(cellLegend, 'Location', 'NorthWest');
grid on;
I ran this in MATLAB R2012a. This graph summarizes the results of performing a single insertion operation:

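This shows that direct insertion is much slower than using index vectors when only a single operation is performed. In the index-vector case, the growth in runtime may be due either to the growth of the vectors themselves or to the lengthier sparse matrix construction; I am not sure which. The initial nzmax used to construct the matrices seems to have no effect on this growth.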
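
For readers unfamiliar with nzmax, here is a small sketch of what the benchmark's spalloc call reserves. The sizes are arbitrary, and how storage grows once the reserved space runs out is not something this sketch tries to show.

```matlab
% All-zero sparse matrix with storage reserved for 50 nonzeros.
S = spalloc(1000, 1000, 50);
before = [nnz(S), nzmax(S)];   % stored nonzeros vs. allocated slots

% A direct insertion that fits within the reserved storage.
S(10, 20) = 1;
after = [nnz(S), nzmax(S)];
```

Comparing before and after shows the difference between the number of stored nonzeros (nnz) and the amount of storage allocated for them (nzmax).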

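This graph summarizes the results of performing incremental insertions:

Here we see the opposite trend: using index vectors is slower, because of the overhead of growing them incrementally and rebuilding the sparse matrix at every step. One way to understand this is to look at the first point in the previous graph: for the insertion of a single element, direct insertion is more efficient than rebuilding from index vectors. In the incremental case this single insertion is done repeatedly, so it becomes viable to use direct insertion rather than index vectors, contrary to MATLAB's recommendation.

This understanding also suggests that if we were to add elements incrementally in batches of, say, 100 at a time, the efficient choice would be index vectors rather than direct insertion, since the first graph shows that method to be faster for insertions of this size. Between these two regimes you should probably experiment to see which method is more efficient, although the results may well show that the difference between the two is negligible there.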
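
One possible way to add elements in batches is to buffer the triplets and flush them with a single sparse() call per batch. This is a sketch, not something that was benchmarked above: the batch size, the buffer names (bi, bj, bv, nbuf) and the random stream of insertions are all hypothetical, and the positions are assumed to be distinct and initially zero.

```matlab
m = 1000; n = 1000;
S = sparse(m, n);

chunk = 100;                              % hypothetical batch size
bi = zeros(chunk, 1); bj = zeros(chunk, 1); bv = zeros(chunk, 1);
nbuf = 0;                                 % number of buffered triplets

% Made-up stream of 250 insertions at distinct positions.
p = randperm(m * n, 250);
[rows, cols] = ind2sub([m, n], p);

for t = 1:numel(p)
    nbuf = nbuf + 1;
    bi(nbuf) = rows(t); bj(nbuf) = cols(t); bv(nbuf) = 1;

    if nbuf == chunk || t == numel(p)
        % Flush: one sparse construction and one addition per batch.
        S = S + sparse(bi(1:nbuf), bj(1:nbuf), bv(1:nbuf), m, n);
        nbuf = 0;
    end
end
```

Adding the batch matrix to S only behaves like assignment because the positions are distinct and start out as zero; otherwise the values would accumulate.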

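Bottom line: which method should I use?

My conclusion is that this depends on the nature of the insertions you intend to perform.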