Performance: Run-length decoding in MATLAB


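To make clever use of linear indexing or accumarray, I've sometimes felt the need to generate sequences from run-length-encoded data. Since there is no built-in function for this, I am asking for the most efficient way to decode a sequence encoded in RLE.

Specification: To make this a fair comparison, I would like to set some specifications for the function:

  • If an optional second argument values of the same length is specified, the output should consist of those values; otherwise it is just the values 1:length(runLengths).
  • It gracefully handles:
    • zeros in runLengths
    • values being a cell array
  • The output vector should have the same column/row format as runLengths.

In short: the function should be equivalent to the following code: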

function V = runLengthDecode(runLengths, values)
[~,V] = histc(1:sum(runLengths), cumsum([1,runLengths(:).']));
if nargin>1
    V = reshape(values(V), 1, []);
end
V = shiftdim(V, ~isrow(runLengths));
end
Examples: Here are a few test cases

runLengthDecode([0,1,0,2])
runLengthDecode([0,1,0,4], [1,2,4,5].')
runLengthDecode([0,1,0,2].', [10,20,30,40])
runLengthDecode([0,3,1,0], {'a','b',1,2})
and their outputs:

>> runLengthDecode([0,1,0,2])
ans =
     2     4     4

>> runLengthDecode([0,1,0,4], [1,2,4,5].')
ans =    
     2     5     5     5     5

>> runLengthDecode([0,1,0,2].', [10,20,30,40])
ans =
    20
    40
    40

>> runLengthDecode([0,3,1,0],{'a','b',1,2})
ans = 
    'b'    'b'    'b'    [1]
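The four cases above can be turned into a small regression check for any candidate implementation. This is only a sketch: the name checkRunLengthDecode and its function-handle argument decodeFcn are hypothetical and not part of the original post. The expected values are copied from the outputs listed above.

```matlab
function checkRunLengthDecode(decodeFcn)
% Hypothetical helper: decodeFcn is a handle to a candidate decoder,
% e.g. @runLengthDecode. isequal also compares sizes, so the
% row/column orientation of each result is checked as well.

% Default values 1:length(runLengths); zero-length runs are skipped
assert(isequal(decodeFcn([0,1,0,2]), [2,4,4]));

% Row runLengths with column values: output is a row
assert(isequal(decodeFcn([0,1,0,4], [1,2,4,5].'), [2,5,5,5,5]));

% Column runLengths with row values: output is a column
assert(isequal(decodeFcn([0,1,0,2].', [10,20,30,40]), [20;40;40]));

% Cell-array values
assert(isequal(decodeFcn([0,3,1,0], {'a','b',1,2}), {'b','b','b',1}));
end
```

A call such as checkRunLengthDecode(@runLengthDecode) should then finish silently if the candidate meets the specification.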
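Approach 1: This should be reasonably fast. It uses bsxfun to create a matrix of size max(runLengths) x numel(runLengths), so it may not be suitable for large inputs or very long runs.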

function V = runLengthDecode(runLengths, values)
nn = 1:numel(runLengths);
if nargin==1 %// handle one-input case
    values = nn;
end
V = values(nonzeros(bsxfun(@times, nn,...
    bsxfun(@le, (1:max(runLengths)).', runLengths(:).'))));
if size(runLengths,1)~=size(values,1) %// adjust orientation of output vector
    V = V.';
end
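To see how the index vector in approach 1 is built, the two bsxfun calls can be traced on a small input. This is an illustrative sketch: the input [2 0 3] and the variable names mask and idx are chosen here and do not appear in the original answer.

```matlab
% Trace of the bsxfun-based index construction for a small input
runLengths = [2 0 3];
nn = 1:numel(runLengths);                       % [1 2 3]

% mask(r,c) is true while row r is still within run c
mask = bsxfun(@le, (1:max(runLengths)).', runLengths(:).');
% mask = [1 0 1
%         1 0 1
%         0 0 1]

% Scale each column by its run index, then read the nonzeros column by column
idx = nonzeros(bsxfun(@times, nn, mask));       % [1;1;3;3;3]
```

The intermediate mask has max(runLengths) rows, which is why a single very long run makes this approach expensive.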
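Approach 2: This approach is based on cumsum and is an adaptation of a method used in another answer. It uses less memory than approach 1.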

function V = runLengthDecode2(runLengths, values)
if nargin==1 %// handle one-input case
    values = 1:numel(runLengths);
end
[ii, ~, jj] = find(runLengths(:));
V(cumsum(jj(end:-1:1))) = 1;
V = values(ii(cumsum(V(end:-1:1))));
if size(runLengths,1)~=size(values,1) %// adjust orientation of output vector
    V = V.';
end
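The cumsum trick in approach 2 is easier to follow on a concrete input. The trace below is a sketch that uses the same operations as the function above. The input [2 0 3] and the names marks and idx are introduced here for illustration only.

```matlab
% Trace of the cumsum-based index construction for a small input
runLengths = [2 0 3];
[ii, ~, jj] = find(runLengths(:));      % ii = [1;3] (nonzero runs), jj = [2;3] (their lengths)

% Mark the end of each run, counted from the back of the output
marks = [];
marks(cumsum(jj(end:-1:1))) = 1;        % marks = [0 0 1 0 1]

% Flip back and accumulate: a counter that steps up at every run start
counter = cumsum(marks(end:-1:1));      % [1 1 2 2 2]

% Map the counter to the original run indices, skipping zero-length runs
idx = ii(counter);                      % 1 1 3 3 3
```

Only vectors whose length equals the output are allocated, which is where the memory saving over approach 1 comes from.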

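To find out which solution is the most efficient, we provide a test script that evaluates performance. The first plot shows runtimes for growing length of the vector runLengths, where the entries are uniformly distributed with a maximum length of 200. A version of gnovice's solution is the fastest, with Divakar's solution in second place.

The second plot uses nearly the same test data, except that it includes an initial run of length 2000. This mostly affects the two bsxfun solutions, whereas the other solutions perform quite similarly.

The tests suggest that a version of gnovice's solution will be the most performant.

If you want to run the speed comparison yourself, here is the code used to generate the plots above: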

function theLastRunLengthDecodingComputationComparisonYoullEverNeed()
Funcs =  {@knedlsepp0, ...
          @LuisMendo1bsxfun, ...
          @LuisMendo2cumsum, ...
          @gnovice3cumsum, ...
          @Divakar4replicate_bsxfunmask, ...
          @knedlsepp5cumsumaccumarray
          };    
%% Growing number of runs, low maximum sizes in runLengths
ns = 2.^(1:25);
paramGenerators{1} = arrayfun(@(n) @(){randi(200,n,1)}, ns,'uni',0);
paramGenerators{2} = arrayfun(@(n) @(){[2000;randi(200,n,1)]}, ns,'uni',0);
for i = 1:2
    times = compareFunctions(Funcs, paramGenerators{i}, 0.5);
    finishedComputations = any(~isnan(times),2);
    h = figure('Visible', 'off');
    loglog(ns(finishedComputations), times(finishedComputations,:));
    legend(cellfun(@func2str,Funcs,'uni',0),'Location','NorthWest','Interpreter','none');
    title('Runtime comparison for run length decoding - Growing number of runs');
    xlabel('length(runLengths)'); ylabel('seconds');
    print(['-f',num2str(h)],'-dpng','-r100',['RunLengthComparsion',num2str(i)]);
end
end

function times = compareFunctions(Funcs, paramGenerators, timeLimitInSeconds)
if nargin<3
    timeLimitInSeconds = Inf;
end
times = zeros(numel(paramGenerators),numel(Funcs));
for i = 1:numel(paramGenerators)
    Params = feval(paramGenerators{i});
    for j = 1:numel(Funcs)
        if max(times(:,j))<timeLimitInSeconds
            times(i,j) = timeit(@()feval(Funcs{j},Params{:}));
        else
            times(i,j) = NaN;
        end
    end
end
end
%% // #################################
%% // HERE COME ALL THE FANCY FUNCTIONS
%% // #################################
function V = knedlsepp0(runLengths, values)
[~,V] = histc(1:sum(runLengths), cumsum([1,runLengths(:).']));%'
if nargin>1
    V = reshape(values(V), 1, []);
end
V = shiftdim(V, ~isrow(runLengths));
end

%% // #################################
function V = LuisMendo1bsxfun(runLengths, values)
nn = 1:numel(runLengths);
if nargin==1 %// handle one-input case
    values = nn;
end
V = values(nonzeros(bsxfun(@times, nn,...
    bsxfun(@le, (1:max(runLengths)).', runLengths(:).'))));
if size(runLengths,1)~=size(values,1) %// adjust orientation of output vector
    V = V.'; %'
end
end

%% // #################################
function V = LuisMendo2cumsum(runLengths, values)
if nargin==1 %// handle one-input case
    values = 1:numel(runLengths);
end
[ii, ~, jj] = find(runLengths(:));
V(cumsum(jj(end:-1:1))) = 1;
V = values(ii(cumsum(V(end:-1:1))));
if size(runLengths,1)~=size(values,1) %// adjust orientation of output vector
    V = V.'; %'
end
end

%% // #################################
function V = gnovice3cumsum(runLengths, values)
isColumnVector =  size(runLengths,1)>1;
if nargin==1 %// handle one-input case
    values = 1:numel(runLengths);
end
values = reshape(values(runLengths~=0),1,[]);
if isempty(values) %// If there are no runs
    V = []; return;
end
runLengths = nonzeros(runLengths(:));
index = zeros(1,sum(runLengths));
index(cumsum([1;runLengths(1:end-1)])) = 1;
V = values(cumsum(index));
if isColumnVector %// adjust orientation of output vector
    V = V.'; %'
end
end
%% // #################################
function V = Divakar4replicate_bsxfunmask(runLengths, values)
if nargin==1   %// Handle one-input case
    values = 1:numel(runLengths);
end

%// Do size checking to make sure that both values and runlengths are row vectors.
if size(values,1) > 1
    values = values.'; %//'
end
if size(runLengths,1) > 1
    yes_transpose_output = false;
    runLengths = runLengths.'; %//'
else
    yes_transpose_output = true;
end

maxlen = max(runLengths);

all_values = repmat(values,maxlen,1);
%// OR all_values = values(ones(1,maxlen),:);

V = all_values(bsxfun(@le,(1:maxlen)',runLengths)); %//'

%// Bring the shape of V back to the shape of runlengths
if yes_transpose_output
    V = V.'; %//'
end
end
%% // #################################
function V = knedlsepp5cumsumaccumarray(runLengths, values)
isRowVector = size(runLengths,2)>1;
%// Actual computation using column vectors
V = cumsum(accumarray(cumsum([1; runLengths(:)]), 1));
V = V(1:end-1);
%// In case of second argument
if nargin>1
    V = reshape(values(V),[],1);
end
%// If original was a row vector, transpose
if isRowVector
    V = V.'; %'
end
end

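The solution presented here basically performs the run-length decoding in two steps:

  • Replicate all values up to the maximum of runLengths.
  • Use the masking capability of bsxfun to select from each column the entries corresponding to its run length.

The rest of the function code takes care of the input and output sizes to satisfy the requirements set in the question.

The function code listed next is a "cleaned-up" version of an earlier answer. Here is the code: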

    function V = replicate_bsxfunmask(runLengths, values)
    
    if nargin==1   %// Handle one-input case
        values = 1:numel(runLengths);
    end
    
    %// Do size checking to make sure that both values and runlengths are row vectors.
    if size(values,1) > 1
        values = values.'; %//'
    end
    if size(runLengths,1) > 1
        yes_transpose_output = false;
        runLengths = runLengths.'; %//'
    else
        yes_transpose_output = true;
    end
    
    maxlen = max(runLengths);
    
    all_values = repmat(values,maxlen,1);
    %// OR all_values = values(ones(1,maxlen),:);
    
    V = all_values(bsxfun(@le,(1:maxlen)',runLengths)); %//'
    
    %// Bring the shape of V back to the shape of runlengths
    if yes_transpose_output
        V = V.'; %//'
    end
    
    return;
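The replicate-and-mask steps of replicate_bsxfunmask can be traced on a small input. This is a sketch with example data chosen here ([10 20 30] and [2 0 3]). The variable mask is introduced only to show the intermediate result.

```matlab
% Trace of the replicate + mask steps for a small input
values = [10 20 30];
runLengths = [2 0 3];
maxlen = max(runLengths);                       % 3

% Step 1: one column per value, replicated maxlen times
all_values = repmat(values, maxlen, 1);
% all_values = [10 20 30
%               10 20 30
%               10 20 30]

% Step 2: keep the first runLengths(c) rows of column c
mask = bsxfun(@le, (1:maxlen)', runLengths);
% mask = [1 0 1
%         1 0 1
%         0 0 1]

% Logical indexing reads column by column and returns a column vector
V = all_values(mask);                           % [10;10;30;30;30]
```

Because the logical indexing returns a column, the function transposes the result when runLengths was given as a row.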
    

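The code listed next is a hybrid (cumsum + replicate_bsxfunmask) that is a good fit when the run lengths contain many outliers or very large outliers. To keep it simple, for now it works on numeric arrays only. Here is the implementation: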

    function out = replicate_bsxfunmask_v2(runLengths, values)
    
    if nargin==1                       %// Handle one-input case
        values = 1:numel(runLengths);
    end
    
    if size(values,1) > 1
        values = values.';  %//'
    end
    
    if size(runLengths,1) > 1
        yes_transpose_output = true;
        runLengths = runLengths.';  %//'
    else
        yes_transpose_output = false;
    end
    
    %// Regularize inputs
    values = values(runLengths>0);
    runLengths = runLengths(runLengths>0);
    
    %// Main portion of code
    thresh = 200; %// runlengths threshold that are to be processed with cumsum
    
    crunLengths = cumsum(runLengths); %%// cumsums of runlengths
    mask = runLengths >= thresh; %// mask of runlengths above threshold
    starts = [1 crunLengths(1:end-1)+1]; %// starts of each group of runlengths
    
    mask_ind = find(mask); %// indices of mask
    
    post_mark = starts(mask);
    negt_mark = crunLengths(mask)+1;
    
    if  ~isempty(negt_mark) && negt_mark(end) > crunLengths(end)
        negt_mark(end) = [];
    end
    
    %// Create array & set starts markers for starts of runlengths above thresh
    marked_out = zeros(1,crunLengths(end));
    marked_out(post_mark) = mask_ind;
    marked_out(negt_mark) = marked_out(negt_mark) -1*mask_ind(1:numel(negt_mark));
    
    %// Setup output array with the cumsumed version of marked array
    out = cumsum(marked_out);
    
    %// Mask for final output to decide between large and small runlengths
    thresh_mask = out~=0;
    
    %// Fill output array with cumsum and then rep-bsxfun based approaches
    out(thresh_mask) = values(out(thresh_mask));
    
    values = values(~mask);
    runLengths = runLengths(~mask);
    
    maxlen = max(runLengths);
    all_values = repmat(values,maxlen,1);
    out(~thresh_mask) = all_values(bsxfun(@le,(1:maxlen)',runLengths)); %//'
    
    if yes_transpose_output
        out = out.';  %//'
    end
    
    return;
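The marker step of the hybrid is the least obvious part, so here is a trace for a single long run. This is a sketch: the input [2 5 1] and the threshold of 4 are chosen here to keep the vectors short (the function above uses 200). All other names are taken from the function.

```matlab
% Marker/cumsum step of the hybrid, traced for one long run
runLengths = [2 5 1];
thresh = 4;                                  % illustrative threshold, not the 200 used above

crunLengths = cumsum(runLengths);            % [2 7 8]
mask = runLengths >= thresh;                 % [0 1 0]
starts = [1 crunLengths(1:end-1)+1];         % [1 3 8]
mask_ind = find(mask);                       % 2

post_mark = starts(mask);                    % 3: first position of the long run
negt_mark = crunLengths(mask)+1;             % 8: first position after the long run

marked_out = zeros(1, crunLengths(end));
marked_out(post_mark) = mask_ind;                             % +2 at position 3
marked_out(negt_mark) = marked_out(negt_mark) - mask_ind;     % -2 at position 8

% The cumulative sum holds the run index over the long run and 0 elsewhere
out = cumsum(marked_out);                    % [0 0 2 2 2 2 2 0]
```

Positions where out is nonzero are filled directly from values. The remaining positions belong to the short runs and are handled by the replicate-and-mask code.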
    

Starting with R2015a, the function repelem is the best choice for this:

    function V = runLengthDecode(runLengths, values)
    if nargin<2
        values = 1:numel(runLengths);
    end
    V = repelem(values, runLengths);
    end
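As a quick illustration of the repelem call, here is the third test case from the question with both inputs given as rows. This is a minimal sketch and assumes R2015a or later.

```matlab
% repelem repeats values(k) exactly runLengths(k) times; zero counts drop the element
runLengths = [0 1 0 2];
values = [10 20 30 40];
V = repelem(values, runLengths);   % [20 40 40]
```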
    

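Seems like you only need to dress up the accepted answer with varargins.

Hm. Neither of the other two questions really sheds light on which solution is fastest, and most of the answers don't handle zeros correctly. So I'm not entirely happy with closing this question.

@LuisMendo: Recently, for e…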
    function V = runLengthDecode(runLengths, values)
    %// Actual computation using column vectors
    V = cumsum(accumarray(cumsum([1; runLengths(:)]), 1));
    V = V(1:end-1);
    %// In case of second argument
    if nargin>1
        V = reshape(values(V),[],1);
    end
    %// If original was a row vector, transpose
    if size(runLengths,2)>1
        V = V.'; %'
    end
    end
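The cumsum/accumarray combination above can be followed step by step on a small input. This is a sketch: the input [2 0 3] and the names startIdx and counts are introduced here for illustration.

```matlab
% Trace of the cumsum + accumarray decoding for a small input
runLengths = [2 0 3];

% Start position of every run, plus one position past the end
startIdx = cumsum([1; runLengths(:)]);   % [1;3;3;6]

% Count how many runs start at each position; the zero-length run
% shares its start with the next run, so position 3 is counted twice
counts = accumarray(startIdx, 1);        % [1;0;2;0;0;1]

% Accumulating the counts yields the run index at every output position
V = cumsum(counts);                      % [1;1;3;3;3;4]
V = V(1:end-1);                          % drop the past-the-end entry: [1;1;3;3;3]
```

The double count at position 3 is what makes the index jump from 1 to 3, so zero-length runs are skipped without any special handling.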