Fast version of the ismember() function in MATLAB

My problem is finding an alternative to ismember() in MATLAB that does the same job faster.

The setup:

M [92786253*1]  (a: roughly 100M rows)
x [749*1]       (b: # of rows can vary from 100 to 10K)
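I want to find the rows of a whose values also occur in b, reported as row indices of a. This operation has to be repeated about 100K times, once for each version of b.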
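As a small illustration of the required output (toy values made up for this sketch, not taken from the question): `ismember(M,x)` returns a logical vector the same size as `M`, and `find` converts it into the row indices of `M` whose values occur in `x`.

```matlab
% Toy data, made up for illustration: M plays the role of a, x the role of b
M = [2; 3; 5; 7; 11; 13];
x = [3; 4; 13];

tf  = ismember(M, x);   % logical, same size as M: true where M(i) occurs in x
idx = find(tf);         % row indices of M whose values also occur in x (here 2 and 6)
```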

Standard approach:

 tic
 ind1 = ismember(M,x);
 toc

 Elapsed time is 0.515627 seconds.

Fast approach:

 tic
 n = 1;
 ind2 = find(any(all(bsxfun(@eq,reshape(x.',1,n,[]),M),2),3));
 toc

 Error using bsxfun
 Requested 92786253x1x749 (64.7GB) array exceeds maximum array size preference. 
 Creation of arrays greater than this limit may take a long time and cause MATLAB to become unresponsive.
 See array size limit or preference panel for more information.

 Error in demo_ismember_fast (line 23)
 ind2 = find(any(all(bsxfun(@eq,reshape(x.',1,n,[]),M),2),3))

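The second method is usually 15-20 times faster than the standard one, but I can't use it here because of the memory limit. Any suggestions for speeding up this operation? Thanks for sharing your expertise.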
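The size in the error message follows directly from the shapes involved. `bsxfun(@eq, ...)` compares every element of `M` with every element of `x`, and each comparison result is stored as a 1-byte logical. The arithmetic below reproduces the 64.7 GB figure. `ismember` does not build this pairwise array, which explains why the first call ran without a memory error.

```matlab
% The bsxfun call materialises one logical (1 byte) per pair (M(i), x(j))
nM = 92786253;            % numel(M) from the question
nx = 749;                 % numel(x) from the question
bytes = nM * nx;          % 1 byte per logical element
gib = bytes / 2^30;       % about 64.7, matching "64.7GB" in the error message
fprintf('%.1f GiB\n', gib);
```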

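If you can work with a sorted a, here are two alternatives. Before starting the 100M iterations, initialize the required input variables and the output variable ind once. In each iteration, modify ind, and at the end of the iteration reset all of its elements to false.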

interp1:

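s = sort(M);                     % M is already sorted per the comments, so s equals M
edge = [-Inf; s(2:end); Inf];    % column vector, since M is 92786253x1
v = [(1:numel(M)).'; numel(M)];  % value per edge: bin k <=> edge(k) <= x < edge(k+1)
ind = false(size(M));            % output, allocated once outside the loop
%for ... 100M iterations
    tic
    bin = interp1(edge,v,x,'previous');  % interp1 needs distinct values in M
    hit = s(bin) == x;                    % x(j) is in M exactly when s(bin(j)) equals it
    ind(bin(hit)) = true;                 % marks positions in s (= rows of M when M is sorted)
    toc
    %...
    ind(bin) = false;  % at the end of each loop reset ind to all false (only ind(bin) was touched)
%end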
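One pitfall is worth checking in this bin-based marking step. Suppose two values of `x` fall into the same bin, one that is in `M` and one that is not. An assignment such as `ind(bin) = s(bin) == x` then writes to the same element twice, and in practice the last write wins. The toy sketch below uses hand-picked `bin` values and no `interp1` call. It contrasts that pattern with writing `true` only for the queries that matched.

```matlab
% Hypothetical toy example: sorted data s, two queries that land in the same bin
s   = [5; 9; 12];          % sorted M
x   = [5; 7];              % 5 is in s, 7 is not
bin = [1; 1];              % both satisfy s(1) <= x < s(2), so both map to bin 1

% Pattern 1: assign the comparison result directly
ind1 = false(size(s));
ind1(bin) = s(bin) == x;   % writes true, then false, to ind1(1): the match is lost

% Pattern 2: write true only for the queries that matched
ind2 = false(size(s));
hit = s(bin) == x;
ind2(bin(hit)) = true;     % ind2(1) stays true
```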
histcounts:

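s = sort(M);
edge = [-Inf; s(2:end); Inf];    % column vector, since M is 92786253x1
ind = false(size(M));
%for ... 100M iterations
    tic
    [~,~,bin] = histcounts(x,edge);  % bin(j) = k  <=>  edge(k) <= x(j) < edge(k+1)
    hit = s(bin) == x;               % x(j) is in M exactly when s(bin(j)) equals it
    ind(bin(hit)) = true;
    toc
    %...
    ind(bin) = false;  % reset ind to all false before the next iteration
%end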
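Both alternatives rely on the same idea. Once `M` is sorted, each value of `x` can be located in about log2(numel(M)) steps (roughly 27 for 92786253 rows), instead of being compared against every row. The sketch below spells this out with an explicit binary search. It is not from the answers, uses toy data, and assumes `M` is sorted ascending. It marks every copy of a repeated value, as `ismember` does. Whether a plain MATLAB loop beats the vectorised `interp1`/`histcounts` calls would need to be measured.

```matlab
% Minimal sketch (not from the answers): binary search of each x(j) in sorted M
M = [1; 3; 3; 8; 10; 15];     % toy data, assumed sorted ascending
x = [3; 4; 15; 0];            % query values (one "version" of x)

ind = false(size(M));         % preallocated once, reused across iterations
for j = 1:numel(x)
    lo = 1;
    hi = numel(M);
    % narrow [lo, hi] to the first position with M(pos) >= x(j)
    while lo < hi
        mid = floor((lo + hi) / 2);
        if M(mid) < x(j)
            lo = mid + 1;
        else
            hi = mid;
        end
    end
    % mark every copy of x(j) starting at that position
    pos = lo;
    while pos <= numel(M) && M(pos) == x(j)
        ind(pos) = true;
        pos = pos + 1;
    end
end
rowIdx = find(ind);           % reset with ind(rowIdx) = false before the next x
```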

You may find the internal (built-in) ismembc function useful: it can be an order of magnitude faster than ismember.


Note that ismembc only works on sorted, non-sparse, non-NaN numeric data.
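The following is a hedged sketch of how `ismembc` might be called for this problem. It checks the preconditions listed above first. It sorts `x` on each call, on the assumption that the second (set) argument is the one that must be sorted. Because `ismembc` is undocumented, verify this against your MATLAB release. The `exist` check falls back to `ismember` where `ismembc` is not available.

```matlab
% Hypothetical toy data; in the question these would be M and one version of x
M = [2; 3; 5; 7; 11; 13];
x = [13; 3; 4];                        % not necessarily sorted

xs = sort(x);                          % set argument sorted (assumed requirement)
assert(isnumeric(M) && isnumeric(xs) && ~issparse(M) && ~issparse(xs), ...
    'ismembc needs full numeric inputs');
assert(~any(isnan(M)) && ~any(isnan(xs)), 'ismembc does not handle NaN');

if exist('ismembc') > 0                % internal, undocumented MATLAB function
    tf = ismembc(M, xs);
else                                   % fall back where ismembc is unavailable
    tf = ismember(M, xs);
end
rowIdx = find(tf);                     % row indices of M whose values occur in x
```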

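Buy 64 GB of RAM, I guess? This is a very big problem; you should expect it to be slow.
Then why did I not get any error in the first case? I also suspect there are tricks other than using ismember().
Are M and x sorted?
Yes, they are both sorted.
Thanks for the suggestion. My original question uses M and x; could you adapt the answer to those names? Also, since I mentioned 1 million iterations (or a single iteration), could you include your for loop?
Assume the for loop has 1 million iterations. I'll change the variables.
Thanks for clarifying. Could you also say what out stands for?
Updated the answer, fixed the errors and added some explanation.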