Matlab: quickly finding the smallest value greater than x in a sorted vector


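By fast I mean better than O(N), which is the best find() can do. I know there are ismembc and ismembc2, but I don't think either of them is what I want. I read the documentation, and they appear to search for a member equal to x, whereas I want the index of the first value greater than x.

Now, if either of these functions can do that, could someone give an example? I can't figure it out.

Ideal behaviour:

first_greater_than([0, 3, 3, 4, 7], 1)
returns 2, the index of the first value greater than 1, although the real input array would obviously be much larger.

Of course, a binary search is not hard to implement, but if MATLAB already has one, I would rather use theirs.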

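Since the input is already sorted, a custom binary search should work (you may need to make some updates for edge cases, i.e. when the requested value is smaller than every element of the array):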

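function [result, res2] = binarySearchExample(val)
%// Generate example data and sort it
N = 100000000;
a = rand(N, 1);
a = sort(a);
%// Run the algorithm
tic % start timing of the binary search algorithm
div = 1;
idx = floor(N/div);
while (1)
    div = div * 2;
    %// If a(idx) is not greater than val, check whether the next element is
    if a(idx) <= val,
        if a(idx+1) > val,
            result = a(idx+1);
            break
        else %// get bigger
            idx = idx + max([floor(N/div), 1]);
        end
    end
    if a(idx) > val, % get smaller
        %// step by at least 1 so the search cannot stall once floor(N/div) is 0
        idx = idx - max([floor(N/div), 1]);
    end
end % end the while loop
toc % end timing of the binary search algorithm
%% ------------------------
%% Comparison with MATLAB's find
tic % start timing of MATLAB's find
j = find(a > val, 1);
res2 = a(j);
toc % end timing of MATLAB's find

%// Benchmark
>> [res1, res2] = binarySearchExample(0.556)
Elapsed time is 0.000093 seconds.
Elapsed time is 0.327183 seconds.
res1 =
    0.5560
res2 =
    0.5560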

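Here is my implementation. It is not quite the answer I was looking for, but for now I have to assume that what I am after is not implemented in MATLAB.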

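A note on indices: all MATLAB indices are wrong, in that they start at 1 rather than 0. I index from 0 regardless. So throughout, you will see indices that look like this:
array(1+i)
which accesses element i, where i is in [0, N). Also, all MATLAB ranges are wrong: their convention is [a, b] rather than [a, b). So you will see ranges like this throughout: 0:N-1 is the range of numbers (often the indices of an N-element array) from 0 up to, but not including, N. When an array is indexed with a range, both corrections must be made at once: 1 is added to the top and bottom bounds, and 1 is subtracted from the top. The result is that array(1+a:b) accesses the elements in [a, b), where a and b are in [0, N] and b > a. I really should be using Python and scipy instead, but it is too late for that now. Next project.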
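The indexing convention described above can be illustrated with a small sketch. The vector v and the values of i, a and b are made-up example data, not taken from the answer:

```matlab
% Assumed example data: a 5-element vector, so N = 5.
v = [10 20 30 40 50];
N = numel(v);

% Element i in the 0-based convention lives at MATLAB index 1+i.
i = 2;
elem = v(1+i);                % 0-based element 2, i.e. 30

% The half-open 0-based range [a, b) is written 1+a:b in MATLAB,
% because + binds tighter than the colon operator: (1+a):b.
a = 1;
b = 4;
slice = v(1+a:b);             % 0-based elements 1, 2, 3, i.e. [20 30 40]

% 0:N-1 enumerates every 0-based index exactly once.
all_elems = v(1 + (0:N-1));   % identical to v
```

Note that b itself is never accessed, which is why the top bound needs no net correction.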

binary_search.m: in my opinion it is much tidier than @ljk07's implementation, but they still get the accept, of course. Thanks, @ljk07.

function i = binary_search(v, x)
%binary_search finds the first element in v greater than x
% v is a vector and x is a double. Returns the index of the desired element
% as an int64 or -1 if it doesn't exist.

% We'll call the first element of v greater than x v_f.

% Is v_f the zeroth element? This is technically covered by the algorithm,
% but is such a common case that it should be addressed immediately. It
% would otherwise take the same amount of time as the rest of them. This
% will add a check to each of the others, though, so it's a toss-up to an
% extent.
if v(1+0) > x
    i = 0;
    return;
end

% MATLAB foolishly returns the number of elements as a floating point
% constant. Thank you very much, MATLAB.
b = int64(numel(v));

% If v_f doesn't exist, return -1. This is also needed to ensure the
% algorithm later on terminates, which makes sense.
if v(1+b-1) <= x
    i = -1;
    return;
end

a = int64(0);

% There is now guaranteed to be more than one element, since if there
% wasn't, one of the above would have matched. So we split the [a, b) range
% at the top of the loop.

% The number of elements in the interval. Calculated once per loop. It is
% recalculated at the bottom of the loop, so it needs to be calculated just
% once before the loop can begin.
n = b;
while true
    % MATLAB's / operator foolishly rounds to nearest instead of flooring
    % when both inputs are integers. Thank you very much, MATLAB.
    p = a + idivide(n, int64(2));

    % Is v_f in [a, p) or [p, b)?
    if v(1+p-1) > x
        % v_f is in [a, p).
        b = p;
    else
        % v_f is in [p, b).
        a = p;
    end

    n = b - a;
    if n == 1
        i = a;
        return;
    end
end
end
Output of binary_search_test.m (on my computer; the script and its full output are listed at the end of this page):


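There is definitely a speedup. On my computer the speedup only appears at around a million elements. So unless the binary search is implemented in C, or you have a vector of about a million elements or more, find is faster, even though it uses a dumb algorithm. I expected the threshold to be lower than that. My guess is that this is because find is mostly implemented internally in C. Not fair :( Anyway, for my particular application the vectors are only about 1000 elements long, so find really is faster for me after all. At least until the day I implement binary search in C with a mex file or switch to scipy, whichever comes first. I am getting a bit tired of some of MATLAB's inconvenient quirks. You can tell by reading the comments in my code.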
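Single tic/toc measurements of calls this short are noisy. A minimal sketch of a more repeatable measurement, assuming a MATLAB release that provides timeit, is shown below. The names time_search and searchFcn are hypothetical and not part of the original answer:

```matlab
function t = time_search(searchFcn, n)
% TIME_SEARCH  Typical run time, in seconds, of searchFcn(v, x).
%   searchFcn is any function handle taking a sorted vector v and a
%   scalar x (hypothetical interface chosen for this sketch).
%   n is the length of the random sorted test vector.
v = sort(rand(n, 1));   % same kind of test data as the benchmark above
x = 0.5;
% timeit calls the handle repeatedly and reports a robust estimate,
% which avoids the first-call overhead seen with a single tic/toc.
t = timeit(@() searchFcn(v, x));
end
```

For example, time_search(@(v, x) find(v > x, 1, 'first'), 1e6) times find, and time_search(@binary_search, 1e6) times the binary search from this answer.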

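Why did you put the comparison with find inside the while loop? And why is tic toc inside the loop?

Yes, I did consider that an option, and it looks like it is what I will end up doing since I am running out of time, but it is not what I was after. I was hoping there would be a MATLAB function for it. For now I will have to leave your (good) answer unaccepted, because it is not really what I wanted. I will edit the question in response.

Where is the tic for the first measurement?

@eit The tic tocs here are a bit suspicious... I would take these measurements with a grain of salt.

The tic toc was misplaced, but binary search really is faster than the brute-force O(n) find (even when implemented with a while loop). I have just tested it myself.

You did use find? Do you know the 'first' option?
find([0,3,3,4,7]>1,1,'first')
does exactly what you want.

@thewaywewalk I think your answer only returns the index; the index then has to be used to get the value. The approach itself is elegant enough.

That is how I do it, but you can use this variant to get the value from the initial vector:
A=[0,3,3,4,7]
A(find(A>1,1,'first'))   % returns 3

@thewaywewalk What did I just say about find() in the OP?

@Ray Ouch.

I meant that it returns the index. The index is more useful. Editing the OP now.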
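For readers who want the same 1-based index that find(v > x, 1, 'first') returns, but in O(log N), one possible sketch follows. It reuses the name first_greater_than from the question as a hypothetical helper and assumes v is sorted in ascending order. It is not a built-in MATLAB function:

```matlab
function idx = first_greater_than(v, x)
% FIRST_GREATER_THAN  1-based index of the first element of v that is > x.
%   Assumes v is sorted ascending. Returns [] when no element is
%   greater than x, mirroring find(v > x, 1, 'first').
lo = 1;                 % lowest index that could still be the answer
hi = numel(v) + 1;      % one past the highest candidate, meaning "none"
while lo < hi
    mid = lo + floor((hi - lo) / 2);   % always in lo..hi-1, so in bounds
    if v(mid) > x
        hi = mid;       % v(mid) qualifies; the answer is at mid or before
    else
        lo = mid + 1;   % v(mid) <= x; the answer is strictly after mid
    end
end
if lo > numel(v)
    idx = [];           % every element is <= x (or v is empty)
else
    idx = lo;
end
end
```

With the question's example, first_greater_than([0, 3, 3, 4, 7], 1) returns 2.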
binary_search_test.m:

% Some simple tests. These had better pass...
assert(binary_search([0], 0) == -1);
assert(binary_search([0], -1) == 0);

assert(binary_search([0 1], 0.5) == 1);
assert(binary_search([0 1 1], 0.5) == 1);
assert(binary_search([0 1 2], 0.5) == 1);
assert(binary_search([0 1 2], 1.5) == 2);

% Compare the algorithm to internal find.
for n = [1 1:8]
    n
    v = sort(rand(10^n, 1));
    x = 0.5;
    %%
    tic;
    ifind = find(v > x, 1,'first') - 1;
    toc;
    % repeat. The second time is faster usually. Some kind of JIT
    % optimisation...
    tic;
    ifind = find(v > x, 1,'first') - 1;
    toc;
    tic;
    ibs = binary_search(v, x);
    toc;
    tic;
    ibs = binary_search(v, x);
    toc;
    assert(ifind == ibs);
end

Output of binary_search_test.m:

n =

     1

Elapsed time is 0.000054 seconds.
Elapsed time is 0.000021 seconds.
Elapsed time is 0.001273 seconds.
Elapsed time is 0.001135 seconds.

n =

     2

Elapsed time is 0.000050 seconds.
Elapsed time is 0.000018 seconds.
Elapsed time is 0.001571 seconds.
Elapsed time is 0.001494 seconds.

n =

     3

Elapsed time is 0.000034 seconds.
Elapsed time is 0.000025 seconds.
Elapsed time is 0.002344 seconds.
Elapsed time is 0.002193 seconds.

n =

     4

Elapsed time is 0.000057 seconds.
Elapsed time is 0.000044 seconds.
Elapsed time is 0.003131 seconds.
Elapsed time is 0.003031 seconds.

n =

     5

Elapsed time is 0.000473 seconds.
Elapsed time is 0.000333 seconds.
Elapsed time is 0.003620 seconds.
Elapsed time is 0.003161 seconds.

n =

     6

Elapsed time is 0.003984 seconds.
Elapsed time is 0.003635 seconds.
Elapsed time is 0.004209 seconds.
Elapsed time is 0.003825 seconds.

n =

     7

Elapsed time is 0.034811 seconds.
Elapsed time is 0.039106 seconds.
Elapsed time is 0.005089 seconds.
Elapsed time is 0.004867 seconds.

n =

     8

Elapsed time is 0.322853 seconds.
Elapsed time is 0.323777 seconds.
Elapsed time is 0.005969 seconds.
Elapsed time is 0.005487 seconds.