Matlab: find clustered NaNs, but leave lone NaNs


matlab


I have an incomplete data set

N = [NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN]'

I want to identify clusters of NaNs, i.e. places where the number of consecutive NaNs exceeds 2. How can I do this?

You can do this:

aux = diff([0; isnan(N); 0]);
clusters = [find(aux == 1) find(aux == -1) - 1];

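Then clusters will be an M-by-2 matrix, where M is the number of NaN clusters (all of them), and each row gives you the start and end index of one cluster.

In this example, this would be:

clusters =
     1     1
     5     5
     8     9
    15    19

This means there are 4 NaN clusters: cluster 1 ranges from index 1 to index 1, cluster 2 from 5 to 5, cluster 3 from 8 to 9, and cluster 4 from 15 to 19.

If you only want the clusters with at least K NaNs, you can do this (e.g. with K = 2):

clusters(clusters(:,2) - clusters(:,1) + 1 >= K, :)

This will give you:

ans =

     8     9
    15    19

That is, clusters 8-9 and 15-19 have 2 or more NaNs.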
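Putting the pieces of this answer together, a minimal end-to-end script might look like the following. It assumes the question's column vector N and K = 2; the names runLengths and bigClusters are only illustrative, and the N(:) is an added assumption so that a row-vector N also works.

```matlab
% Example data from the question
N = [NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN]';
K = 2;  % minimum number of consecutive NaNs that counts as a cluster

% N(:) forces a column, so [0; ...; 0] also works if N is a row vector
aux = diff([0; isnan(N(:)); 0]);

% One row per NaN run: [start index, end index]
clusters = [find(aux == 1), find(aux == -1) - 1];

% Length of each run, then keep only the runs with at least K NaNs
runLengths = clusters(:,2) - clusters(:,1) + 1;
bigClusters = clusters(runLengths >= K, :);

disp(bigClusters)
```

With K = 3 instead, only the 15-19 row would survive, since the run lengths here are 1, 1, 2 and 5.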

Explanation:

Finding the clusters

isnan(N) gives you a logical vector with the NaNs as ones:

N --------> NaN 1  2  3 NaN 5  6 NaN NaN 7  8 10 12 20 NaN NaN NaN NaN NaN
isnan(N) ->  1  0  0  0  1  0  0  1   1  0  0  0  0  0  1   1   1   1   1

We want to know where each sequence starts, so we use diff, which computes each value minus the previous one and gives the following:

aux = diff(isnan(N));
N ----> NaN 1  2  3 NaN 5  6 NaN NaN 7  8 10 12 20 NaN NaN NaN NaN NaN
aux --> -1  0  0  1 -1  0  1  0  -1  0  0  0  0  1   0   0   0   0

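Here, 1 marks the start of a group and -1 marks the end of a group. But this misses the first group start and the last group end: the first 1 is missing (the first NaN is the very first element, so it has no previous value), and the last -1 is missing too (there is nothing after the last NaN). The common fix is to add a zero before and after the array, which gives us the following: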

aux = diff([0; isnan(N); 0]);
N ----> NaN 1  2  3 NaN 5  6 NaN NaN 7  8 10 12 20 NaN NaN NaN NaN NaN
aux -->  1 -1  0  0  1 -1  0  1  0  -1  0  0  0  0  1   0   0   0   0  -1

Note two things:

If the difference at index i is 1, N(i) is the start of a NaN block.
If the difference at index i is -1, N(i-1) is the end of a NaN block.
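Both rules can be checked directly on the example data. This sketch (starts, stops, prevIdx and nextIdx are illustrative names, not part of the answer) asserts that every reported start and end is a NaN and that its outside neighbour, when one exists, is a number:

```matlab
N = [NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN]';
aux = diff([0; isnan(N); 0]);

starts = find(aux == 1);       % aux(i) == 1  ->  N(i) opens a NaN block
stops  = find(aux == -1) - 1;  % aux(i) == -1 ->  N(i-1) closes a NaN block

% Every start and every stop points at a NaN
assert(all(isnan(N(starts))) && all(isnan(N(stops))));

% The element just before a start, if there is one, is a number
prevIdx = starts(starts > 1) - 1;
assert(~any(isnan(N(prevIdx))));

% The element just after a stop, if there is one, is a number
nextIdx = stops(stops < numel(N)) + 1;
assert(~any(isnan(N(nextIdx))));

disp([starts stops])
```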

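To get the starts and ends, we use find to get the indices where aux == 1 and where aux == -1 (subtracting 1 from the latter, per the second rule above). So we call find twice and concatenate both results side by side using [ and ]: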

aux = diff([0; isnan(N); 0]);
clusters = [find(aux == 1) find(aux == -1) - 1];

Filtering clusters with K or more elements

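The last step is to find the clusters with K or more elements. To do that, we first take the clusters matrix, subtract the first column from the second one, and add 1, like this: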

clusters(:,2) - clusters(:,1) + 1
ans = 
     1
     1
     2
     5

This means clusters 1 and 2 have 1 NaN each, cluster 3 has 2 NaNs, and cluster 4 has 5 NaNs. If we ask which values are greater than or equal to K, we get:

clusters(:,2) - clusters(:,1) + 1 >= K
ans =
     0
     0
     1
     1

This is a logical array. We can use it to select the 1 (true) rows of the clusters matrix, like this:

clusters(clusters(:,2) - clusters(:,1) + 1 >= K, :)
ans =

     8     9
    15    19

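This is like asking: give us only the rows of clusters that correspond to the 1s in this logical vector, and give us all the columns (denoted by :).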
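If what you need is a logical mask of the NaNs that belong to a big cluster (so the lone NaNs can be left alone), the filtered start/end rows can be expanded back into positions. A minimal sketch assuming K = 2; the loop and the names inCluster and lone are illustrative, not part of the answer above:

```matlab
N = [NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN]';
K = 2;

aux = diff([0; isnan(N); 0]);
clusters = [find(aux == 1), find(aux == -1) - 1];
bigClusters = clusters(clusters(:,2) - clusters(:,1) + 1 >= K, :);

% Mark every index covered by a kept [start, end] row
inCluster = false(size(N));
for r = 1:size(bigClusters, 1)
    inCluster(bigClusters(r,1):bigClusters(r,2)) = true;
end

% NaNs that are not part of a big cluster are the lone ones
lone = isnan(N) & ~inCluster;

disp(find(inCluster)')   % positions of clustered NaNs
disp(find(lone)')        % positions of lone NaNs
```

For K = 2 the clustered positions are 8 9 15 16 17 18 19, the same list the filter-based answer below reports for num=2.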

Here is a modular solution:

% the number of NaN you consider as a cluster
num = 3;

% moving average filter
Z = filter(ones(num,1),1,isnan(N));

x = arrayfun(@(x) find(Z == num) - num + x, 1:num,'uni',0)
y = unique(cell2mat(x))

(Update: faster version below)

This gives, for num=1:

y =     1     5     8     9    15    16    17    18    19

for num=2:

y =     8     9    15    16    17    18    19

for num=3, num=4 and num=5:

y =    15    16    17    18    19

and finally for num=6 and above:

y =   Empty matrix: 1-by-0

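Explanation

isnan(N) returns a logical array with ones at the positions of the NaNs.

Z = filter(ones(num,1),1,isnan(N)) is an implementation of a moving average filter (strictly a moving sum, since the coefficients are ones rather than 1/num) with the filter window ones(num,1) = [1;1;1] (for num=3). So when there are 3 NaNs in a row, the size-3 window sliding over the array reaches exactly the value num=3. It looks like this: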

%//  N   isnan(N)     Z

   NaN          1     1
     1          0     1
     2          0     1
     3          0     0
   NaN          1     1
     5          0     1
     6          0     1
   NaN          1     1
   NaN          1     2
     7          0     2
     8          0     1
    10          0     0
    12          0     0
    20          0     0
   NaN          1     1
   NaN          1     2
   NaN          1     3
   NaN          1     3
   NaN          1     3

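Now it is easy to find all elements equal to num (3): find(Z==num). But you also need the positions in front of them: find(Z==num)-num+2 and find(Z==num)-num+1. Instead of a loop, arrayfun is used, which does essentially the same thing. As a result you get a matrix with many indices, many of them repeated, but you only need the unique ones. I hope everything is clear now.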
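To see where the repeated indices come from, here is the num = 3 case spelled out step by step with the arrayfun form. The names windowEnds and candidates are illustrative, and wrapping isnan(N) in double() is an added precaution in case filter rejects logical input in your MATLAB or Octave version.

```matlab
N = [NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN]';
num = 3;

% Moving sum over a window of num samples (all-ones FIR coefficients)
Z = filter(ones(num,1), 1, double(isnan(N)));

% Z reaches num only where the window ends on num NaNs in a row
windowEnds = find(Z == num);              % [17; 18; 19]

% Column x holds windowEnds - num + x, so each row lists one full window
x = arrayfun(@(k) windowEnds - num + k, 1:num, 'UniformOutput', false);
candidates = cell2mat(x);
%   15 16 17
%   16 17 18
%   17 18 19

% Overlapping windows repeat indices, so keep each one once
y = unique(candidates)';
disp(y)
```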
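Actually, it is much faster to pull find out of the arrayfun; the arrayfun can even be replaced by bsxfun, which also lets you drop cell2mat. This leads to the following form:
Faster: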
Z = find( filter(ones(num,1),1,isnan(N)) == num ) - num;
y = unique( bsxfun(@plus, Z,1:num) );

Or, even faster, as a one-liner:
y = unique(bsxfun(@plus,find(filter(ones(num,1),1,isnan(N))==num)-num,1:num));
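As a quick sanity check of the one-liner, this loop re-runs it for num = 1 to 6 on the question's column vector N and compares against the results listed earlier. The cell array expected simply transcribes those listed outputs, y is reshaped to a row before comparing, and the only change to the one-liner is an added double() around isnan(N) as a precaution for filter.

```matlab
N = [NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN]';

% Results listed above for num = 1..6 (num = 6 gives an empty result)
expected = {[1 5 8 9 15:19], [8 9 15:19], 15:19, 15:19, 15:19, []};

ok = true(1, 6);
for num = 1:6
    y = unique(bsxfun(@plus,find(filter(ones(num,1),1,double(isnan(N)))==num)-num,1:num));
    if isempty(expected{num})
        ok(num) = isempty(y);
    else
        ok(num) = isequal(y(:)', expected{num});
    end
end
disp(ok)
```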

This uses diff, but seems simpler:
ind = diff([0; isnan(N(:))]);
result = find(ind(1:end-1)==1 & ind(2:end)==0);

In your example, this gives [8 15].

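How it works: ind takes the following values:

1 where a run of (one or more) NaN values starts
0 where there is no change from the previous value (NaN to NaN, or number to number)
-1 where a run of (one or more) numeric values starts

The second line selects the positions where a run of NaNs starts and the next position is also NaN. So it gives the start of every run with more than one NaN, as desired.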
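A run has more than one NaN exactly when its length is at least 2, so this result should coincide with the start column of the K = 2 filtering from the first answer. A small sketch that computes both on the question's data (startsA and startsB are illustrative names):

```matlab
N = [NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN]';

% This answer: a NaN run starts at i and the next element is also NaN
ind = diff([0; isnan(N(:))]);
startsA = find(ind(1:end-1)==1 & ind(2:end)==0);

% First answer with K = 2: start indices of runs of length >= 2
aux = diff([0; isnan(N); 0]);
clusters = [find(aux == 1), find(aux == -1) - 1];
startsB = clusters(clusters(:,2) - clusters(:,1) + 1 >= 2, 1);

disp([startsA startsB])
```

Note that this approach only returns the starts; the end of each run still has to be found separately.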
STRFIND approach
I. If you like one-liners:
%%// Given input N
N = [NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN]

out = [strfind(num2str(isnan([ 0 N 0]),'%1d'),'011');strfind(num2str(isnan([ 0 N 0]),'%1d'),'110')]'

Output
N =
       NaN  1  2  3  NaN  5  6  NaN NaN  7  8  10  12  20 NaN NaN NaN NaN NaN

out =
     8     9
    15    19

II. Detailed explanation:

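Basically, you are trying to do a sliding-window check. There is no direct way to do that on a double array, but after converting it to a string you can use strfind. That is the trick used here.
I suggest following the comments in the code and the outputs it prints to understand it. Note that in this particular case, a cluster means a group of two or more consecutive NaNs.
Code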
%%// Given input N
N = [NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN]

%%// Set the locations where NaNs are present and then 
%%// append at the start and end with zeros
N2 = isnan([ 0 N 0])

%%// Find the start indices of  all NaN clusters
start_ind = strfind(num2str(N2,'%1d'),'011')

%%// Find the stop indices of  all NaN clusters
stop_ind = [strfind(num2str(N2,'%1d'),'110')]

%%// Put start and stop indices into a Mx2 matrix
out = [start_ind' stop_ind']

Output
N =
       NaN  1  2  3  NaN  5  6  NaN NaN  7  8  10  12  20 NaN NaN NaN NaN NaN

N2 =
     0  1   0  0  0   1   0  0   1   1   0  0   0   0   0  1   1   1   1   1   0

start_ind =
     8    15

stop_ind =
     9    19

out =
     8     9
    15    19
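The patterns '011' and '110' hard-code a minimum cluster size of 2. One way to make this modular (an extension sketched here, not part of the original answer) is to build the patterns from a parameter K: a '0' followed by K ones matches exactly at the first NaN's index in N, and K ones followed by a '0' matches K-2 positions before the last NaN's index in N, because of the leading zero in the padded string. The sketch assumes N is a row vector as in this answer; double() is an added precaution for num2str.

```matlab
N = [NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN];  % row vector
K = 3;  % minimum cluster size (assumed parameter)

% '0'/'1' string of the zero-padded NaN mask, one character per element
s = num2str(double(isnan([0 N 0])), '%1d');

% '0' followed by K ones: the match position equals the first NaN index in N
start_ind = strfind(s, ['0' repmat('1', 1, K)]);

% K ones followed by '0': the last NaN index in N is the match position + K - 2
stop_ind = strfind(s, [repmat('1', 1, K) '0']) + K - 2;

out = [start_ind' stop_ind'];
disp(out)
```

With K = 2 the patterns reduce to '011' and '110' with an offset of 0, which is the original answer; K = 1 also works and returns every NaN run, lone ones included.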

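Exceeding 2, or reaching 2? Should rows 8 and 9 be detected?
I'd like this to be modular. So, for now, more than 2.
What do you want the output to be? Like you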