How to extract subarrays from an array based on a threshold in Python?


python numpy


I have a numpy array of the following form:

a = numpy.array([0,2,2,3,4,2,5,5,6,2,5,6,4,4,2,3,1,7,7,2,3,3,4,1,8,9,8,8])
threshold = 4
threshold_seq_len = 5
subarray_seq_len = 4

The output I want is:

b = [array([5,5,6,2,5,6]), array([8,9,8,8])]
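To make the conditions concrete, here is a small run-length sketch that lists every run of values at or below the threshold in `a` as (start index, length) pairs. The helper name `runs_at_or_below` is hypothetical. Under condition 2, only runs of at least `threshold_seq_len` values act as separators, so the single 2 at index 9 does not split anything.

```python
import numpy as np

def runs_at_or_below(a, threshold):
    """Return (start, length) for every maximal run of values <= threshold."""
    # Pad with False so runs touching either end still produce both transitions
    low = np.r_[False, np.asarray(a) <= threshold, False]
    # False->True marks a run start, True->False marks its (exclusive) end
    edges = np.flatnonzero(low[1:] != low[:-1])
    starts, ends = edges[::2], edges[1::2]
    return list(zip(starts.tolist(), (ends - starts).tolist()))

a = np.array([0,2,2,3,4,2,5,5,6,2,5,6,4,4,2,3,1,7,7,2,3,3,4,1,8,9,8,8])
runs = runs_at_or_below(a, threshold=4)
print(runs)  # [(0, 6), (9, 1), (12, 5), (19, 5)]
```

With `threshold_seq_len = 5`, the runs starting at 0, 12 and 19 split the array. The run at 9 has length 1, so it stays inside the first subarray.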

I want to extract the subarrays according to the following conditions:

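1) The subarrays should be split on runs of values less than or equal to the threshold. In the case above, the first subarray ([5,5,6,2,5,6]) occurs after the run [0,2,2,3,4,2], all of whose values are less than or equal to the threshold of 4.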

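2) A threshold run must be at least threshold_seq_len long to split the array; otherwise it is simply part of the subarray. Note that the value 2 is present in the first subarray because it occurs on its own (length = 1).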

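3) The subarrays themselves must be at least subarray_seq_len long. For example, the values at indices 17 and 18 are both 7, but they are not considered because their length (2) is less than subarray_seq_len (4).

Here's one approach: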
import numpy as np
from scipy.ndimage import binary_closing

def filter_ar(a, threshold, threshold_seq_len, subarray_seq_len):
    a = np.asarray(a)
    pad = threshold_seq_len

    # Mask wrt threshold, padded with False on both ends so the closing
    # below is not distorted by the array borders
    m0 = np.r_[np.zeros(pad, dtype=bool), a > threshold, np.zeros(pad, dtype=bool)]

    # Close "holes": runs of <= threshold values shorter than threshold_seq_len
    # are filled in and become part of the surrounding subarray
    k = np.ones(threshold_seq_len, dtype=bool)
    m = binary_closing(m0, structure=k)

    # Start (inclusive) and stop (exclusive) indices of each True run, in a's coordinates
    idx = np.flatnonzero(m[:-1] != m[1:]) + 1 - pad
    s0, s1 = idx[::2], idx[1::2]

    # Keep only subarrays that are at least subarray_seq_len long
    mask1 = (s1 - s0) >= subarray_seq_len

    # Get valid start, stop indices and then split the input array
    starts, ends = s0[mask1], s1[mask1]
    return [a[i:j] for i, j in zip(starts, ends)]

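This does work on your example, but I could not avoid the list comprehension. Also, I haven't checked whether it is slower than simply iterating over the list (it probably is).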
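For reference when timing, here is a minimal single-pass loop that applies the three conditions directly. It is a sketch and is not taken from either answer. The name `extract_subarrays_loop` is hypothetical. It assumes each subarray starts and ends on a value above the threshold, as in the expected output.

```python
import numpy as np

def extract_subarrays_loop(a, threshold, threshold_seq_len, subarray_seq_len):
    """Single pass: a run of <= threshold values closes the current candidate
    only if it is at least threshold_seq_len long; shorter runs are absorbed."""
    out = []
    start = None       # index where the current candidate subarray begins
    last_above = None  # index of the most recent value > threshold
    for i, v in enumerate(a):
        if v > threshold:
            if start is None:
                start = i
            elif i - last_above - 1 >= threshold_seq_len:
                # The gap was long enough: emit the candidate if it is long enough
                if last_above - start + 1 >= subarray_seq_len:
                    out.append(np.asarray(a[start:last_above + 1]))
                start = i
            last_above = i
    # Emit the final candidate, if any
    if start is not None and last_above - start + 1 >= subarray_seq_len:
        out.append(np.asarray(a[start:last_above + 1]))
    return out

a = np.array([0,2,2,3,4,2,5,5,6,2,5,6,4,4,2,3,1,7,7,2,3,3,4,1,8,9,8,8])
result = extract_subarrays_loop(a, threshold=4, threshold_seq_len=5, subarray_seq_len=4)
print([r.tolist() for r in result])  # [[5, 5, 6, 2, 5, 6], [8, 9, 8, 8]]
```

The loop is O(n), but per-element Python overhead usually makes it slower than the NumPy versions on large arrays. Measuring with `timeit` on representative data is the reliable way to compare.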
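Thanks @Divakar. Is there a way to do this without a post-processing step? Timing is key here.
@Divakar This will filter out the 2 from condition 2.
Is this a valid subarray? [5,2,5,2,6,2,5,2,6]
@Ardweaden It's an edge case and indeed undesirable, but by the logic I gave it is a fair candidate.
For threshold_seq_len=5, shouldn't there be only one subarray?
threshold_seq_len is the length of the threshold runs; subarray_seq_len is for the candidates themselves. I edited the example to make the difference between the two clearer.
Sorry, I'm being dense, but I still don't get it. Does it have to do with the number of values below the threshold between two consecutive values above it? At this point I'm only asking in order to understand; if Divakar's code works, it doesn't matter, of course.
There are no stupid questions. Look at the first subarray in the example ([5,5,6,2,5,6]). It occurs after 6 consecutive values less than or equal to the threshold (each ≤ 4), so the length of that run (6) is greater than threshold_seq_len (5). On the other hand, the values at indices 17 and 18 are both > threshold and also occur after threshold_seq_len consecutive values, but they are not considered because the length of the subarray itself (2) is less than subarray_seq_len (4).

Another approach:

import numpy as np

b = np.flatnonzero(a > threshold)  # indices of the values above the threshold
# Split where the count of <= threshold values between neighbours (diff - 1) is >= threshold_seq_len
d = np.flatnonzero(np.diff(b) - 1 >= threshold_seq_len)
e = np.split(b, d + 1) if b.size else []  # no candidates if nothing is above the threshold
# Keep groups whose span (first to last above-threshold index) is long enough
subarrays = [a[i[0]:i[-1]+1] for i in e if (i[-1]-i[0] + 1) >= subarray_seq_len]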
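In the index-based approach, the difference between consecutive above-threshold indices is one more than the number of values at or below the threshold between them. A run of exactly `threshold_seq_len` such values therefore gives a difference of `threshold_seq_len + 1`. Comparing `np.diff(b)` itself against `threshold_seq_len` would also split on runs that are one value too short. The check below illustrates this on the example and on a hypothetical edge case (`edge` is made-up data):

```python
import numpy as np

a = np.array([0,2,2,3,4,2,5,5,6,2,5,6,4,4,2,3,1,7,7,2,3,3,4,1,8,9,8,8])
threshold, threshold_seq_len = 4, 5

b = np.flatnonzero(a > threshold)  # indices of values above the threshold
gaps = np.diff(b) - 1              # number of <= threshold values between them
print(b.tolist())     # [6, 7, 8, 10, 11, 17, 18, 24, 25, 26, 27]
print(gaps.tolist())  # [0, 0, 1, 0, 5, 0, 5, 0, 0, 0]

# Hypothetical edge case: only 4 values <= threshold between the 5 and the 6
edge = np.array([5, 1, 1, 1, 1, 6, 7, 8])
eb = np.flatnonzero(edge > threshold)
naive_split = bool((np.diff(eb) >= threshold_seq_len).any())          # compares diff itself
corrected_split = bool((np.diff(eb) - 1 >= threshold_seq_len).any())  # compares the gap length
print(naive_split, corrected_split)  # True False
```

Only the gaps of length 5 (between indices 11 and 17, and between 18 and 24) reach `threshold_seq_len`. For the edge case, only the comparison on `diff - 1` correctly keeps the array in one piece.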