Create an intervaled ramp array based on a threshold - Python/NumPy


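I would like to measure the length of sub-arrays that fulfill some condition (like a stop clock), but as soon as the condition is no longer fulfilled, the value should reset to zero. The resulting array should therefore tell me, at each position, how many consecutive values have fulfilled the condition (e.g. value > 1). For example, the input [0, 0, 2, 2, 2, 2, 0, 3, 3, 0] (the array passed to StopClock below)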

should result in the following array:

[0, 0, 1, 2, 3, 4, 0, 1, 2, 0]

It is easy to define a function in Python that returns the corresponding NumPy array:

import numpy as np

def StopClock(signal, threshold=1):
    clock = []
    current_time = 0
    for item in signal:
        if item > threshold:
            current_time += 1
        else:
            current_time = 0
        clock.append(current_time)
    return np.array(clock)

StopClock([0, 0, 2, 2, 2, 2, 0, 3, 3, 0])

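However, I really don't like this for-loop, especially since this counter has to run over much longer datasets. I thought of some combination of np.diff and np.cumsum, but I could not get the reset part to work. Does anyone know a more elegant NumPy-style solution to the problem above?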

This solution uses pandas to perform a groupby:

import numpy as np
import pandas as pd

s = pd.Series([0, 0, 2, 2, 2, 2, 0, 3, 3, 0])
threshold = 0
np.where(
    s > threshold,
    s
    .to_frame()  # Convert series to dataframe.
    .assign(_dummy_=1)  # Add column of ones.
    .groupby((s.gt(threshold) != s.gt(threshold).shift()).cumsum())['_dummy_']  # shift-cumsum pattern
    .transform(lambda x: x.cumsum()),  # Cumsum the ones per group.
    0)  # Fill value with zero where threshold not exceeded.
# array([0, 0, 1, 2, 3, 4, 0, 1, 2, 0])
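The shift-cumsum pattern mentioned in the code comment may be easier to follow when the group labels are printed on their own. The following is only a minimal sketch, not the answerer's code: the names m, group_id and ramp are made up for illustration, and it sums the 0/1 flag directly instead of using a dummy column.

```python
import pandas as pd

s = pd.Series([0, 0, 2, 2, 2, 2, 0, 3, 3, 0])
threshold = 1

# 1 where the condition holds, 0 elsewhere.
m = (s > threshold).astype(int)

# A new group starts wherever the flag differs from its predecessor.
# The first element is compared against NaN, so it also starts a group.
group_id = m.ne(m.shift()).cumsum()

# Running sum of the flag within each group: ramps for runs of ones,
# zeros for runs of zeros.
ramp = m.groupby(group_id).cumsum()

print(group_id.tolist())  # [1, 1, 2, 2, 2, 2, 3, 4, 4, 5]
print(ramp.tolist())      # [0, 0, 1, 2, 3, 4, 0, 1, 2, 0]
```

Each run of equal flags gets its own label, so a per-group cumulative sum restarts at every change of the condition.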

Another NumPy solution:

import numpy as np
a = np.array([0, 0, 2, 2, 2, 2, 0, 3, 3, 0])

def stop_clock(signal, threshold=1):
    mask = signal > threshold
    indices = np.flatnonzero(np.diff(mask)) + 1
    return np.concatenate(list(map(np.cumsum, np.array_split(mask, indices))))

stop_clock(a)
# array([0, 0, 1, 2, 3, 4, 0, 1, 2, 0])

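Yes, we can use diff-styled differentiation together with cumsum to create such intervaled ramps in a vectorized manner, which should be quite efficient, especially for large input arrays. The reset is handled by assigning an appropriate value at the end of each interval: the idea is to write the negative of the interval length at the first position after the interval, so that the cumulative sum drops back to zero there.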
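To make the reset idea concrete, it can be traced by hand on the sample from the question. This is only an illustrative sketch: the reset positions and values are hard-coded for this one array, and the name steps is not from the original post.

```python
import numpy as np

a = np.array([0, 0, 2, 2, 2, 2, 0, 3, 3, 0])
mask = a > 1

# Start with +1 on every element that satisfies the condition.
steps = mask.astype(int)

# Written out by hand for this sample: the first run has length 4 and is
# followed by index 6, the second run has length 2 and is followed by index 9.
steps[6] = -4
steps[9] = -2

print(steps.tolist())           # [0, 0, 1, 1, 1, 1, -4, 1, 1, -2]
print(steps.cumsum().tolist())  # [0, 0, 1, 2, 3, 4, 0, 1, 2, 0]
```

A general implementation has to find those positions and run lengths automatically instead of hard-coding them.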

Here's an implementation that does all of this:

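def intervaled_ramp(a, thresh=1):
    mask = a > thresh

    # Get start, stop indices of each run of True values
    mask_ext = np.concatenate(([False], mask, [False]))
    idx = np.flatnonzero(mask_ext[1:] != mask_ext[:-1])
    s0, s1 = idx[::2], idx[1::2]

    # +1 for every element inside a run
    out = mask.astype(int)
    # At the first index after each run, subtract the run length so the
    # cumulative sum drops back to zero; a run ending at the array end needs no reset
    valid_stop = s1[s1 < len(a)]
    out[valid_stop] = s0[:len(valid_stop)] - valid_stop
    return out.cumsum()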
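One way to gain some confidence in the vectorized version is to compare it against the plain loop from the question on random inputs. The sketch below repeats both functions so that it runs on its own; stop_clock_loop and the random test setup are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

def stop_clock_loop(signal, threshold=1):
    # Reference implementation: the for-loop from the question.
    clock, current_time = [], 0
    for item in signal:
        current_time = current_time + 1 if item > threshold else 0
        clock.append(current_time)
    return np.array(clock)

def intervaled_ramp(a, thresh=1):
    # Vectorized version, copied from the answer above.
    mask = a > thresh
    mask_ext = np.concatenate(([False], mask, [False]))
    idx = np.flatnonzero(mask_ext[1:] != mask_ext[:-1])
    s0, s1 = idx[::2], idx[1::2]
    out = mask.astype(int)
    valid_stop = s1[s1 < len(a)]
    out[valid_stop] = s0[:len(valid_stop)] - valid_stop
    return out.cumsum()

rng = np.random.default_rng(0)  # fixed seed so the check is repeatable
all_match = True
for _ in range(200):
    n = int(rng.integers(1, 50))
    a = rng.integers(0, 4, size=n)
    if not np.array_equal(stop_clock_loop(a), intervaled_ramp(a)):
        all_match = False

# Edge case: a run that extends to the very end of the array.
tail = np.array([0, 2, 2, 2])
print(all_match, intervaled_ramp(tail).tolist())
```

Random arrays of small integers produce many short runs, including runs that touch the start or the end of the array, which are the cases most likely to expose an off-by-one error.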

Runtime test

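One way to benchmark fairly is to take the sample posted in the question, tile it many times, and use the result as the input array. With that setup, the timings are as follows: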

In [841]: a = np.array([0, 0, 2, 2, 2, 2, 0, 3, 3, 0])

In [842]: a = np.tile(a,10000)

# @Alexander's soln
In [843]: %timeit pandas_app(a, threshold=1)
1 loop, best of 3: 3.93 s per loop

# @Psidom 's soln
In [844]: %timeit stop_clock(a, threshold=1)
10 loops, best of 3: 119 ms per loop

# Proposed in this post
In [845]: %timeit intervaled_ramp(a, thresh=1)
1000 loops, best of 3: 527 µs per loop

While Alexander's solution is very elegant and Psidom's is the most readable, this solution is perfect because of its speed. Thanks, everyone!

Input (a) : 
[5 3 1 4 5 0 0 2 2 2 2 0 3 3 0 1 1 2 0 3 5 4 3 0 1]
Output (intervaled_ramp(a, thresh=1)) : 
[1 2 0 1 2 0 0 1 2 3 4 0 1 2 0 0 0 1 0 1 2 3 4 0 0]

Input (a) : 
[1 1 1 4 5 0 0 2 2 2 2 0 3 3 0 1 1 2 0 3 5 4 3 0 1]
Output (intervaled_ramp(a, thresh=1)) : 
[0 0 0 1 2 0 0 1 2 3 4 0 1 2 0 0 0 1 0 1 2 3 4 0 0]

Input (a) : 
[1 1 1 4 5 0 0 2 2 2 2 0 3 3 0 1 1 2 0 3 5 4 3 0 5]
Output (intervaled_ramp(a, thresh=1)) : 
[0 0 0 1 2 0 0 1 2 3 4 0 1 2 0 0 0 1 0 1 2 3 4 0 1]

Input (a) : 
[1 1 1 4 5 0 0 2 2 2 2 0 3 3 0 1 1 2 0 3 5 4 3 0 5]
Output (intervaled_ramp(a, thresh=0)) : 
[1 2 3 4 5 0 0 1 2 3 4 0 1 2 0 1 2 3 0 1 2 3 4 0 1]
