Create intervaled ramp array based on a threshold - Python/NumPy
I want to measure the length of sub-arrays that fulfill some condition (like a stop clock), but as soon as the condition is no longer fulfilled, the value should reset to zero. The resulting array should therefore tell me how many consecutive values have fulfilled the condition so far (e.g. value > 1). For the input [0, 0, 2, 2, 2, 2, 0, 3, 3, 0] (the one passed to StopClock below), the result should be the following array:
[0, 0, 1, 2, 3, 4, 0, 1, 2, 0]
It is easy to define a function in Python that returns the corresponding NumPy array:
import numpy as np

def StopClock(signal, threshold=1):
    clock = []
    current_time = 0
    for item in signal:
        # Count up while the value exceeds the threshold, otherwise reset.
        if item > threshold:
            current_time += 1
        else:
            current_time = 0
        clock.append(current_time)
    return np.array(clock)

StopClock([0, 0, 2, 2, 2, 2, 0, 3, 3, 0])
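However, I really don't like this for loop, especially since this counter has to run over much longer datasets. I thought of some solution combining np.diff and np.cumsum, but I did not get the resetting part to work. Does anyone know a more elegant, NumPy-style solution to the problem above?

This solution uses pandas to perform a groupby: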
import numpy as np
import pandas as pd

s = pd.Series([0, 0, 2, 2, 2, 2, 0, 3, 3, 0])
threshold = 0

np.where(
    s > threshold,
    s
    .to_frame()  # Convert series to dataframe.
    .assign(_dummy_=1)  # Add column of ones.
    .groupby((s.gt(threshold) != s.gt(threshold).shift()).cumsum())['_dummy_']  # shift-cumsum pattern
    .transform(lambda x: x.cumsum()),  # Cumsum the ones per group.
    0)  # Fill value with zero where threshold not exceeded.
# array([0, 0, 1, 2, 3, 4, 0, 1, 2, 0])
Another NumPy solution:
import numpy as np

a = np.array([0, 0, 2, 2, 2, 2, 0, 3, 3, 0])

def stop_clock(signal, threshold=1):
    mask = signal > threshold
    indices = np.flatnonzero(np.diff(mask)) + 1
    return np.concatenate(list(map(np.cumsum, np.array_split(mask, indices))))

stop_clock(a)
# array([0, 0, 1, 2, 3, 4, 0, 1, 2, 0])
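To make the splitting step more concrete, here is a small sketch that exposes the pieces stop_clock works on for the sample array. The helper name split_runs is made up for this illustration, and it locates the flips with mask[1:] != mask[:-1], which for a boolean array should mark the same positions as np.diff(mask).

```python
import numpy as np

def split_runs(signal, threshold=1):
    """Hypothetical helper: return the boolean runs that stop_clock cumsums."""
    mask = np.asarray(signal) > threshold
    # Positions where the mask flips between False and True.
    boundaries = np.flatnonzero(mask[1:] != mask[:-1]) + 1
    return np.array_split(mask, boundaries)

a = np.array([0, 0, 2, 2, 2, 2, 0, 3, 3, 0])
runs = split_runs(a)
for run in runs:
    # Each run is all True or all False, so its cumsum is a ramp or zeros.
    print(run.astype(int).tolist(), "->", np.cumsum(run).tolist())
```

Because np.cumsum is called once per run from Python, the cost grows with the number of runs, which is one plausible reason this version is slower than a single vectorized pass on inputs with many short runs.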
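Yes, we can use diff-style differentiation along with cumsum to create such intervaled ramps in a vectorized manner, which should be pretty efficient, especially with large input arrays. The resetting part is taken care of by assigning appropriate values at the end of each interval: the first element after each run of above-threshold values is set to minus the run length, so the running sum drops back to zero there.
Here's an implementation to accomplish all that: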
def intervaled_ramp(a, thresh=1):
    mask = a > thresh

    # Get start, stop indices
    mask_ext = np.concatenate(([False], mask, [False]))
    idx = np.flatnonzero(mask_ext[1:] != mask_ext[:-1])
    s0, s1 = idx[::2], idx[1::2]

    out = mask.astype(int)
    valid_stop = s1[s1 < len(a)]
    out[valid_stop] = s0[:len(valid_stop)] - valid_stop
    return out.cumsum()
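The reset trick may be easier to follow by looking at the array right before the final cumsum. The sketch below repeats the steps of intervaled_ramp but returns that intermediate array instead; the name ramp_steps is an assumption made for this illustration only.

```python
import numpy as np

def ramp_steps(a, thresh=1):
    """Hypothetical helper: the per-element increments that intervaled_ramp cumsums."""
    a = np.asarray(a)
    mask = a > thresh
    mask_ext = np.concatenate(([False], mask, [False]))
    idx = np.flatnonzero(mask_ext[1:] != mask_ext[:-1])
    starts, stops = idx[::2], idx[1::2]
    steps = mask.astype(int)  # +1 inside a run, 0 elsewhere
    valid_stop = stops[stops < len(a)]
    # At the first element after a run, subtract the run length (start - stop).
    steps[valid_stop] = starts[:len(valid_stop)] - valid_stop
    return steps

a = np.array([0, 0, 2, 2, 2, 2, 0, 3, 3, 0])
steps = ramp_steps(a)
print(steps.tolist())           # [0, 0, 1, 1, 1, 1, -4, 1, 1, -2]
print(steps.cumsum().tolist())  # [0, 0, 1, 2, 3, 4, 0, 1, 2, 0]
```

The -4 and -2 entries cancel the four and two ones accumulated by the preceding runs, so the cumulative sum is back at zero exactly where the condition stops holding.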
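Runtime test
One way to benchmark fairly is to take the sample posted in the question, tile it many times, and use that as the input array. With that setup, the timings are: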
In [841]: a = np.array([0, 0, 2, 2, 2, 2, 0, 3, 3, 0])
In [842]: a = np.tile(a,10000)
# @Alexander's soln
In [843]: %timeit pandas_app(a, threshold=1)
1 loop, best of 3: 3.93 s per loop
# @Psidom 's soln
In [844]: %timeit stop_clock(a, threshold=1)
10 loops, best of 3: 119 ms per loop
# Proposed in this post
In [845]: %timeit intervaled_ramp(a, thresh=1)
1000 loops, best of 3: 527 µs per loop
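For readers without IPython, a comparison along these lines could be sketched with the standard timeit module. The helper best_time is hypothetical, the array is tiled 1000 times rather than 10000 to keep the run short, and the numbers it prints will depend on the machine and library versions, so none are quoted here.

```python
import timeit
import numpy as np

def stop_clock(signal, threshold=1):
    mask = signal > threshold
    indices = np.flatnonzero(np.diff(mask)) + 1
    return np.concatenate(list(map(np.cumsum, np.array_split(mask, indices))))

def intervaled_ramp(a, thresh=1):
    mask = a > thresh
    mask_ext = np.concatenate(([False], mask, [False]))
    idx = np.flatnonzero(mask_ext[1:] != mask_ext[:-1])
    s0, s1 = idx[::2], idx[1::2]
    out = mask.astype(int)
    valid_stop = s1[s1 < len(a)]
    out[valid_stop] = s0[:len(valid_stop)] - valid_stop
    return out.cumsum()

def best_time(func, *args, repeat=3, number=1):
    """Hypothetical helper: best wall-clock time per call, in seconds."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number

a = np.tile(np.array([0, 0, 2, 2, 2, 2, 0, 3, 3, 0]), 1000)
for func in (stop_clock, intervaled_ramp):
    print(f"{func.__name__}: {best_time(func, a) * 1e3:.3f} ms")
```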
While Alexander's solution is very elegant and Psidom's is the most readable, this one suits me best because of its speed. Thank you all!
Input (a) :
[5 3 1 4 5 0 0 2 2 2 2 0 3 3 0 1 1 2 0 3 5 4 3 0 1]
Output (intervaled_ramp(a, thresh=1)) :
[1 2 0 1 2 0 0 1 2 3 4 0 1 2 0 0 0 1 0 1 2 3 4 0 0]
Input (a) :
[1 1 1 4 5 0 0 2 2 2 2 0 3 3 0 1 1 2 0 3 5 4 3 0 1]
Output (intervaled_ramp(a, thresh=1)) :
[0 0 0 1 2 0 0 1 2 3 4 0 1 2 0 0 0 1 0 1 2 3 4 0 0]
Input (a) :
[1 1 1 4 5 0 0 2 2 2 2 0 3 3 0 1 1 2 0 3 5 4 3 0 5]
Output (intervaled_ramp(a, thresh=1)) :
[0 0 0 1 2 0 0 1 2 3 4 0 1 2 0 0 0 1 0 1 2 3 4 0 1]
Input (a) :
[1 1 1 4 5 0 0 2 2 2 2 0 3 3 0 1 1 2 0 3 5 4 3 0 5]
Output (intervaled_ramp(a, thresh=0)) :
[1 2 3 4 5 0 0 1 2 3 4 0 1 2 0 1 2 3 0 1 2 3 4 0 1]
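The sample runs above include runs that touch the start or the end of the array. As a rough additional check, the vectorized function can be compared against a plain loop on random inputs. This is only a sketch: stop_clock_loop is a re-typed stand-in for the question's StopClock, and the seed, sizes, and value range are arbitrary choices.

```python
import numpy as np

def stop_clock_loop(signal, threshold=1):
    """Plain-loop reference, intended to mirror the question's StopClock."""
    clock, current = [], 0
    for item in signal:
        current = current + 1 if item > threshold else 0
        clock.append(current)
    return np.array(clock)

def intervaled_ramp(a, thresh=1):
    mask = a > thresh
    mask_ext = np.concatenate(([False], mask, [False]))
    idx = np.flatnonzero(mask_ext[1:] != mask_ext[:-1])
    s0, s1 = idx[::2], idx[1::2]
    out = mask.astype(int)
    valid_stop = s1[s1 < len(a)]
    out[valid_stop] = s0[:len(valid_stop)] - valid_stop
    return out.cumsum()

rng = np.random.default_rng(0)  # arbitrary seed
for _ in range(200):
    a = rng.integers(0, 4, size=rng.integers(1, 30))
    for thresh in (0, 1, 2):
        # Both implementations should agree element by element.
        assert np.array_equal(intervaled_ramp(a, thresh), stop_clock_loop(a, thresh))
print("all random cases agree")
```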