Numpy: converting an array of peaks into a series of steps representing the most recent peak

Given a series of peaks like this:

peaks = [0, 5, 0, 3, 2, 0, 1, 7, 0]

how can I create an array of steps that indicates the most recent peak, like this?

peaks = [0, 5, 0, 3, 2, 0, 1, 7, 0]
steps = [0, 5, 5, 3, 3, 3, 3, 7, 7]
Requirements:

  • This will be used for image analysis on large 3D images (1000**3), so it needs to be fast. That rules out loops and list comprehensions; numpy vectorization only.
  • The example above is a linear list, but this needs to work equally well on ND images. That means operating along a single axis, while allowing for multiple other axes.

Note

I recently discovered that this is a duplicate (the plain version is easily solved with np.maximum.accumulate), but my question also contained the optional "only if it is actually a peak" twist described above. It turns out I really do need that second behaviour as well, so I am posting just that part here.
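For reference, the plain "running maximum" behaviour of the duplicate really is a one-liner, and it also shows where that differs from the output wanted here:

```python
import numpy as np

peaks = np.array([0, 5, 0, 3, 2, 0, 1, 7, 0])

# running maximum: each position holds the largest value seen so far
steps = np.maximum.accumulate(peaks)
print(steps)  # [0 5 5 5 5 5 5 7 7]
# note the 5s at positions 3-6, where the twist above wants 3s instead
```

Like other ufunc accumulations, np.maximum.accumulate also takes an axis keyword for ND arrays.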

Here is a solution that handles ND input and detects "broad peaks" such as

..., 0, 4, 4, 4, 3, ...

but not

..., 0, 4, 4, 7, ...
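The up-then-down rule behind that distinction can be shown in isolation. The sketch below (peak_starts is a helper name introduced here for illustration, not part of the solution) uses the same -1 padding at both ends as the updown array in keep_peaks, so the ends of the array count as "down":

```python
import numpy as np

def peak_starts(seq):
    """Indices where a (possibly broad) peak begins: a nonzero up-change
    whose next nonzero change is down."""
    seq = np.asarray(seq)
    d = np.empty(seq.size + 1, seq.dtype)
    d[0] = d[-1] = -1               # pad with -1 sentinels, as in keep_peaks
    d[1:-1] = np.diff(seq)          # changes between consecutive elements
    idx, = np.nonzero(d)            # positions of the nonzero changes
    chng = d[idx]
    up_then_down, = np.nonzero((chng[:-1] > 0) & (chng[1:] < 0))
    return idx[up_then_down]

print(peak_starts([0, 4, 4, 4, 3]))  # [1] -> the plateau of 4s is a peak
print(peak_starts([0, 4, 4, 7]))     # [3] -> here it is not; only the 7 is
```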

import numpy as np
import operator as op

def keep_peaks(A, axis=-1):
    B = np.swapaxes(A, axis, -1)
    # take differences between consecutive elements along axis
    # pad with -1 at the start and the end
    # the most efficient way is to allocate first, because otherwise
    # padding would involve reallocation and a copy
    # note that in order to avoid that copy we use np.subtract and its
    # out kwd
    updown = np.empty((*B.shape[:-1], B.shape[-1]+1), B.dtype)
    updown[..., 0], updown[..., -1] = -1, -1
    np.subtract(B[..., 1:], B[..., :-1], out=updown[..., 1:-1])
    # extract indices where there is a change along axis
    chnidx = np.where(updown)
    # get the values of the changes
    chng = updown[chnidx]
    # find indices of indices 1) where we go up and 2) the next change is
    # down (note how the padded -1's at the end are useful here)
    # also include the beginning of each 1D subarray
    pkidx, = np.where((chng[:-1] > 0) & (chng[1:] < 0) | (chnidx[-1][:-1] == 0))
    # use indices of indices to retain only peak indices
    pkidx = (*map(op.itemgetter(pkidx), chnidx),)
    # construct array of changes of the result along axis
    # these will be zero everywhere
    out = np.zeros_like(A)
    aux = out.swapaxes(axis, -1)
    # except where there is a new peak
    # at these positions we need to put the differences of peak levels
    aux[(*map(op.itemgetter(slice(1, None)), pkidx),)] = np.diff(B[pkidx])
    # we could ravel the array and do the cumsum on that, but raveling
    # a potentially noncontiguous array is expensive
    # instead we keep the shape, at the cost of having to replace the
    # value at the beginning of each 1D subarray (we do not need the
    # "line-jump" difference but the plain 1st value there)
    aux[..., 0] = B[..., 0]
    # finally, use cumsum to go from differences to plain values
    return out.cumsum(axis=axis)

peaks = [0, 5, 0, 3, 2, 0, 1, 7, 0]

print(peaks)
print(keep_peaks(peaks))

# show off axis kwd and broad peak detection
peaks3d = np.kron(np.random.randint(0, 10, (3, 6, 3)), np.ones((1, 2, 1), int))

print(peaks3d.swapaxes(1, 2))
print(keep_peaks(peaks3d, 1).swapaxes(1, 2))

Comments:

  • If it has to be fast, C is a better option than Python. Fortran is arguably better than both.
  • @Mad Physicist is right, and +1 for Fortran... but there is also the question of being "fast to code". Besides, a good numpy implementation always surprises me in terms of speed.
  • numpy is mostly written in C and designed for speed, so that is not surprising. Still, interfacing with Python carries a lot of overhead. Is Numba or Cython not an option for you? That would quite likely be faster than any vectorized approach, and easier to code.
  • @max9111 "and easier to code" is debatable. ;-)
  • This is amazing, but in an "almost magic" way... could you add some comments explaining how it works? :-)
  • @2炔基 Done. Hope it helps.
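For what it is worth, the loop kernel that the Numba/Cython suggestion alludes to could look like the sketch below (1-D only; keep_peaks_loop is a name introduced here, and the @numba.njit line is left commented out so the snippet runs without Numba installed):

```python
import numpy as np
# import numba

# @numba.njit  # uncommenting this would compile the loop, per the comment above
def keep_peaks_loop(a):
    """Carry the most recent (possibly broad) peak value forward, 1-D."""
    n = a.shape[0]
    out = np.empty_like(a)
    cur = a[0]                       # before any peak, carry the first value
    for i in range(n):
        rose = i > 0 and a[i] > a[i - 1]
        j = i                        # walk across a plateau of equal values
        while j + 1 < n and a[j + 1] == a[i]:
            j += 1
        # the end of the array counts as "down", like the -1 padding above
        if rose and (j + 1 == n or a[j + 1] < a[i]):
            cur = a[i]               # a new (possibly broad) peak starts here
        out[i] = cur
    return out

print(keep_peaks_loop(np.array([0, 5, 0, 3, 2, 0, 1, 7, 0])))
# [0 5 5 3 3 3 3 7 7]
```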