Numpy detection of region borders in Python


python numpy


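Given a 1D array of values:

A = [x, ..., x, 0, ..., 0, x, ..., x, 0, ..., 0, x, ..., x, ...]

where x, ..., x stands for an arbitrary number of arbitrary values and 0, ..., 0 stands for an arbitrary number of zeros.

I need a fast algorithm to find the indices of the borders, i.e. ..., x, 0, ... and ..., 0, x, ...

This problem seems to lend itself to parallelization, but that is beyond my experience. The data is large, so a simple loop over the array is too slow.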

Thanks,

Martin

This should at least push the loop down into NumPy primitives, although it will traverse the array three times:

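import numpy as np

A = 2 * (np.random.rand(200000) > 0.2)  # test data: roughly 20% zeros
borders = np.flatnonzero(np.diff(A == 0))  # i such that exactly one of A[i], A[i+1] is zero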

This takes 1.79 ms on my machine.
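To make the meaning of the resulting indices concrete, here is a small worked sketch on a hand-made array (the array `A_small` is invented for illustration and is not from the question). Each returned index `i` is the last element before a change, so the change happens between `i` and `i + 1`.

```python
import numpy as np

# Hypothetical toy input: two runs of zeros inside runs of data.
A_small = np.array([3, 5, 0, 0, 7, 0, 0, 0, 2, 2])

is_zero = (A_small == 0)
# On boolean input, np.diff marks positions where neighbours differ.
borders = np.flatnonzero(np.diff(is_zero))

print(borders)  # [1 3 4 7]
```

Adding 1 to each index gives the first element of the new run instead of the last element of the old one.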

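@chthonicdaemon's answer gets you 90% of the way there, but if you actually want to use the indices to slice up the array, you need some additional information.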

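Presumably you want to use the indices to extract the regions of the array that are not 0. You have found the indices where the array changes, but you don't know whether the change was from True to False or the other way around. You therefore need to check the first and last values and adjust accordingly. Otherwise, in some cases you will end up extracting the zero segments instead of the data.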

For example:

import numpy as np

def contiguous_regions(condition):
    """Finds contiguous True regions of the 1D boolean array "condition".
    Returns a 2D array where the first column is the start index of the region
    and the second column is the end index (exclusive, so x[start:stop] works)."""
    # Find the indices of changes in "condition"
    idx = np.flatnonzero(np.diff(condition)) + 1

    # Prepend or append the start or end indices to "idx"
    # if there's a block of "True"'s at the start or end...
    if condition[0]:
        idx = np.append(0, idx)
    if condition[-1]:
        idx = np.append(idx, len(condition))

    return idx.reshape(-1, 2)

# Generate an example dataset...
t = np.linspace(0, 4*np.pi, 20)
x = np.abs(np.sin(t)) + 0.1
x[np.sin(t) < 0.5] = 0

print(x)

# Get the contiguous regions where x is not 0
for start, stop in contiguous_regions(x != 0):
    print(x[start:stop])

Doing this:

for start, stop in contiguous_regions(x != 0):
    print(x[start:stop])

yields:

[ 0.71421271  1.06940027  1.01577333]
[ 0.93716648  1.09658449  0.83572391]
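As a variation on the same idea, and not something from the answer itself, the two `if` checks on the first and last values can be avoided by padding the condition with `False` on both ends before looking for changes. The sketch below assumes a hypothetical helper name, `contiguous_regions_padded`, and returns the same half-open `[start, stop)` pairs.

```python
import numpy as np

def contiguous_regions_padded(condition):
    """Hypothetical variant: (start, stop) pairs of True runs, stop exclusive."""
    condition = np.asarray(condition, dtype=bool)
    # Padding with False guarantees every True run has both a start and an end.
    padded = np.concatenate(([False], condition, [False]))
    # Position i is a change when padded[i] != padded[i + 1],
    # which corresponds to index i in the unpadded array.
    idx = np.flatnonzero(padded[1:] != padded[:-1])
    return idx.reshape(-1, 2)

regions = contiguous_regions_padded([True, True, False, False, True, False, True])
print(regions)
# [[0 2]
#  [4 5]
#  [6 7]]
```

The padding costs one extra copy of the array, so for very large inputs the explicit checks on `condition[0]` and `condition[-1]` may still be preferable.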

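What have you tried so far? What error did you get?

How big are your arrays? Are there many more zeros than x's, many more x's than zeros, or roughly the same number of both? Are there many or few of these borders (relative to the size)?

@RubenBermudez No error, just a long calculation.

@chthonicdaemon The length is about 200000 values. The runs of zeros have a known maximum length.

On a slightly random note, this problem is actually not well suited to parallelization at all. The bottleneck will be memory access, not CPU speed. There is almost no computation to perform per element, but a great many elements to go through. If you are careful, you should be able to get a speedup from a parallel approach, but it is not as simple as it looks at first glance.

Tried it with my test data. I still have to digest what is going on in the code.

+1, but there is actually a subtle bug in what you are currently doing. If the OP's array starts or ends with a chunk of zeros, you need to prepend/append 0/-1 to borders. Of course, the OP only asked for indications of the changes, and this delivers that. To actually use the result, though, you need to know the direction of the change.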
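One way to recover the direction of each change, sketched here as an assumption rather than anything proposed in the thread, is to take the difference of an integer mask instead of a boolean one: `+1` then marks a zero-to-data transition and `-1` a data-to-zero transition. The function name `signed_borders` and the sample array are invented for illustration.

```python
import numpy as np

def signed_borders(A):
    """Hypothetical helper: split change points by direction."""
    nonzero = (np.asarray(A) != 0).astype(np.int8)
    step = np.diff(nonzero)  # +1: zero -> data, -1: data -> zero
    rising = np.flatnonzero(step == 1) + 1    # first data element of each run
    falling = np.flatnonzero(step == -1) + 1  # first zero after each run
    return rising, falling

rising, falling = signed_borders([0, 0, 4, 4, 0, 9, 0])
print(rising)   # [2 5]
print(falling)  # [4 6]
```

Runs that touch the very start or end of the array produce no transition, so those two cases still need the explicit first/last checks described above.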
