Alternative to numpy.argwhere to speed up a for loop in Python


python performance numpy


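I have two datasets:

ds1: a DEM (digital elevation model) file, held as a 2D numpy array

ds2: a 2D array showing the areas (pixels) that contain some excess water

I have a while loop that spreads (and changes) the excess volume of each pixel according to the elevations of its 8 neighbours and of the pixel itself, until the excess volume in every pixel is below a threshold d = 0.05. So in each iteration I need to find the indices of the pixels in ds2 whose excess volume is greater than 0.05, and exit the while loop if no such pixels remain: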

exit_code = "No"
while exit_code == "No":
    index_of_pixels_with_excess_volume = numpy.argwhere(ds2> 0.05) # find location of pixels where excess volume is greater than 0.05

    if not index_of_pixels_with_excess_volume.size:
        exit_code = "Yes"
    else:
        for pixel in index_of_pixels_with_excess_volume:
            # spread those excess volumes to the neighbours and
            # change the values of ds2

The problem is that numpy.argwhere(ds2 > 0.05) is very slow. I am looking for a faster alternative.

np.where(arr > 0.05) and (arr > 0.05).nonzero() were about 22-25% faster in my tests.

For example:

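exit_code = "No"
while exit_code == "No":
    # np.where returns a tuple of index arrays (rows, cols), not an (n, 2) array
    index_of_pixels_with_excess_volume = numpy.where(ds2 > 0.05)

    if not index_of_pixels_with_excess_volume[0].size:
        exit_code = "Yes"
    else:
        for pixel in zip(*index_of_pixels_with_excess_volume):
            # spread those excess volumes to the neighbours and
            # change the values of ds2 (loop body as in the question)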

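However, I worry that any gain from using where instead of argwhere will be lost in the final loop because of the zip(*...). If that turns out to be the case, let me know and I will happily delete this answer.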
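To make the relationship between the two calls concrete, here is a small sketch (the random grid is a hypothetical stand-in for ds2, and the 0.95 threshold is chosen only to keep the output small). It checks that the tuple from np.nonzero, which is what np.where(mask) returns, holds the same indices as np.argwhere, just arranged as two arrays instead of one (n, 2) array.

```python
import numpy as np

rng = np.random.default_rng(0)
ds2 = rng.random((50, 60))       # hypothetical stand-in for the excess-volume grid
mask = ds2 > 0.95

rows, cols = np.nonzero(mask)    # same result as np.where(mask)
pairs = np.argwhere(mask)        # one (n, 2) array of [row, col]

# Stacking the tuple and transposing reproduces argwhere's layout.
same = np.array_equal(pairs, np.transpose((rows, cols)))

# zip over plain Python lists yields (row, col) tuples for the inner loop.
as_tuples = list(zip(rows.tolist(), cols.tolist()))
```

Whether the zip(*...) iteration eats the gain depends on how many pixels pass the test, so it is worth timing on the real data.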

Make a sample 2D array:

In [584]: arr = np.random.rand(1000,1000)

Find a small fraction of its elements:

In [587]: np.where(arr>.999)
Out[587]: 
(array([  1,   1,   1, ..., 997, 999, 999], dtype=int32),
 array([273, 471, 584, ..., 745, 310, 679], dtype=int32))
In [588]: _[0].shape
Out[588]: (1034,)

Time the different pieces of argwhere:

In [589]: timeit arr>.999
2.65 ms ± 116 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [590]: timeit np.count_nonzero(arr>.999)
2.79 ms ± 26 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [591]: timeit np.nonzero(arr>.999)
6 ms ± 10 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [592]: timeit np.argwhere(arr>.999)
6.06 ms ± 58.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

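So roughly a third to a half of the time (2.65 ms of about 6 ms) goes to the arr > .999 comparison, and the rest to finding the True elements. Turning the where tuple into a 2-column array is very fast.

Now, if the goal is only to find the first value above the threshold, argmax is fast: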

In [593]: np.argmax(arr>.999)
Out[593]: 1273    # can unravel this to (1,273)
In [594]: timeit np.argmax(arr>.999)
2.76 ms ± 143 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

argmax short-circuits, so the actual run time will vary with how early the first matching value is found.
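As a small illustration of the argmax route, the sketch below finds the first pixel above a threshold and unravels the flat index into (row, col). The tiny array and the 0.05 threshold are made up for the example. Note that argmax returns 0 when nothing matches, so the any() guard is needed to tell "first element matches" apart from "no match".

```python
import numpy as np

arr = np.zeros((4, 5))
arr[1, 3] = 0.2
arr[2, 0] = 0.7
mask = arr > 0.05

if mask.any():
    # argmax on a boolean array returns the flat index of the first True
    flat_first = int(np.argmax(mask))
    first_pixel = tuple(int(i) for i in np.unravel_index(flat_first, arr.shape))
else:
    flat_first = None
    first_pixel = None
```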

flatnonzero is faster than np.nonzero:

In [595]: np.flatnonzero(arr>.999)
Out[595]: array([  1273,   1471,   1584, ..., 997745, 999310, 999679], dtype=int32)
In [596]: timeit np.flatnonzero(arr>.999)
3.05 ms ± 26.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [599]: np.unravel_index(np.flatnonzero(arr>.999),arr.shape)
Out[599]: 
(array([  1,   1,   1, ..., 997, 999, 999], dtype=int32),
 array([273, 471, 584, ..., 745, 310, 679], dtype=int32))
In [600]: timeit np.unravel_index(np.flatnonzero(arr>.999),arr.shape)
3.05 ms ± 3.58 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

This gives the same result as np.argwhere(arr>.999):

In [601]: timeit np.transpose(np.unravel_index(np.flatnonzero(arr>.999),arr.shape))
3.1 ms ± 5.86 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Interestingly, the flatnonzero approach cuts the time in half! I didn't expect that big an improvement.
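The flatnonzero route can be wrapped in a small helper and checked against argwhere. This is only a sketch: argwhere_via_flat is a name invented here, and the halved run time reported above comes from the author's session, so it may not carry over to other NumPy versions or array sizes.

```python
import numpy as np

def argwhere_via_flat(mask):
    """Return the (n, 2) index array of True cells via flatnonzero + unravel_index."""
    flat = np.flatnonzero(mask)                  # indices into the flattened mask
    return np.transpose(np.unravel_index(flat, mask.shape))

rng = np.random.default_rng(1)
arr = rng.random((200, 300))
mask = arr > 0.99
result = argwhere_via_flat(mask)
```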

Comparing the iteration speeds.

Iterating over the 2D array from argwhere:
In [607]: pixels = np.argwhere(arr>.999)
In [608]: timeit [pixel for pixel in pixels]
347 µs ± 5.29 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Iterating over the tuple from where, using the zip(*) transpose:
In [609]: idx = np.where(arr>.999)
In [610]: timeit [pixel for pixel in zip(*idx)]
256 µs ± 147 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Iterating over an array is often a bit slower than iterating over a list, or in this case over zipped arrays:
In [611]: [pixel for pixel in pixels][:5]
Out[611]: 
[array([  1, 273], dtype=int32),
 array([  1, 471], dtype=int32),
 array([  1, 584], dtype=int32),
 array([  1, 826], dtype=int32),
 array([  2, 169], dtype=int32)]
In [612]: [pixel for pixel in zip(*idx)][:5]
Out[612]: [(1, 273), (1, 471), (1, 584), (1, 826), (2, 169)]

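One is a list of arrays, the other a list of tuples. But converting those tuples (individually) to arrays is slow.

Iterating over the flat nonzero array is faster: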
In [617]: fdx = np.flatnonzero(arr>.999)
In [618]: fdx[:5]
Out[618]: array([1273, 1471, 1584, 1826, 2169], dtype=int32)
In [619]: timeit [i for i in fdx]
112 µs ± 23.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

But applying the unravel to those values one at a time takes time:
def foo(idx):    # a simplified unravel
    return idx//1000, idx%1000

In [628]: timeit [foo(i) for i in fdx]
1.12 ms ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Adding this 1 ms to the 3 ms needed to generate fdx, the flatnonzero approach may still come out ahead. But at best we are talking about a 2x speed-up.
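One way to avoid paying for foo(i) on every element is to unravel the whole flat index array in a single vectorized call and only then iterate. The sketch below uses np.divmod for that; it assumes a C-ordered 2D array (np.unravel_index would do the same job), and the random array is a stand-in like the one in the timings above.

```python
import numpy as np

rng = np.random.default_rng(2)
arr = rng.random((1000, 1000))
fdx = np.flatnonzero(arr > 0.999)

# One vectorized divmod replaces calling foo(i) per element:
# row = flat // ncols, col = flat % ncols for a C-ordered 2D array.
rows, cols = np.divmod(fdx, arr.shape[1])

# Converting to Python lists first gives plain int tuples in the loop.
pixels = list(zip(rows.tolist(), cols.tolist()))
```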
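For anyone who may be interested, here is another solution to my problem: I found that "code vectorization" with numpy tricks can speed up the run time significantly by eliminating the for or while loops and numpy.where(). I found these two websites very useful in explaining code vectorization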
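The thread does not show the vectorized code, so the following is only a sketch of what a loop-free pass could look like, under a deliberately simplified rule that is an assumption of this example: every pixel above d hands its whole volume to its 8 neighbours in equal shares, elevation (ds1) is ignored, and water pushed past the edge of the grid is dropped. The name spread_once is hypothetical.

```python
import numpy as np

def spread_once(ds2, d=0.05):
    """One vectorized pass of a simplified (assumed) spreading rule."""
    over = ds2 > d
    share = np.where(over, ds2, 0.0) / 8.0       # what each neighbour receives
    padded = np.pad(share, 1)                    # zero border, so edges need no special case
    rows, cols = ds2.shape
    received = np.zeros_like(ds2, dtype=float)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            # shifted view: cell (i, j) picks up the share of neighbour (i + dr, j + dc)
            received += padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
    # donors are emptied, everyone keeps what they had plus what they received
    return np.where(over, 0.0, ds2) + received

grid = np.zeros((5, 5))
grid[2, 2] = 0.8
out = spread_once(grid)
```

The two short loops run over the 8 fixed offsets, not over pixels, so the per-pixel work stays inside NumPy. With this layout the exit test of the outer loop can be written as (ds2 > d).any() without building an index array at all.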

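argwhere is just where, with the tuple of arrays turned into a 2D array by a transpose. Are you sure the slow part is argwhere, and not the ds2 > 0.05 step, or more likely the iteration over all those pixels?

I ran cProfile and drew that conclusion from the cumulative seconds reported next to argwhere.

The where expression has to pass over the array several times. Once to create the boolean array. Then np.count_nonzero quickly counts the number of True values. Finally where (actually np.nonzero) collects the indices of those values. The transpose into argwhere should be a minor part of the operation. If ds2 is large compared with the number of True pixels, this operation may dominate.

This is a very good idea. What worries me is that in each iteration a lot of time would go into modifying two arrays (ds2 and the boolean one) instead of one. Wouldn't that reduce performance? As a side question: how about using a scipy sparse matrix instead of a boolean matrix?

How can I use np.flatnonzero() on a 2D array, and what if we are interested in values greater than some number other than zero?

@BehzadJamali flatnonzero is not applied directly to the data, but to the result of the comparison (arr > 0.05). For 2D data, use nonzero(), which is very similar to where().

@BehzadJamali Oh, I just re-read your question and see that you mention you are working with a 2D array. Editing my answer...

I am checking the solution.

@BehzadJamali I added an example, which also shows that you need to use zip(*...) in the inner loop, and that may kill any performance gain.

I promise to upvote this in 17 hours (I have reached my limit for the day) (:

For the 1D case (flattened array) I expected a further speed-up from preallocating an index array (such as r = np.arange(ds2.size)) and then simply applying boolean indexing to it: r[ds2.ravel() > threshold]. That might yield some gain inside a loop. Strangely, this approach is slower than flatnonzero(ds2 > threshold). Do you know why?

Plus one for the persistence and hard work, and for an answer better than mine!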
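For reference, the two 1D variants discussed in the comments return the same indices, as the sketch below checks on a made-up grid. The thread does not settle why the preallocated version was slower, so timing both on the real data is the safest way to choose.

```python
import numpy as np

rng = np.random.default_rng(3)
ds2 = rng.random((100, 120))     # hypothetical stand-in for the excess-volume grid
threshold = 0.05

r = np.arange(ds2.size)                      # preallocated once, outside any loop
via_index = r[ds2.ravel() > threshold]       # boolean indexing into the index array
via_flat = np.flatnonzero(ds2 > threshold)   # direct flat indices of the True cells
```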