Python: speeding up numpy.where for extracting integer segments?

python performance numpy

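I'm trying to work out how to speed up a Python function that uses numpy. The profiler output I received is below, and it shows that the vast majority of the time is spent on the line

ind_y, ind_x = np.where(seg_image == i)

seg_image is an integer array that is the result of segmenting an image, so finding the pixels where seg_image == i extracts a specific segmented object. I am looping through lots of these objects (in the code below I'm only looping through 5 for testing, but in reality I'll be looping through over 20,000), and it takes a long time to run!

Is there any way to speed up the np.where call? Or, alternatively, can the penultimate line (which also takes a good proportion of the time) be sped up?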

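The ideal solution would be to run the code on the whole array at once rather than looping, but I don't think that's possible, because some of the functions I need to run have side effects (for example, dilating a segmented object can make it "collide" with the next region and thus give incorrect results later on).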

Does anyone have any ideas?

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     5                                           def correct_hot(hot_image, seg_image):
     6         1       239810 239810.0      2.3      new_hot = hot_image.copy()
     7         1       572966 572966.0      5.5      sign = np.zeros_like(hot_image) + 1
     8         1        67565  67565.0      0.6      sign[:,:] = 1
     9         1      1257867 1257867.0     12.1      sign[hot_image > 0] = -1
    10                                           
    11         1          150    150.0      0.0      s_elem = np.ones((3, 3))
    12                                           
    13                                               #for i in xrange(1,seg_image.max()+1):
    14         6           57      9.5      0.0      for i in range(1,6):
    15         5      6092775 1218555.0     58.5          ind_y, ind_x = np.where(seg_image == i)
    16                                           
    17                                                   # Get the average HOT value of the object (really simple!)
    18         5         2408    481.6      0.0          obj_avg = hot_image[ind_y, ind_x].mean()
    19                                           
    20         5          333     66.6      0.0          miny = np.min(ind_y)
    21                                                   
    22         5          162     32.4      0.0          minx = np.min(ind_x)
    23                                                   
    24                                           
    25         5          369     73.8      0.0          new_ind_x = ind_x - minx + 3
    26         5          113     22.6      0.0          new_ind_y = ind_y - miny + 3
    27                                           
    28         5          211     42.2      0.0          maxy = np.max(new_ind_y)
    29         5          143     28.6      0.0          maxx = np.max(new_ind_x)
    30                                           
    31                                                   # 7 is + 1 to deal with the zero-based indexing, + 2 * 3 to deal with the 3 cell padding above
    32         5          217     43.4      0.0          obj = np.zeros( (maxy+7, maxx+7) )
    33                                           
    34         5          158     31.6      0.0          obj[new_ind_y, new_ind_x] = 1
    35                                           
    36         5         2482    496.4      0.0          dilated = ndimage.binary_dilation(obj, s_elem)
    37         5         1370    274.0      0.0          border = mahotas.borders(dilated)
    38                                           
    39         5          122     24.4      0.0          border = np.logical_and(border, dilated)
    40                                           
    41         5          355     71.0      0.0          border_ind_y, border_ind_x = np.where(border == 1)
    42         5          136     27.2      0.0          border_ind_y = border_ind_y + miny - 3
    43         5          123     24.6      0.0          border_ind_x = border_ind_x + minx - 3
    44                                           
    45         5          645    129.0      0.0          border_avg = hot_image[border_ind_y, border_ind_x].mean()
    46                                           
    47         5      2167729 433545.8     20.8          new_hot[seg_image == i] = (new_hot[ind_y, ind_x] + (sign[ind_y, ind_x] * np.abs(obj_avg - border_avg)))
    48         5        10179   2035.8      0.1          print obj_avg, border_avg
    49                                           
    50         1            4      4.0      0.0      return new_hot

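One thing you could do to save a little time is to store the result of seg_image == i so that you don't need to compute it twice. You compute it at lines 15 and 47; you could add seg_mask = seg_image == i and then reuse that result (it might also be good to split that piece out onto its own line for profiling purposes).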
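A minimal sketch of that suggestion, assuming a float image and labels 1..n_labels. The function name center_objects and its correction (subtracting each object's mean from its own pixels) are hypothetical stand-ins for the real per-object update; the point is only that a single mask serves both the read (cf. line 18) and the write (cf. line 47).

```python
import numpy as np

def center_objects(hot_image, seg_image, n_labels):
    """Subtract each object's mean from its own pixels (hypothetical correction)."""
    new_hot = hot_image.astype(float)  # astype returns a copy
    for i in range(1, n_labels + 1):
        seg_mask = seg_image == i              # one full-image comparison per object
        obj_avg = hot_image[seg_mask].mean()   # read through the mask (cf. line 18)
        new_hot[seg_mask] -= obj_avg           # write through the same mask (cf. line 47)
    return new_hot

# Toy data (assumed): label 0 is background and is left untouched
hot = np.array([[1., 2., 3.],
                [4., 5., 6.]])
seg = np.array([[1, 1, 2],
                [0, 2, 2]])
print(center_objects(hot, seg, 2))
```

This halves the number of full-image comparisons per object, but each iteration still scans the whole image once.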

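While there are some other minor things you could do to squeeze out a bit more performance, the root problem is that you're using an O(M*N) algorithm, where M is the number of segments and N is the size of the image: each of the M iterations scans all N pixels to evaluate seg_image == i. It isn't clear to me from your code whether there is a faster algorithm that does the same thing, but that's the first place I'd look for a speed-up.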
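As an illustration of what an O(N) alternative can look like for part of the work, the per-object averages (obj_avg) can be computed for every label in one pass with np.bincount. This is a swapped-in technique, not something from the question's code, and it does not cover border_avg, which depends on the per-object dilation and its collisions. scipy.ndimage.mean(input, labels, index) offers similar per-label statistics. The function name object_means is hypothetical.

```python
import numpy as np

def object_means(hot_image, seg_image):
    """Mean of hot_image over every label in seg_image, in a single O(N) pass.

    Labels must be non-negative integers. Entry k of the result is the mean
    over label k; labels with no pixels give NaN.
    """
    labels = seg_image.ravel()
    sums = np.bincount(labels, weights=hot_image.ravel())  # per-label sum
    counts = np.bincount(labels)                           # per-label pixel count
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts                               # 0 / 0 -> NaN

# Toy data (assumed)
hot = np.array([[1., 2., 3.],
                [4., 5., 6.]])
seg = np.array([[1, 1, 2],
                [0, 2, 2]])
means = object_means(hot, seg)   # means[i] plays the role of obj_avg for label i
print(means)
```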

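EDIT: I've left my original answer at the bottom for the record, but I looked into your code in more detail over lunch, and I think that using np.where is a big mistake: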

In [63]: a = np.random.randint(100, size=(1000, 1000))

In [64]: %timeit a == 42
1000 loops, best of 3: 950 us per loop

In [65]: %timeit np.where(a == 42)
100 loops, best of 3: 7.55 ms per loop

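You can get a boolean array (which can be used for indexing) in about 1/8 of the time it takes to get the actual coordinates of the points.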
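A quick check that the boolean array can stand in for the coordinates: indexing with the mask and indexing with the np.where output select the same pixels in the same row-major order. The random arrays are arbitrary test input.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(100, size=(1000, 1000))
hot = rng.random((1000, 1000))

mask = a == 42                  # boolean array: cheap to build
ind_y, ind_x = np.where(mask)   # explicit coordinates: noticeably slower

# Both select the same pixels, in the same (row-major) order
same = np.array_equal(hot[mask], hot[ind_y, ind_x])
print(same)  # True
```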

You could of course crop out each feature yourself, but ndimage has a find_objects function that returns the enclosing slices, and it is very fast:

In [66]: %timeit ndimage.find_objects(a)
100 loops, best of 3: 11.5 ms per loop

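This returns a list of tuples of slices enclosing all of your objects, in only about 50% more time than it takes to find the indices of a single object.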
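A small sketch of how the list returned by find_objects lines up with the labels, on a toy label image (assumed data). Entry j describes label j+1, and a label that does not occur in the image yields None, so a loop over the result should skip those entries.

```python
import numpy as np
from scipy import ndimage

seg = np.array([[0, 1, 1, 0],
                [0, 1, 0, 0],
                [3, 0, 0, 3]])   # label 2 is deliberately missing

slices = ndimage.find_objects(seg)
for j, slice_ in enumerate(slices):
    label = j + 1                      # entry j describes label j + 1
    if slice_ is None:                 # label does not occur in seg
        print(label, "absent")
        continue
    crop = seg[slice_]                 # small view bounded by the object's box
    print(label, slice_, (crop == label).sum())
```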

It may not work out of the box, as I can't test it right now, but I would restructure your code into something like the following:

import numpy as np
from scipy import ndimage
import mahotas

def correct_hot_bis(hot_image, seg_image):
    # Need this to not index out of bounds when computing border_avg
    hot_image_padded = np.pad(hot_image, 3, mode='constant',
                              constant_values=0)
    new_hot = hot_image.copy()
    sign = np.ones_like(hot_image, dtype=np.int8)
    sign[hot_image > 0] = -1
    s_elem = np.ones((3, 3))

    # Entry j of find_objects describes label j+1
    for j, slice_ in enumerate(ndimage.find_objects(seg_image)):
        if slice_ is None:  # label j+1 does not occur in seg_image
            continue
        hot_image_view = hot_image[slice_]
        seg_image_view = seg_image[slice_]
        # Work in a small array: the object's bounding box plus 3 pixels per side
        new_shape = tuple(dim+6 for dim in hot_image_view.shape)
        # The same enlarged window in hot_image_padded (shifted by +3)
        new_slice = tuple(slice(dim.start,
                                dim.stop+6,
                                None) for dim in slice_)
        indices = seg_image_view == j+1

        obj_avg = hot_image_view[indices].mean()

        obj = np.zeros(new_shape)
        obj[3:-3, 3:-3][indices] = True

        dilated = ndimage.binary_dilation(obj, s_elem)
        border = mahotas.borders(dilated)
        border &= dilated

        border_avg = hot_image_padded[new_slice][border == 1].mean()

        # new_hot[slice_] is a view, so this updates new_hot in place
        new_hot[slice_][indices] += (sign[slice_][indices] *
                                     np.abs(obj_avg - border_avg))

    return new_hot
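The index arithmetic behind new_slice in correct_hot_bis can be checked in isolation. Because hot_image_padded is shifted by 3 pixels, the window slice(start, stop + 6) in padded coordinates covers the original bounding box widened by 3 pixels on every side, with zeros wherever that widening runs off the image. A minimal sketch on a toy 5x5 image; the values and the bounding box are arbitrary, and the box touches the right edge on purpose.

```python
import numpy as np

PAD = 3
img = np.arange(1, 26, dtype=float).reshape(5, 5)
padded = np.pad(img, PAD, mode="constant", constant_values=0)

# Bounding box of some object in *unpadded* coordinates (e.g. from find_objects)
slice_ = (slice(1, 3), slice(2, 5))

# Shift by +PAD into padded coordinates, then widen by PAD on each side:
# [start - PAD + PAD, stop + PAD + PAD) == [start, stop + 2 * PAD)
new_slice = tuple(slice(s.start, s.stop + 2 * PAD) for s in slice_)
window = padded[new_slice]

# Same shape as the padded per-object array (new_shape) built in the loop
new_shape = tuple(s.stop - s.start + 2 * PAD for s in slice_)
print(window.shape == new_shape)
# The centre of the window is exactly the original crop
print(np.array_equal(window[PAD:-PAD, PAD:-PAD], img[slice_]))
```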

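You will still have to deal with the collisions, but you could get about a 2x speed-up by computing all of the indices at once with an np.unique-based approach: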
import numpy as np

a = np.random.randint(100, size=(1000, 1000))

def get_pos(arr):
    pos = []
    for j in range(100):
        pos.append(np.where(arr == j))
    return pos

def get_pos_bis(arr):
    # ravel() keeps flat_idx 1-D (newer NumPy may return it with arr's shape)
    unq, flat_idx = np.unique(arr.ravel(), return_inverse=True)
    pos = np.argsort(flat_idx)
    counts = np.bincount(flat_idx)
    cum_counts = np.cumsum(counts)
    multi_dim_idx = np.unravel_index(pos, arr.shape)
    # Drop the last boundary so np.split does not add a trailing empty group
    return list(zip(*(np.split(coords, cum_counts[:-1])
                      for coords in multi_dim_idx)))

In [33]: %timeit get_pos(a)
1 loops, best of 3: 766 ms per loop

In [34]: %timeit get_pos_bis(a)
1 loops, best of 3: 388 ms per loop

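Note that the pixels of each object are returned in a different order, so you can't simply compare the outputs of the two functions directly to check that they are equal. But they should both return the same pixels.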
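To make the np.unique-based grouping concrete, here are the same steps on a tiny array (toy data), compared against one np.where call per value. The flattened array is passed to np.unique and the last cumulative count is dropped before np.split, so there is no trailing empty group; sorting both coordinate lists makes the comparison independent of pixel order.

```python
import numpy as np

arr = np.array([[2, 0, 2],
                [1, 2, 0]])

# Step by step, on the flattened array
unq, flat_idx = np.unique(arr.ravel(), return_inverse=True)  # unq = [0, 1, 2]
order = np.argsort(flat_idx, kind="stable")      # flat positions grouped by value
bounds = np.cumsum(np.bincount(flat_idx))[:-1]   # where each group ends
ys, xs = np.unravel_index(order, arr.shape)      # flat positions -> (row, col)
groups = list(zip(np.split(ys, bounds), np.split(xs, bounds)))

# Compare with one np.where call per value, ignoring pixel order
for value, (gy, gx) in zip(unq, groups):
    wy, wx = np.where(arr == value)
    assert sorted(zip(gy, gx)) == sorted(zip(wy, wx))
print(len(groups))
```

With kind="stable", argsort keeps equal labels in increasing flat-index order, which is the row-major order np.where returns, so the groups should come out in the same order as np.where; the default sort kind gives no such guarantee.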
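This is brilliant, brilliant, brilliant. Thanks! The first time I ran it I found it was actually slower than my original code, but then I modified the code so that it does all the work (dilation, borders, etc.) in a small array rather than a large one, by changing how new_shape is calculated. I now get a huge speed improvement: on one image I'm working on, the old version took two and a half hours and the new version took 11 seconds!

Oops! Yes, it looks like the generator expression should have been new_shape = tuple(dim+6 for dim in hot_image_view.shape) rather than new_shape = tuple(dim+6 for dim in hot_image.shape). Is that what you changed? Feel free to edit my answer to reflect the working code.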