Python: speed of rewriting an array in a for loop

python performance numpy optimization

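I have a 2D dataset with shape=(500, 500). From a given location (x_0, y_0) I want to map the distance of every element/pixel to that location. I do this by determining all unique distances from (x_0, y_0) and mapping them to integers. For a 6 x 6 dataset, such a map looks like this: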

[9 8 7 6 7 8]
[8 5 4 3 4 5]
[7 4 2 1 2 4]
[6 3 1 0 1 3]
[7 4 2 1 2 4]
[8 5 4 3 4 5]

where the integers correspond to the unique distances stored in the following array:

[0.  1.  1.41421356  2.  2.23606798  2.82842712  3.  3.16227766  3.60555128  4.24264069]

The code that determines these distances is as follows:

import numpy

def func(data, origin):
  x_0, y_0 = origin  # tuple parameters in the signature are Python 2 only
  y, x = numpy.indices(data.shape)
  r = numpy.sqrt((x - x_0)**2 + (y - y_0)**2)

  float_values = numpy.unique(r.ravel())  # Unique already sorts the result
  int_values = numpy.arange(float_values.shape[0])  # numpy.int was removed in NumPy 1.24

  # One full-array comparison per unique distance: this loop is the slow part
  for idx in range(float_values.shape[0])[::-1]:
    r[r == float_values[idx]] = int_values[idx]

  return float_values, r

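The for loop is the bottleneck. It takes too long for the application I need it for. Is there a way to speed it up? Or is there a completely different but faster way to get the output I need?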

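Don't fiddle with your "unique distances" array. Just precompute the distances, indexed by the squared radius D (the sum of squares). This is simply

roots = [sqrt(float(i)) for i in range(upper_limit)]

Then, since the pixels are contiguous, you can loop outward from the reference point and assign the whole applicable slice of roots, mapped from the reference point to the edge of the matrix.

Or drop the loop entirely and let numpy's vectorized operations do it for you, e.g.

dist = np.sqrt(dist_matrix)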
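A minimal sketch of the lookup-table idea, assuming integer pixel coordinates so that every squared distance is an integer that can serve as an index. The names `distance_map_lookup` and `sq` are illustrative and not part of the answer.

```python
import numpy as np

def distance_map_lookup(shape, x_0, y_0):
    """Distance of every pixel to (x_0, y_0), read from a precomputed sqrt table."""
    rows, cols = shape
    # Open meshes broadcast to the full grid of squared distances
    Y, X = np.ogrid[:rows, :cols]
    sq = (X - x_0) ** 2 + (Y - y_0) ** 2
    # roots[d] holds sqrt(d) for every possible squared distance d
    roots = np.sqrt(np.arange(sq.max() + 1))
    # Integer-array indexing maps each squared distance to its root
    return roots[sq]

dist = distance_map_lookup((6, 6), 3, 3)
```

For a single map, calling `np.sqrt` on the squared distances directly is probably simpler; the table is more likely to pay off when the same roots are reused across several reference points.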

Here's a vectorized approach using masking -

import numpy as np

def func_mask_vectorized(data, origin):
    x_0, y_0 = origin  # Python 3: unpack the tuple inside the function

    # Leverage broadcasting with open meshes to create the squared distances/ids
    m,n = data.shape
    Y,X = np.ogrid[:m,:n]
    ids = (X-x_0)**2 + (Y-y_0)**2

    # Setup mask that will help us retrieve the unique "compressed" IDs
    # (similar to what return_inverse does).
    # This is done by setting True at the ids places and then using that mask to
    # assign a range array, in effect setting up the unique compressed IDs.
    mask = np.zeros(ids.max()+1, dtype=bool)
    mask[ids] = True
    id_arr = mask.astype(int)
    id_arr[mask] = np.arange(mask.sum())
    r_out = id_arr[ids]

    # Finally extract out the unique ones among the IDs & get their sqrt values
    float_values_out = np.sqrt(np.flatnonzero(mask))
    return float_values_out, r_out
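To see what the mask step does, here is a small sketch on a hand-made array of squared distances. The values in `ids` are made up for illustration; the steps mirror the function above.

```python
import numpy as np

ids = np.array([[4, 0, 4],
                [9, 1, 9]])            # made-up squared distances

mask = np.zeros(ids.max() + 1, dtype=bool)
mask[ids] = True                       # True at 0, 1, 4 and 9: the values that occur

id_arr = np.zeros(mask.size, dtype=int)
id_arr[mask] = np.arange(mask.sum())   # rank of each occurring value: 0->0, 1->1, 4->2, 9->3

compressed = id_arr[ids]               # replace every value by its rank
unique_vals = np.flatnonzero(mask)     # the sorted unique values themselves
```

Because the mask is indexed by value, the ranks come out in sorted order without an explicit sort. That may explain part of the speed difference against `np.unique`, at the cost of a temporary array as long as the largest squared distance.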

Benchmarking

Setup: data of shape (500, 500) with numbers in the range 0-9, as in the question's sample. Timings for all the complete solutions follow -

In [371]: np.random.seed(0)
     ...: data = np.random.randint(0,10,(500,500))
     ...: x_0 = 2
     ...: y_0 = 3

# Original soln
In [372]: %timeit func(data, (x_0,y_0))
1 loop, best of 3: 6.77 s per loop

# @Daniel's soln
In [373]: %timeit func_return_inverse(data, (x_0,y_0))
10 loops, best of 3: 23.9 ms per loop

# Soln from this post
In [374]: %timeit func_mask_vectorized(data, (x_0,y_0))
100 loops, best of 3: 5.02 ms per loop

When the range of numbers is extended (upper bounds of 100 and then 1000 in the runs below), how the solutions stack up barely changes -

In [397]: np.random.seed(0)
     ...: data = np.random.randint(0,100,(500,500))
     ...: x_0 = 50
     ...: y_0 = 50

In [398]: %timeit func(data, (x_0,y_0))
     ...: %timeit func_return_inverse(data, (x_0,y_0))
     ...: %timeit func_mask_vectorized(data, (x_0,y_0))
1 loop, best of 3: 5.62 s per loop
10 loops, best of 3: 20.7 ms per loop
100 loops, best of 3: 4.28 ms per loop

In [399]: np.random.seed(0)
     ...: data = np.random.randint(0,1000,(500,500))
     ...: x_0 = 500
     ...: y_0 = 500

In [400]: %timeit func(data, (x_0,y_0))
     ...: %timeit func_return_inverse(data, (x_0,y_0))
     ...: %timeit func_mask_vectorized(data, (x_0,y_0))
1 loop, best of 3: 6.87 s per loop
10 loops, best of 3: 21.9 ms per loop
100 loops, best of 3: 5.05 ms per loop

Use the return_inverse argument of unique:

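import numpy

def func(data, origin):  # timed as func_return_inverse in the benchmarks above
    x_0, y_0 = origin  # Python 3: unpack the tuple inside the function
    y, x = numpy.indices(data.shape)
    r = (x - x_0)**2 + (y - y_0)**2  # squared distances stay exact integers
    float_values, r = numpy.unique(r, return_inverse=True)
    return float_values ** 0.5, r.reshape(data.shape)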
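As a quick sanity check, the same idea can be run on the question's 6 x 6 example with (x_0, y_0) = (3, 3). This is a self-contained sketch; `distance_ranks` is an illustrative name, and `expected` is copied from the map shown in the question.

```python
import numpy as np

def distance_ranks(shape, x_0, y_0):
    """Return (sorted unique distances, map of each pixel to its distance rank)."""
    y, x = np.indices(shape)
    sq = (x - x_0) ** 2 + (y - y_0) ** 2
    # return_inverse gives, for every pixel, the index of its value in uniq_sq
    uniq_sq, inverse = np.unique(sq, return_inverse=True)
    return np.sqrt(uniq_sq), inverse.reshape(shape)

float_values, r = distance_ranks((6, 6), 3, 3)

expected = np.array([[9, 8, 7, 6, 7, 8],
                     [8, 5, 4, 3, 4, 5],
                     [7, 4, 2, 1, 2, 4],
                     [6, 3, 1, 0, 1, 3],
                     [7, 4, 2, 1, 2, 4],
                     [8, 5, 4, 3, 4, 5]])
print(np.array_equal(r, expected))
```

Taking the square root only after `np.unique` means the uniqueness test runs on exact integers rather than floats, and the root is computed once per distinct distance instead of once per pixel.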

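Your indexing scheme (the integers in the data) is in the same order as the distances. If that is always the case, you can produce the distance array without needing the actual content of the data.

I based this solution on an index computation that uses each position's x and y pixel offsets from the anchor position. With "lo" being the smaller offset, "ho" the larger offset, and "mo" the maximum possible offset in either direction:

index = ho + (mo+1)*lo - lo*(lo+1)//2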
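A short sketch of that formula, to show that it numbers the offset pairs consecutively. `pair_index` is an illustrative helper, not part of the answer, and `mo = 3` is chosen to match a 6 x 6 grid anchored at (3, 3).

```python
def pair_index(dx, dy, mo):
    """Index of the offset pair (dx, dy), given the maximum possible offset mo."""
    # Direction does not matter for the distance, only which offset is smaller
    lo, ho = sorted((abs(dx), abs(dy)))
    return ho + (mo + 1) * lo - lo * (lo + 1) // 2

mo = 3
# Enumerate every pair with lo <= ho <= mo, row by row
indices = [pair_index(lo, ho, mo)
           for lo in range(mo + 1)
           for ho in range(lo, mo + 1)]
print(indices)
```

The pairs with lo = 0 take the first mo + 1 indices, those with lo = 1 take the next mo, and so on, so each unordered offset pair gets exactly one slot.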

To compute the array of distances, we only need to know the dimensions of the matrix and the position of the anchor pixel.

import numpy as np
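def distanceArray(x,y,cols,rows):
    # Number of distinct offsets per axis: largest offset to either edge, plus one for offset 0
    # (same convention as mo in offsets(); max(x,cols-x) misses one when the near edge is 0)
    maxDx  = max(x,cols-x-1)+1
    maxDy  = max(y,rows-y-1)+1
    maxD   = max(maxDx,maxDy)
    minD   = min(maxDx,maxDy)
    lo = np.arange(minD)[:,None]   # smaller offsets, as a column vector
    hi = np.arange(maxD)           # larger offsets
    sqs = lo*lo + hi*hi            # all sums of squares
    # Boolean mask that keeps only the pairs with hi >= lo (upper triangle)
    unique = np.tri(*sqs.shape,maxD-minD, dtype=bool)[::-1,::-1]
    return np.sqrt(sqs[unique])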

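If we look only at the pixel offsets relative to the anchor position, we get horizontal and vertical delta ranges determined by the bounds of the data shape (maxDx and maxDy).

For the distance computation we can ignore the vertical/horizontal orientation and create one small range and one large range (lo and hi, from minD and maxD).

To compute all the sums of squares, we turn one of the two ranges into a vertical vector (lo) and add it to the other (hi) after squaring their values (hi*hi + lo*lo). This produces a 2D matrix containing all combinations of sums of squares (sqs).

In this matrix, one triangle duplicates its counterpart, so we filter out the duplicate distance pairs with a triangular boolean matrix (unique). Keeping the top triangle (hi >= lo) ensures that the sums of squares produced by the masking operation come out in the right order.

Finally, the filtered sqs values contain exactly what we need, in the right order, and the costly square root function is applied only to that final result.

Not applying the distance computation to every pixel should give a significant performance gain, since you only look up the indexed distances when you need them. I suppose it is unfair to compare the performance of this distanceArray function with the other solutions (it only does part of what they do), but considering that not having to do something is also part of optimization, the end result may be better (about 5x faster than Divakar's in my unscientific tests).

Note that if you only use the distances for a small fraction of the pixels, you may want to avoid all these computations and use a dictionary as a cache that computes distances "on demand" from the dX and dY offsets (keyed by an ordered tuple). This performs the absolute minimum number of computations and computes the distance only once for any given pair of offsets. You could even keep using that cache for other anchor positions and data shapes, because a pair of offsets always produces the same distance no matter where the anchor is.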

[EDIT] To get the same indexes as the ones I use for distanceArray, you can use:

def offsets(x,y,cols,rows):
    mo   = max(x,cols-x-1,y,rows-y-1)+1

    dx   = abs(np.arange(cols)-x)
    dy   = abs(np.arange(rows)-y)[:,None]

    mo21 = 2 * mo - 1
    ly = dy*(mo21 - dy )//2  # mo*lo - lo*(lo+1)//2 when dy is lowest
    lx = dx*(mo21 - dx )//2  # mo*lo - lo*(lo+1)//2 when dx is lowest

    return np.maximum(dx,dy) + np.minimum(lx,ly)

offsets(3,3,6,6)

array([[9, 8, 6, 3, 6, 8],
       [8, 7, 5, 2, 5, 7],
       [6, 5, 4, 1, 4, 5],
       [3, 2, 1, 0, 1, 2],
       [6, 5, 4, 1, 4, 5],
       [8, 7, 5, 2, 5, 7]])
