Python: applying a matrix-to-scalar function to each block of a matrix

python numpy

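Suppose there is a function M2S that takes a (small) matrix as input and returns a scalar, and I want to apply M2S to every block of a (large) matrix. I am looking for an efficient solution with NumPy.

The solution I tested is: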

import numpy as np
img_depth=np.random.randint(0,5,(480,640))
w_block=10

def M2S(img_block):
  img_valid=img_block[img_block!=0]
  if len(img_valid)==0:    return None
  return np.mean(img_valid)
out1=[[uu+w_block/2, vv+w_block/2, M2S(img_depth[vv:vv+w_block,uu:uu+w_block])]
       for vv in range(0,img_depth.shape[0],w_block)
       for uu in range(0,img_depth.shape[1],w_block) ]

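(Note that I serialized the matrix into a list of [block column, block row, M2S output], as needed.)

In my environment, this computation takes 43 ms.

I also tried vectorizing it, but the speed did not improve: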

indices= np.indices(img_depth.shape)[:,0:img_depth.shape[0]:w_block,0:img_depth.shape[1]:w_block]
vM2S= np.vectorize(lambda v,u: M2S(img_depth[v:v+w_block,u:u+w_block]))
out2= np.dstack((indices[1]+w_block/2, indices[0]+w_block/2, vM2S(indices[0], indices[1]))).reshape(-1,3)

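Is there a better way to speed up the computation?

(Edit)

I drew a sketch to explain the procedure above:

The second step (serialization) can be computed quickly, so my question is mainly about the first step.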

You can use fancy indexing and list comprehension techniques, as shown in the following code:

import numpy as np

img_depth = np.random.randint(0, 5, (5, 5))
w_block = 2

def M2S(img_block):
    img_valid = img_block[img_block != 0]
    if len(img_valid) == 0:
        return None
    return np.mean(img_valid)

u = np.arange(0, img_depth.shape[0], w_block)
v = np.arange(0, img_depth.shape[1], w_block)

grid_list = [[x, y] for x in u for y in v]

vals = list(map(M2S, list(map(lambda coords: img_depth[coords[0]:coords[0]+w_block, coords[1]:coords[1]+w_block], grid_list))))

res = [[(data_tup[0][0]+w_block)/2, (data_tup[0][1]+w_block)/2, data_tup[1]] for data_tup in zip(grid_list, vals)]

Output

[[1.0, 1.0, 1.75], [1.0, 2.0, 3.6666666666666665], [1.0, 3.0, 2.0], [2.0, 1.0, 2.6666666666666665], [2.0, 2.0, 3.5], [2.0, 3.0, 2.0], [3.0, 1.0, 2.0], [3.0, 2.0, 2.5], [3.0, 3.0, 2.0]]

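The main part here is mapping a lambda that splits the image into sub-images, and then mapping the M2S function over each of them.

The last line then builds the result list with a list comprehension.

Cheers.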

Using NumPy's as_strided:
First, let's practice with a small matrix.
M=np.array([[ 1,  2,  3,  4],
            [ 5,  6,  7,  8],
            [ 9, 10, 11, 12],
            [13, 14, 15, 16]])
# M.shape: (4, 4); M.strides: (16, 4) (assuming a 4-byte integer dtype)

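Let's consider 2x2 blocks and treat M as a (2, 2, 2, 2) tensor, here called subM. In this case its strides are (32, 8, 16, 4).
Then we apply a NumPy operation to this tensor that is equivalent to M2S. By the definition above, M2S is the mean of the non-zero elements:
np.sum(subM,axis=(2,3)) / np.count_nonzero(subM,axis=(2,3))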
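
To make the small-matrix step concrete, here is a minimal sketch of how the (2, 2, 2, 2) view might be built. Rather than hard-coding the strides (32, 8, 16, 4), which only hold for a 4-byte integer dtype, it derives them from M.strides. The names b, sub_m and block_means are illustrative and not from the original answer.

```python
import numpy as np

# Same 4x4 matrix as above, built with arange for brevity.
M = np.arange(1, 17).reshape(4, 4)
b = 2  # block edge length (illustrative name)

# Derive the block strides from M's own strides so the view is
# correct whatever the integer width is on this platform.
s0, s1 = M.strides
sub_m = np.lib.stride_tricks.as_strided(
    M,
    shape=(M.shape[0] // b, M.shape[1] // b, b, b),
    strides=(s0 * b, s1 * b, s0, s1),
)

# sub_m[i, j] is the 2x2 block at block-row i, block-column j.
# Mean of the non-zero elements of every block, as M2S does.
block_means = np.sum(sub_m, axis=(2, 3)) / np.count_nonzero(sub_m, axis=(2, 3))
print(block_means)
```

Because as_strided does no bounds checking, this sketch assumes the matrix dimensions are exact multiples of the block size.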

Finally, let's generalize this approach to solve the original task:
import numpy as np
img_depth=np.random.randint(0,5,(480,640))
w_block=10

view_shape=(int(img_depth.shape[0]/w_block),int(img_depth.shape[1]/w_block),w_block,w_block)
strides=(img_depth.strides[0]*w_block,img_depth.strides[1]*w_block,img_depth.strides[0],img_depth.strides[1])
img_depth2=np.lib.stride_tricks.as_strided(img_depth,view_shape,strides)
out3_0=np.sum(img_depth2,axis=(2,3)) / np.count_nonzero(img_depth2,axis=(2,3))

indices= np.indices(img_depth.shape)[:,0:img_depth.shape[0]:w_block,0:img_depth.shape[1]:w_block]
out3= np.dstack((indices[1]+w_block/2, indices[0]+w_block/2, out3_0)).reshape(-1,3)

This approach greatly improves the computation speed (from 43 ms to 1.7 ms).
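
One difference from the original M2S is worth noting: M2S returns None for a block with no non-zero elements, whereas the sum / count expression divides by zero for such a block. The sketch below is one possible way to handle that case, marking all-zero blocks with NaN, and it checks the result against a plain loop. The names m2s, loop_means, blocks and block_means are illustrative, the all-zero block is forced artificially, and the image size is assumed to be an exact multiple of w_block.

```python
import numpy as np

rng = np.random.default_rng(0)
img_depth = rng.integers(0, 5, (480, 640))
img_depth[:10, :10] = 0  # force one all-zero block for illustration
w_block = 10

def m2s(img_block):
    """Mean of the non-zero elements, or None if there are none."""
    img_valid = img_block[img_block != 0]
    return None if len(img_valid) == 0 else np.mean(img_valid)

def loop_means(img, w):
    """Reference result from the plain Python loop (None mapped to NaN)."""
    rows = []
    for v in range(0, img.shape[0], w):
        row = []
        for u in range(0, img.shape[1], w):
            m = m2s(img[v:v + w, u:u + w])
            row.append(np.nan if m is None else m)
        rows.append(row)
    return np.array(rows)

# Block view: blocks[i, j] is the w_block x w_block block at (i, j).
s0, s1 = img_depth.strides
blocks = np.lib.stride_tricks.as_strided(
    img_depth,
    shape=(img_depth.shape[0] // w_block, img_depth.shape[1] // w_block,
           w_block, w_block),
    strides=(s0 * w_block, s1 * w_block, s0, s1),
)

sums = blocks.sum(axis=(2, 3))
counts = np.count_nonzero(blocks, axis=(2, 3))

# Divide only where a block has non-zero elements; the rest stay NaN.
block_means = np.full(sums.shape, np.nan)
np.divide(sums, counts, out=block_means, where=counts > 0)

matches_loop = np.allclose(block_means, loop_means(img_depth, w_block),
                           equal_nan=True)
print(matches_loop)
```

Using NaN instead of None keeps the output a plain float array, which is a design choice of this sketch rather than something the original answer specifies.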

(Edit)
As @Mad Physicist pointed out, we can also create the tensor with the reshape method:
view_shape=(int(img_depth.shape[0]/w_block),w_block,int(img_depth.shape[1]/w_block),w_block)
img_depth2=img_depth.reshape(view_shape).swapaxes(1, 2)

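Note that view_shape here differs from the view_shape above (indices 1 and 2 are swapped).
The result is the same, and this may be easier to understand because we do not need to define the strides explicitly. One concern might be memory duplication, but I did not see much increase in computation time.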
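
A quick way to probe both claims (same result, and whether memory is duplicated) is to build the tensor both ways and compare. This is a sketch under the assumption that img_depth is C-contiguous, as a freshly created array is; in that case reshape and swapaxes both return views. The variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
img_depth = rng.integers(0, 5, (480, 640))
w_block = 10
n_rows = img_depth.shape[0] // w_block
n_cols = img_depth.shape[1] // w_block

# Block tensor via reshape + swapaxes.
via_reshape = img_depth.reshape(n_rows, w_block, n_cols, w_block).swapaxes(1, 2)

# Block tensor via explicit strides.
s0, s1 = img_depth.strides
via_strides = np.lib.stride_tricks.as_strided(
    img_depth,
    shape=(n_rows, n_cols, w_block, w_block),
    strides=(s0 * w_block, s1 * w_block, s0, s1),
)

same_blocks = np.array_equal(via_reshape, via_strides)
same_strides = via_reshape.strides == via_strides.strides
shares = np.shares_memory(via_reshape, img_depth)
print(same_blocks, same_strides, shares)
```

If a later step needs a contiguous layout (for example a further reshape of the swapped tensor), NumPy may have to copy at that point, so the check above only covers the construction of the view itself.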
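How much of the vectorize documentation have you read? vectorize does not vectorize...
I tested this ("list comprehension") and found that the computation speed is the same as the original code.
as_strided is rarely the best approach, but it may be a good one.
"reshape + transpose will copy the data here, whereas as_strided will not."
@Mad Physicist, thank you! I implemented the reshape approach and wrote it up in the answer.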

Result of the np.sum / np.count_nonzero expression on the small-matrix example:
array([[ 3.5,  5.5],
       [11.5, 13.5]])