Performance: converting a per-pixel (x, y) loop into a numpy.nditer iterator


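I'm trying to speed up some pygame image-processing code that iterates over every pixel and modifies it. I've been looking at NumPy's nditer function, but I'm struggling to work out how to apply it.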

    # Iterate through main image
    for x, row in enumerate(main):
        for y, pix1 in enumerate(row):

            # Check the pixel isn't too dark to worry about
            if pix1[0] + pix1[1] + pix1[2] > 10:

                # Calculate distance to light source
                light_distance = np.hypot( x - light_source_pos[0], y - light_source_pos[1] )

                # Calculate light intensity
                light_intensity = (300 - light_distance) / 300

                # Apply light color and intensity to the specular map, apply specular gain then add to main
                main[x][y] += light_color * light_intensity * specular[x][y] * specular_gain

                # Apply light color and intensity to the diffuse map, apply diffuse gain then add to main
                main[x][y] += light_color * light_intensity * diffuse[x][y] * diffuse_gain
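
I'm iterating over the image data array produced by pygame.surfarray.pixels3d(), indexed as [x][y] with the r, g, b values at each position. The array is not a copy but a reference to the actual surface memory.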

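How can I create an iterator that walks the x and y coordinates and applies the changes as fast as possible?

As I understand it, operating on the pixels in memory order is faster, as is keeping everything inside the iterator loop.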


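Edit: the snippet above is trimmed to make it easier to follow, but the whole script is available as well. To run it you will need some source images.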

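Looking at the code, the per-pixel work is clearly independent from one pixel to the next, so a vectorized implementation is possible. To get rid of the loops we need to extend the dimensions of the inputs in a few places, which is where broadcasting comes into play.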
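As a minimal sketch of what "extending the dimensions" means here, the example below builds the distance grid for a tiny image. The grid size and light position are made-up values for illustration only; `np.arange(M)[:, None]` turns the x coordinates into a column so that broadcasting pairs every x with every y.

```python
import numpy as np

# Hypothetical small grid and light position, for illustration only.
M, N = 4, 3
lsp = (1, 2)

# x offsets as a column of shape (M, 1), y offsets as a row of shape (N,).
dx = np.arange(M)[:, None] - lsp[0]
dy = np.arange(N) - lsp[1]

# Broadcasting combines (M, 1) with (N,) into an (M, N) grid of distances.
dist = np.hypot(dx, dy)

# The same values computed with the explicit double loop.
expected = np.array([[np.hypot(x - lsp[0], y - lsp[1]) for y in range(N)]
                     for x in range(M)])
assert np.allclose(dist, expected)

# A trailing axis of length 1 lets the (M, N) grid scale an (M, N, 3) image.
rgb = np.ones((M, N, 3))
scaled = rgb * dist[..., None]
print(dist.shape, scaled.shape)
```

The same two tricks, a column vector for one axis and a trailing `None` axis for the colour channels, are all the vectorized version needs.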

To keep the code easy to read and maintain, I'm assuming these abbreviations:

    S = specular
    D = diffuse
    LSP = light_source_pos
    LC = light_color
    S_gain = specular_gain
    D_gain = diffuse_gain

Here is one way to vectorize the problem:

    # Image width and height (M, N) taken from the (M, N, 3) pixel array
    M, N = main.shape[:2]

    # Vectorize light_distance calculations and thereafter for light_intensity
    LD = np.hypot(np.arange(M)[:,None] - LSP[0], np.arange(N) - LSP[1])
    LI = (300 - LD) / 300

    # Vectorized "LC * light_intensity * S[x][y] * S_gain" and
    # "LC * light_intensity * D[x][y] * D_gain" calculations
    add_part = (LC*LI[...,None]*S*S_gain) + (LC*LI[...,None]*D*D_gain)

    # Get masked places set by "pix1[0] + pix1[1] + pix1[2] > 10", which would be
    # "main.sum(2) > 10". Use mask to add selective elements from add_part into main
    main += (add_part*(main.sum(2)[...,None] > 10))
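One practical caveat: arrays from `pygame.surfarray.pixels3d()` hold unsigned 8-bit integers, and current NumPy versions refuse to add a float array in place to an integer array. The sketch below is one possible way around that, not part of the original answer. It computes in floating point, clips to 0..255 and writes the result back into the same buffer. The clipping policy, the function name `apply_light_uint8` and the `radius` parameter (300 in the code above) are assumptions made for illustration, as are the float-valued `S` and `D` maps.

```python
import numpy as np

def apply_light_uint8(main, S, D, LSP, LC, S_gain, D_gain, radius=300.0):
    """Vectorized lighting for a uint8 (M, N, 3) array, updated in place.

    Hypothetical helper: clipping to 0..255 is an assumption, not something
    the original loop does.
    """
    M, N = main.shape[:2]
    LD = np.hypot(np.arange(M)[:, None] - LSP[0], np.arange(N) - LSP[1])
    LI = (radius - LD) / radius
    add_part = LC * LI[..., None] * (S * S_gain + D * D_gain)

    # Sum the channels in a wide integer type so the darkness test cannot overflow.
    mask = main.sum(axis=2, dtype=np.int64)[..., None] > 10

    # Do the arithmetic in float, then clip and cast back to uint8.
    result = main.astype(np.float64) + add_part * mask
    main[...] = np.clip(result, 0, 255).astype(np.uint8)

# Usage with made-up data standing in for the surface pixels.
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(300, 200, 3), dtype=np.uint8)
spec = rng.random((300, 200, 3))
diff = rng.random((300, 200, 3))
apply_light_uint8(pixels, spec, diff, (3, 10), np.array([2.0, 6.0, 3.0]), 0.45, 0.22)
```

Writing through `main[...] = ...` rather than rebinding `main` keeps the update visible in the surface that the array references.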

Runtime tests and output verification

Function definitions:

    import numpy as np

    def original_app(main,S,D,LSP,LC,S_gain,D_gain):
        for x, row in enumerate(main):
            for y, pix1 in enumerate(row):
                if pix1[0] + pix1[1] + pix1[2] > 10:
                    light_distance = np.hypot( x - LSP[0], y - LSP[1] )
                    light_intensity = (300 - light_distance) / 300
                    main[x][y] += LC * light_intensity * S[x][y] * S_gain
                    main[x][y] += LC * light_intensity * D[x][y] * D_gain


    def vectorized_app(main,S,D,LSP,LC,S_gain,D_gain):
        # Take the image size from the array instead of relying on globals
        M, N = main.shape[:2]
        LD = (np.hypot(np.arange(M)[:,None] - LSP[0], np.arange(N) - LSP[1]))
        LI = (300 - LD) / 300
        add_part = (LC*LI[...,None]*S*S_gain) + (LC*LI[...,None]*D*D_gain)
        main += (add_part*(main.sum(2)[...,None] > 10))

Runtimes:

    In [38]: # Inputs
        ...: M,N,R = 300,200,3 # Shape as stated in the comments
        ...: main = np.random.rand(M,N,R)*10
        ...: S = np.random.rand(M,N,R)
        ...: D = np.random.rand(M,N,R)
        ...: LSP = [3,10]
        ...: LC = np.array([2,6,3])
        ...: S_gain = 0.45
        ...: D_gain = 0.22
        ...:
        ...: # Make copies as functions would change those
        ...: mainc1 = main.copy()
        ...: mainc2 = main.copy()
        ...:

    In [39]: original_app(mainc1,S,D,LSP,LC,S_gain,D_gain)

    In [40]: vectorized_app(mainc2,S,D,LSP,LC,S_gain,D_gain)

    In [41]: np.allclose(mainc1,mainc2) # Verify outputs
    Out[41]: True

    In [42]: # Make copies again for timing as functions would change those
        ...: mainc1 = main.copy()
        ...: mainc2 = main.copy()
        ...:

    In [43]: %timeit original_app(mainc1,S,D,LSP,LC,S_gain,D_gain)
    1 loops, best of 3: 1.28 s per loop

    In [44]: %timeit vectorized_app(mainc2,S,D,LSP,LC,S_gain,D_gain)
    100 loops, best of 3: 15.4 ms per loop

    In [45]: 1280/15.4 # Speedup
    Out[45]: 83.11688311688312
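
For readers without IPython, the comparison can be reproduced with the standard `timeit` module. This is only a sketch: it uses a smaller made-up image (60x40) so the loop version finishes quickly, and each timed call works on a fresh copy, so the copy cost is included in both timings. The numbers it prints depend on the machine and will not match the figures above.

```python
import timeit
import numpy as np

def original_app(main, S, D, LSP, LC, S_gain, D_gain):
    for x, row in enumerate(main):
        for y, pix1 in enumerate(row):
            if pix1[0] + pix1[1] + pix1[2] > 10:
                light_distance = np.hypot(x - LSP[0], y - LSP[1])
                light_intensity = (300 - light_distance) / 300
                main[x][y] += LC * light_intensity * S[x][y] * S_gain
                main[x][y] += LC * light_intensity * D[x][y] * D_gain

def vectorized_app(main, S, D, LSP, LC, S_gain, D_gain):
    M, N = main.shape[:2]
    LD = np.hypot(np.arange(M)[:, None] - LSP[0], np.arange(N) - LSP[1])
    LI = (300 - LD) / 300
    add_part = (LC * LI[..., None] * S * S_gain) + (LC * LI[..., None] * D * D_gain)
    main += add_part * (main.sum(2)[..., None] > 10)

# Made-up inputs; smaller than the 300x200 case so the loop stays fast.
rng = np.random.default_rng(0)
M, N = 60, 40
main = rng.random((M, N, 3)) * 10
S = rng.random((M, N, 3))
D = rng.random((M, N, 3))
args = ([3, 10], np.array([2, 6, 3]), 0.45, 0.22)

# Check that both versions agree before timing them.
a, b = main.copy(), main.copy()
original_app(a, S, D, *args)
vectorized_app(b, S, D, *args)
assert np.allclose(a, b)

# Time each version on a fresh copy so every run sees the same input.
t_loop = timeit.timeit(lambda: original_app(main.copy(), S, D, *args), number=3)
t_vec = timeit.timeit(lambda: vectorized_app(main.copy(), S, D, *args), number=3)
print(f"loop: {t_loop / 3:.4f} s per call, vectorized: {t_vec / 3:.6f} s per call")
```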

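Can you list the shapes of the inputs involved?

@Divakar It varies with the image size, but for now we can assume a 300x200 image. That means the main, diffuse and specular arrays all look like array[0:299] => array[0:199] => array[0:2]. I iterate over the first array, then the second (x and y), and then operate on the final pixel colour array.

nditer won't speed up the iteration. The Python version is useful if you plan to move to its Cython version later. It doesn't help here. Use [x,y] style indexing.

@hpaulj I thought it eliminated the Python function call overhead between iteration steps? Or am I confused, and it really takes Cython to make a difference? For now I'm just trying to get it working with my implementation.

Can you adjust your answer so that LC is an array[0:2] and S and D have the same shape as main?

@Oliver So you're saying main, S and D have shape (M,N,3), LSP has shape (2,), LC has shape (3,), and S_gain and D_gain are scalars? Check out the edits?

Awesome. Works like a charm. Thank you very much for your help. Do you have any advice on learning broadcasting? I started on the numpy docs, but they are heavy going and seem to require that I read and understand the rest of the numpy docs first.

@Oliver As linked in the post, follow the broadcasting docs. Also, I personally like to avoid loops wherever possible, whether in MATLAB or NumPy. So my answer is to keep practising, look at more broadcasting-based Q&As like this one, or ask questions here about understanding broadcasting. Good luck!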
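
For reference, this is roughly what an nditer loop over the x and y coordinates looks like. It is a sketch on a tiny made-up array, using the `multi_index` flag to recover the coordinates. The loop body still runs once per pixel in Python, which is consistent with the comment above that nditer by itself does not speed things up.

```python
import numpy as np

# Tiny made-up (M, N, 3) array standing in for the pixel data.
M, N = 4, 3
main = np.arange(M * N * 3, dtype=float).reshape(M, N, 3)

# Iterate over one channel so the iterator visits each (x, y) exactly once;
# the multi_index flag exposes the coordinates of the current element.
it = np.nditer(main[..., 0], flags=["multi_index"])
visited = []
for _ in it:
    x, y = it.multi_index
    main[x, y] += 1.0          # whole-pixel update using [x, y] style indexing
    visited.append((x, y))

print(len(visited), "pixels visited")
```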