Converting a per-pixel (x, y) loop to a numpy.nditer iterator


performance numpy image-processing


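I'm trying to speed up some pygame image-processing code that iterates over every pixel and modifies it. I've been looking at NumPy's nditer function, but I'm struggling to work out how to apply it.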

    # Iterate through main image
    for x, row in enumerate(main):
        for y, pix1 in enumerate(row):

            # Check the pixel isn't too dark to worry about
            if pix1[0] + pix1[1] + pix1[2] > 10:

                # Calculate distance to light source
                light_distance = np.hypot( x - light_source_pos[0], y - light_source_pos[1] )

                # Calculate light intensity
                light_intensity = (300 - light_distance) / 300

                # Apply light color and intensity to the specular map, apply specular gain then add to main
                main[x][y] += light_color * light_intensity * specular[x][y] * specular_gain

                # Apply light color and intensity to the diffuse map, apply diffuse gain then add to main
                main[x][y] += light_color * light_intensity * diffuse[x][y] * diffuse_gain

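I'm iterating over the image data array generated by pygame.surfarray.pixels3d(), where array[x][y] holds the [r, g, b] values of one pixel. The array is not a copy but a reference to the actual memory contents.

How can I create an iterator that walks over the x and y coordinates and applies the changes as fast as possible?

As I understand it, it's faster to operate on the pixels in memory order and to keep everything inside the iterator loop.

Edit: the snippet above is simplified to make it easier to understand; the full script is available elsewhere. To run it you need some source images.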
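Looking at the code, the implementation is clearly parallelizable, so we can use a vectorized implementation. To get rid of the loops, we need to extend the dimensions of the inputs in a few places, which brings broadcasting into play.

To keep the code short and easy to follow, I'm assuming these abbreviations -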
S = specular
D = diffuse
LSP = light_source_pos
LC = light_color
S_gain = specular_gain
D_gain = diffuse_gain

Here's one way to vectorize the problem -
# Image dimensions: main has shape (M, N, 3)
M, N = main.shape[:2]

# Vectorize light_distance calculations and thereafter for light_intensity
LD = (np.hypot(np.arange(M)[:,None] - LSP[0], np.arange(N) - LSP[1]))
LI = (300 - LD) / 300

# Vectorized "LC * light_intensity * S[x][y] * S_gain" and 
# "LC * light_intensity * D[x][y] * D_gain" calculations
add_part = (LC*LI[...,None]*S*S_gain) + (LC*LI[...,None]*D*D_gain)

# Get masked places set by "pix1[0] + pix1[1] + pix1[2] > 10", which would be 
# "main.sum(2) > 10". Use mask to add selective elements from add_part into main 
main += (add_part*(main.sum(2)[...,None] > 10))
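To make the dimension extension concrete, here is a minimal sketch of the shapes involved. The small sizes and the values of `LSP` and `LC` are made up for illustration only; the expressions themselves are the ones used above.

```python
import numpy as np

# Hypothetical small sizes, chosen only to make the shapes easy to read
M, N = 4, 5
LSP = [1, 2]                      # light source position (x, y)
LC = np.array([2.0, 6.0, 3.0])    # light color, one value per channel

# An (M, 1) column against an (N,) row broadcasts to an (M, N) distance grid
LD = np.hypot(np.arange(M)[:, None] - LSP[0], np.arange(N) - LSP[1])
LI = (300 - LD) / 300

# A trailing axis of length 1 lets the (M, N) intensities line up with the (3,) color
scaled = LC * LI[..., None]

print(LD.shape, LI[..., None].shape, scaled.shape)
# (4, 5) (4, 5, 1) (4, 5, 3)
```

The distance is zero at the light source itself, so the intensity there is 1 and `scaled[1, 2]` is just `LC`.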


Runtime tests and output verification
Define the functions -
def original_app(main,S,D,LSP,LC,S_gain,D_gain):
    for x, row in enumerate(main):
        for y, pix1 in enumerate(row):
            if pix1[0] + pix1[1] + pix1[2] > 10:
                light_distance = np.hypot( x - LSP[0], y - LSP[1] )
                light_intensity = (300 - light_distance) / 300
                main[x][y] += LC * light_intensity * S[x][y] * S_gain
                main[x][y] += LC * light_intensity * D[x][y] * D_gain


def vectorized_app(main,S,D,LSP,LC,S_gain,D_gain):
    M, N = main.shape[:2]  # take the image size from the input rather than from globals
    LD = (np.hypot(np.arange(M)[:,None] - LSP[0], np.arange(N) - LSP[1]))
    LI = (300 - LD) / 300
    add_part = (LC*LI[...,None]*S*S_gain) + (LC*LI[...,None]*D*D_gain)
    main += (add_part*(main.sum(2)[...,None] > 10))

Runtimes -
In [38]: # Inputs
    ...: M,N,R = 300,200,3 # Shape as stated in the comments
    ...: main = np.random.rand(M,N,R)*10
    ...: S = np.random.rand(M,N,R)
    ...: D = np.random.rand(M,N,R)
    ...: LSP = [3,10]
    ...: LC = np.array([2,6,3])
    ...: S_gain = 0.45
    ...: D_gain = 0.22
    ...: 
    ...: # Make copies as functions would change those
    ...: mainc1 = main.copy()
    ...: mainc2 = main.copy()
    ...: 

In [39]: original_app(mainc1,S,D,LSP,LC,S_gain,D_gain)

In [40]: vectorized_app(mainc2,S,D,LSP,LC,S_gain,D_gain)

In [41]: np.allclose(mainc1,mainc2) # Verify outputs
Out[41]: True

In [42]: # Make copies again for timing as functions would change those
    ...: mainc1 = main.copy()
    ...: mainc2 = main.copy()
    ...: 

In [43]: %timeit original_app(mainc1,S,D,LSP,LC,S_gain,D_gain)
1 loops, best of 3: 1.28 s per loop

In [44]: %timeit vectorized_app(mainc2,S,D,LSP,LC,S_gain,D_gain)
100 loops, best of 3: 15.4 ms per loop

In [45]: 1280/15.4 # Speedup
Out[45]: 83.11688311688312
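One caveat when moving from these float test arrays to real pygame data: arrays from pygame.surfarray.pixels3d() are typically uint8, and recent NumPy versions refuse to add a float array into a uint8 array in place. A possible workaround, sketched below, is to compute in float and write the result back. The function name `apply_light_uint8` is hypothetical, and clamping to 0..255 is my assumption about the desired behavior, not something stated in the question.

```python
import numpy as np

def apply_light_uint8(pixels, S, D, LSP, LC, S_gain, D_gain):
    """Hypothetical wrapper: same math as vectorized_app, for a uint8 pixel array."""
    M, N = pixels.shape[:2]
    LD = np.hypot(np.arange(M)[:, None] - LSP[0], np.arange(N) - LSP[1])
    LI = (300 - LD) / 300
    add_part = LC * LI[..., None] * (S * S_gain + D * D_gain)

    # Same brightness test as "pix1[0] + pix1[1] + pix1[2] > 10"
    mask = pixels.sum(2, dtype=np.int64)[..., None] > 10

    # Work in float, clamp to the valid byte range (an assumption), then
    # write back through [...] so the original buffer is updated in place
    result = pixels.astype(np.float64) + add_part * mask
    pixels[...] = np.clip(result, 0, 255).astype(np.uint8)
```

Assigning through `pixels[...]` keeps the array a reference to the surface memory instead of rebinding the name to a new array.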

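Can you list the shapes of the inputs involved?

@Divakar It varies with the image size, but for now we can assume a 300x200 image. That means the main, diffuse and specular arrays look like array[0:299] => array[0:199] => array[0:2]. I iterate over the first array, then the second (x and y), then operate on the final pixel color array.

nditer won't speed up the iteration. The Python version is useful if you plan to move to Cython code later. It doesn't help here. Use [x, y] style indexing.

@hpaulj I thought it eliminated the Python function call overhead between iteration steps? Or am I confused, and it really needs Cython to make a difference?

I'm just trying to get it working with my implementation now. Could you adjust your answer so that LC is an array[0:2], and S and D have the same shape as main?

@Oliver So, are you saying main, S and D have shape (M, N, 3); LSP has shape (2,); LC has shape (3,); and S_gain and D_gain are scalars?

Check out the edits?

Brilliant. Works like a charm. Thank you very much for your help. Do you have any advice for learning broadcasting? I started on the numpy docs, but they're heavy going and seem to expect me to have read and understood the rest of the numpy documentation.

@Oliver As linked in the post, follow the broadcasting docs. Also, I personally like to avoid loops wherever possible, whether I'm working in MATLAB or NumPy. So I'd have to say: keep practicing, and look at more broadcasting-based Q&As like this one. Or ask questions here about understanding broadcasting. Good luck!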
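For reference, the nditer route asked about in the question could look roughly like the sketch below. The function name `nditer_app` is hypothetical. It visits one (x, y) position per step via the `multi_index` flag and uses [x, y] style indexing, but the per-pixel work still runs in Python, so it should not be expected to approach the vectorized version.

```python
import numpy as np

def nditer_app(main, S, D, LSP, LC, S_gain, D_gain):
    """Hypothetical nditer version of the per-pixel loop, for comparison only."""
    # Same test as "pix1[0] + pix1[1] + pix1[2] > 10", computed once up front
    bright = main.sum(2) > 10

    # multi_index exposes the (x, y) position of the element being visited
    it = np.nditer(bright, flags=['multi_index'])
    for is_bright in it:
        if is_bright:
            x, y = it.multi_index
            light_distance = np.hypot(x - LSP[0], y - LSP[1])
            light_intensity = (300 - light_distance) / 300
            # Specular and diffuse contributions added in a single step
            main[x, y] += LC * light_intensity * (
                S[x, y] * S_gain + D[x, y] * D_gain
            )
```

It modifies `main` in place, like the other two functions, so it can be checked against them with `np.allclose` on copies of the same input.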