Python: Optimizing computation time in nested for loops?

I have the following code:

import numpy as np
import matplotlib.pyplot as plt   # needed for plt.imshow / plt.show below
from skimage.util import img_as_ubyte
from skimage.feature import canny
import math

image = img_as_ubyte(sf_img)
edges = np.flipud(canny(image, sigma=3, low_threshold=10, high_threshold=25))
non_zeros = np.nonzero(edges)
true_rows = non_zeros[0]
true_col = non_zeros[1]
plt.imshow(edges)
plt.show()
N_im = 256
x0 = 0
y0 = -0.25
Npx = 129
Npy = 60
delta_py = 0.025
delta_px = 0.031
Nr = 9
delta_r = 0.5
rho = 0.063
epsilon = 0.75
r_k = np.zeros((1, Nr))
r_min = 0.5

for k in range(0, Nr):
   r_k[0, k] = k * delta_r + r_min

a = np.zeros((Npy, Npx, Nr))

#FOR LOOP TO BE TIME OPTIMIZED
for i in range(0, np.size(true_col, 0)):     #true_col and true_rows has the same size so it doesn't matter
   for m in range(0, Npy):
       for l in range(0, Npx):
           d = math.sqrt(math.pow(
               (((true_col[i] - math.floor((N_im + 1) / 2)) / (N_im + 1) / 2) - (
                    l * delta_px - (Npx * delta_px / 2) + x0)),
            2) + math.pow(
            (((true_rows[i] - math.floor((N_im + 1) / 2)) / (N_im + 1) / 2) - (
                    m * delta_py - (Npy * delta_py / 2) + y0)),
            2))
           min_idx = np.argmin(np.abs(d - r_k))
           rk_hat = r_k[0, min_idx]
           if np.abs(d - rk_hat) < rho:
               a[m, l, min_idx] = a[m, l, min_idx] + 1

#ANOTHER LOOP TO BE OPTIMIZED
# for m in range(0, Npy):
#     for l in range(0, Npx):                                #ORIGINAL
#         for k in range(0, Nr):
#             if a[m, l, k] < epsilon * np.max(a):
#                 a[m, l, k] = 0

a[np.where(a[:, :, :] < epsilon * np.max(a))] = 0          #SUBSTITUTED

a_prime = np.sum(a, axis=2)

acc_x = np.zeros((Npx, 1))
acc_y = np.zeros((Npy, 1))

for l in range(0, Npx):
   acc_x[l, 0] = l * delta_px - (Npx * delta_px / 2) + x0

for m in range(0, Npy):
   acc_y[m, 0] = m * delta_py - (Npy * delta_py / 2) + y0

prod = 0
for m in range(0, Npy):
   for l in range(0, Npx):
       prod = prod + (np.array([acc_x[l, 0], acc_y[m, 0]]) * a_prime[m, l])

points = prod / np.sum(a_prime)

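Simply put, it scans a 256x256 image that was previously processed with Canny edge detection. The outer for loop therefore has to visit every pixel of the resulting image, and for each pixel it also has to run two nested for loops that perform some operations depending on the l and m index values of the "a" matrix.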

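Since the edge detection returns an image of 0s and 1s (the 1s mark the edges), and the inner operations only have to be performed on points whose value is 1, I used

non_zeros = np.nonzero(edges)

to get only the indices I am interested in. In fact, the previous code looked like this:

for i in range(0, N_im):
    for j in range(0, N_im):
        if edges[i, j] == 1:
            for m in range(0, Npy):
                for l in range(0, Npx):
                    d = math.sqrt(math.pow(
                        (((i - math.floor((N_im + 1) / 2)) / (N_im + 1) / 2) - (
                            l * delta_px - (Npx * delta_px / 2) + x0)),
                        2) + math.pow(
                        (((j - math.floor((N_im + 1) / 2)) / (N_im + 1) / 2) - (
                            m * delta_py - (Npy * delta_py / 2) + y0)),
                        2))
                    min_idx = np.argmin(np.abs(d - r_k))
                    rk_hat = r_k[0, min_idx]
                    if np.abs(d - rk_hat) < rho:
                        a[m, l, min_idx] = a[m, l, min_idx] + 1
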
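It looks like I managed to optimize the first two loops, but my script needs to be faster.
It takes about 6 to 7 minutes to run, and I need to execute it about 1000 times. Can you help me optimize the loops in this script further? Thank you all!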

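Judging from your script, you don't have much experience with NumPy yet. NumPy is optimized with SIMD instructions, and code that loops over elements one at a time in Python cannot benefit from that. I suggest reviewing the basics of writing NumPy code.

Please check this cheat sheet

For example, this code can be changed from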

r_k = np.zeros((1, Nr))
for k in range(0, Nr):
   r_k[0, k] = k * delta_r + r_min

### to a simple np.arange assignment
r_k = np.zeros((1, Nr))
r_k[0,:] = np.arange(Nr) * delta_r + r_min

### or you can do everything in one line
r_k = np.expand_dims(np.arange(Nr) * delta_r + r_min,axis=0)
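This code is a bit clumsy because you create an np.array on every iteration while looping over each element. You could probably simplify it as well. Are you changing the data type of prod from an int to a two-element np.array?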

prod = 0
for m in range(0, Npy):
   for l in range(0, Npx):
       prod = prod + (np.array([acc_x[l, 0], acc_y[m, 0]]) * a_prime[m, l])
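
One possible way to drop this double loop, sketched under the assumption that acc_x, acc_y and a_prime have the shapes used in the question ((Npx, 1), (Npy, 1) and (Npy, Npx)): the sum is a weighted mean of the grid coordinates, so it can be written as two dot products. The function name weighted_centroid and the small example grid are made up for illustration.

```python
import numpy as np

def weighted_centroid(a_prime, acc_x, acc_y):
    """Weighted mean of the (x, y) grid coordinates, with weights a_prime[m, l]."""
    x = np.ravel(acc_x)                      # shape (Npx,)
    y = np.ravel(acc_y)                      # shape (Npy,)
    # Summing over rows m gives one weight per column l, and vice versa.
    px = np.dot(a_prime.sum(axis=0), x)
    py = np.dot(a_prime.sum(axis=1), y)
    return np.array([px, py]) / a_prime.sum()

# Small illustrative grid (sizes and weights are invented for the example).
Npx, Npy = 4, 3
delta_px, delta_py, x0, y0 = 0.031, 0.025, 0.0, -0.25
acc_x = (np.arange(Npx) * delta_px - Npx * delta_px / 2 + x0).reshape(Npx, 1)
acc_y = (np.arange(Npy) * delta_py - Npy * delta_py / 2 + y0).reshape(Npy, 1)
a_prime = np.arange(1, Npy * Npx + 1, dtype=float).reshape(Npy, Npx)

points = weighted_centroid(a_prime, acc_x, acc_y)
```

This also avoids starting prod as an int and silently turning it into an array on the first iteration.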
For this loop, you can gradually separate the terms that depend on the loop indices from the ones that do not

#FOR LOOP TO BE TIME OPTIMIZED
for i in range(0, np.size(true_col, 0)):     #true_col and true_rows has the same size so it doesn't matter
   for m in range(0, Npy):
       for l in range(0, Npx):
           d = math.sqrt(math.pow(
               (((true_col[i] - math.floor((N_im + 1) / 2)) / (N_im + 1) / 2) - (
                    l * delta_px - (Npx * delta_px / 2) + x0)),
            2) + math.pow(
            (((true_rows[i] - math.floor((N_im + 1) / 2)) / (N_im + 1) / 2) - (
                    m * delta_py - (Npy * delta_py / 2) + y0)),
            2))
           min_idx = np.argmin(np.abs(d - r_k))
           rk_hat = r_k[0, min_idx]
           if np.abs(d - rk_hat) < rho:
               a[m, l, min_idx] = a[m, l, min_idx] + 1
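To emulate the behavior of m and l, you can create two Npy-by-Npx index matrices. Although this pattern looks odd, NumPy inherited some tricks from the MATLAB ecosystem, since the goal of MATLAB/NumPy is to simplify the code and let you spend more time fixing the logic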

## l matrix
[[0,1,2,3,4,5,6,7,8....Npx-1],
 [0,1,2,3,4,5,6,7,8....Npx-1],
 .....
 [0,1,2,3,4,5,6,7,8....Npx-1]]

## m matrix
[[0,0,0,0,0,0,0,0,0,0,0,0],
 [1,1,1,1,1,1,1,1,1,1,1,1],
 .....
 [Npy-1,Npy-1,Npy-1.....,Npy-1]]
## You can create both with one command
l_mat, m_mat = np.meshgrid(np.arange(Npx), np.arange(Npy))

>>> l_mat    # output from a run with Npx = 150 and Npy = 100
array([[  0,   1,   2, ..., 147, 148, 149],
       [  0,   1,   2, ..., 147, 148, 149],
       [  0,   1,   2, ..., 147, 148, 149],
       ...,
       [  0,   1,   2, ..., 147, 148, 149],
       [  0,   1,   2, ..., 147, 148, 149],
       [  0,   1,   2, ..., 147, 148, 149]])
>>> m_mat
array([[ 0,  0,  0, ...,  0,  0,  0],
       [ 1,  1,  1, ...,  1,  1,  1],
       [ 2,  2,  2, ...,  2,  2,  2],
       ...,
       [97, 97, 97, ..., 97, 97, 97],
       [98, 98, 98, ..., 98, 98, 98],
       [99, 99, 99, ..., 99, 99, 99]])
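With these two matrices, you can compute the distance for every (m, l) at once instead of one element at a time

dx = (true_col[i] - np.floor((N_im + 1) / 2)) / (N_im + 1) / 2 - (l_mat * delta_px - (Npx * delta_px / 2) + x0)
dy = (true_rows[i] - np.floor((N_im + 1) / 2)) / (N_im + 1) / 2 - (m_mat * delta_py - (Npy * delta_py / 2) + y0)
d = np.sqrt(dx ** 2 + dy ** 2)    # shape (Npy, Npx)
With these two lines of code, you appear to be setting up an argmin matrix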

   min_idx = np.argmin(np.abs(d - r_k))
   rk_hat = r_k[0, min_idx]

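For the last two lines, d and rk_hat should be Npy-by-Npx matrices. You can build a mask over the matrix with boolean indexing or np.where, so that

       if np.abs(d - rk_hat) < rho:

becomes

       points = np.where( np.abs(d-rk_hat) < rho )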
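
Putting the pieces above together, here is a minimal sketch of how the two inner loops could be replaced for a single edge pixel. It assumes the parameter values from the question; the helper name accumulate_pixel and the three sample pixels are hypothetical, and r_k is kept 1-D here (rather than shape (1, Nr)) to make broadcasting simpler.

```python
import numpy as np

# Parameters copied from the question.
N_im, Npx, Npy, Nr = 256, 129, 60, 9
x0, y0 = 0.0, -0.25
delta_px, delta_py = 0.031, 0.025
delta_r, r_min, rho = 0.5, 0.5, 0.063

r_k = np.arange(Nr) * delta_r + r_min                        # shape (Nr,)
l_mat, m_mat = np.meshgrid(np.arange(Npx), np.arange(Npy))   # each (Npy, Npx)
grid_x = l_mat * delta_px - Npx * delta_px / 2 + x0
grid_y = m_mat * delta_py - Npy * delta_py / 2 + y0
half = np.floor((N_im + 1) / 2)

def accumulate_pixel(a, col, row):
    """Add the votes of one edge pixel (col, row) to the accumulator a, in place."""
    dx = (col - half) / (N_im + 1) / 2 - grid_x
    dy = (row - half) / (N_im + 1) / 2 - grid_y
    d = np.sqrt(dx ** 2 + dy ** 2)                             # (Npy, Npx)
    # Nearest radius for every grid cell: compare against all Nr radii at once.
    min_idx = np.argmin(np.abs(d[..., None] - r_k), axis=-1)   # (Npy, Npx)
    rk_hat = r_k[min_idx]
    mask = np.abs(d - rk_hat) < rho
    # Each (m, l) occurs at most once per pixel, so fancy-indexed += is safe.
    a[m_mat[mask], l_mat[mask], min_idx[mask]] += 1

# Hypothetical edge pixels standing in for true_col / true_rows.
true_col = np.array([10, 128, 200])
true_rows = np.array([30, 128, 250])
a = np.zeros((Npy, Npx, Nr), dtype=int)
for col, row in zip(true_col, true_rows):
    accumulate_pixel(a, col, row)
```

Only the loop over edge pixels remains in Python; the work over m and l is done by NumPy on whole (Npy, Npx) arrays.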

New answer, optimizing the nested loops

....
     for i in range(0, np.size(true_col, 0)):     #true_col and true_rows has the same size so it doesn't matter
        for m in range(0, Npy):
            for l in range(0, Npx):
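The processing time improves substantially. With true_col and true_rows of length 2500, it takes about 3 seconds on my machine. It is wrapped in a function for testing purposes.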

def new():
    a = np.zeros((Npy, Npx, Nr),dtype=int)

    # tease out and separate some of the terms
    # used in the calculation of the distance - d
    bb = N_im + 1
    cc = (Npx * delta_px / 2)
    dd = (Npy * delta_py / 2)

    l, m = np.meshgrid(np.arange(Npx), np.arange(Npy))

    q = (true_col - math.floor(bb / 2)) / bb / 2             # shape (true_col length,)
    r = l * delta_px - cc + x0                               # shape(Npy,Npx)
    s = np.square(q - r[...,None])                           # shape(Npy,Npx,true_col length)
                                                             # - last dimension is the outer loop of the original

    t = (true_rows - math.floor(bb / 2)) / bb / 2            # shape (len(true_rows),)
    u = m * delta_py - dd + y0                               # shape(Npy,Npx), i.e. (60,129)
    v = np.square(t - u[...,None])                           # shape(Npy,Npx,true_col length)

    d = np.sqrt(s + v)                                       # shape(Npy,Npx,true_col length)

    e1 = np.abs(d[...,None] - r_k.squeeze())                 # shape(Npy,Npx,true_col length,len(r_k[0,:]))
    min_idx =  np.argmin(e1,-1)                              # shape(Npy,Npx,true_col length)
    rk_hat = r_k[0,min_idx]                                  # shape(Npy,Npx,true_col length)
    zz = np.abs(d-rk_hat)                                    # shape(Npy,Npx,true_col length)
    condition = zz < rho                                     # shape(Npy,Npx,true_col length)

    # seemingly unavoidable for loop needed to perform 
    # a bincount along the last dimension (filtered)
    # while retaining the 2d position info
    # this will be pretty fast though,
    # nothing really going on other than indexing and assignment
    for iii in range(Npy*Npx):
        row,col = divmod(iii,Npx)
        mask = condition[row,col]            # renamed from "filter" to avoid shadowing the builtin
        one_d = min_idx[row,col]
        counts = np.bincount(one_d[mask])
        a[row,col,:counts.size] = counts

    return a
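
The final loop over Npy*Npx cells may be avoidable after all. A sketch of one option, assuming min_idx and condition have shape (Npy, Npx, number of edge points) as in new(): fold the (row, col, radius index) triple into a single linear index and call np.bincount once with minlength. The function name count_votes and the random test inputs are invented for illustration.

```python
import numpy as np

def count_votes(min_idx, condition, Nr):
    """Count, per grid cell (m, l), how many True entries fall in each radius bin.

    min_idx, condition: arrays of shape (Npy, Npx, n_points).
    Returns an int array of shape (Npy, Npx, Nr).
    """
    Npy, Npx, _ = min_idx.shape
    # Linear cell id m * Npx + l, broadcast along the points axis.
    cell = np.arange(Npy * Npx).reshape(Npy, Npx, 1)
    flat = (cell * Nr + min_idx)[condition]          # 1-D, one entry per vote
    counts = np.bincount(flat, minlength=Npy * Npx * Nr)
    return counts.reshape(Npy, Npx, Nr)

# Made-up inputs, only to exercise the function.
rng = np.random.default_rng(0)
Npy, Npx, Nr, n_points = 5, 7, 9, 40
min_idx = rng.integers(0, Nr, size=(Npy, Npx, n_points))
condition = rng.random((Npy, Npx, n_points)) < 0.3
a = count_votes(min_idx, condition, Nr)
```

Inside new(), this would stand in for the loop as a = count_votes(min_idx, condition, Nr). Whether it is noticeably faster than the loop for a 60 x 129 grid is something to measure rather than assume.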

The original nested loops in a function, with some diagnostics added

def original(writer=None):
    '''writer should be a csv.writer object.'''

    a = np.zeros((Npy, Npx, Nr),dtype=int)
    for i in range(0, np.size(true_col, 0)):     #true_col and true_rows has the same size so it doesn't matter
        for m in range(0, Npy):
            for l in range(0, Npx):
                d = math.sqrt(math.pow((((true_col[i] - math.floor((N_im + 1) / 2)) / (N_im + 1) / 2) - (l * delta_px - (Npx * delta_px / 2) + x0)),2) +
                            math.pow((((true_rows[i] - math.floor((N_im + 1) / 2)) / (N_im + 1) / 2) - (m * delta_py - (Npy * delta_py / 2) + y0)),2))
                min_idx = np.argmin(np.abs(d - r_k))    # scalar
                rk_hat = r_k[0, min_idx]    # scalar
                if np.abs(d - rk_hat) < rho:
                    # if (m,l) == (0,0):
                    if writer:
                        writer.writerow([i,m,l,d,min_idx,rk_hat,a[m, l, min_idx] + 1])
                    # print(f'condition satisfied: i:{i} a[{m},{l},{min_idx}] = {a[m, l, min_idx]} + 1')
                    a[m, l, min_idx] = a[m, l, min_idx] + 1
    return a
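
One caveat with the fully broadcast version is memory: the intermediate of shape (Npy, Npx, number of edge points, Nr) holds 60 x 129 x 2500 x 9 float64 values for the sizes above, which is 1,393,200,000 bytes, roughly 1.4 GB. If that is too much, the edge points can be processed in chunks. The following is only a sketch under that assumption; accumulate_chunked, the params dict and the chunk size are hypothetical names and values, not part of either answer.

```python
import numpy as np

def accumulate_chunked(true_col, true_rows, params, chunk=128):
    """Vote accumulator of shape (Npy, Npx, Nr), built chunk by chunk over edge points."""
    N_im, Npx, Npy, Nr = params["N_im"], params["Npx"], params["Npy"], params["Nr"]
    r_k = np.arange(Nr) * params["delta_r"] + params["r_min"]      # shape (Nr,)
    half = np.floor((N_im + 1) / 2)
    l, m = np.meshgrid(np.arange(Npx), np.arange(Npy))
    # Grid coordinates with a trailing axis for broadcasting against edge points.
    gx = (l * params["delta_px"] - Npx * params["delta_px"] / 2 + params["x0"])[..., None]
    gy = (m * params["delta_py"] - Npy * params["delta_py"] / 2 + params["y0"])[..., None]
    cell = np.arange(Npy * Npx).reshape(Npy, Npx, 1)
    a = np.zeros(Npy * Npx * Nr, dtype=np.int64)
    for start in range(0, len(true_col), chunk):
        q = (true_col[start:start + chunk] - half) / (N_im + 1) / 2
        t = (true_rows[start:start + chunk] - half) / (N_im + 1) / 2
        d = np.sqrt((q - gx) ** 2 + (t - gy) ** 2)                 # (Npy, Npx, c)
        min_idx = np.argmin(np.abs(d[..., None] - r_k), axis=-1)   # (Npy, Npx, c)
        mask = np.abs(d - r_k[min_idx]) < params["rho"]
        # One bincount per chunk over the flattened (m, l, k) index.
        a += np.bincount((cell * Nr + min_idx)[mask], minlength=a.size)
    return a.reshape(Npy, Npx, Nr)

# Parameter values from the question; the edge points are random placeholders.
params = dict(N_im=256, Npx=129, Npy=60, Nr=9, x0=0.0, y0=-0.25,
              delta_px=0.031, delta_py=0.025, delta_r=0.5, r_min=0.5, rho=0.063)
rng = np.random.default_rng(0)
true_col = rng.integers(0, 256, size=100)
true_rows = rng.integers(0, 256, size=100)
a_small_chunks = accumulate_chunked(true_col, true_rows, params, chunk=25)
a_one_chunk = accumulate_chunked(true_col, true_rows, params, chunk=100)
```

The chunk size trades peak memory against the number of Python-level iterations, so a suitable value depends on the machine.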