Python: parallelizing nested for loops in PyTorch
I am working with PyTorch-based code. Part of it has four nested for loops. Essentially, it extracts patches from an image and estimates the similarity between pairs of patches. Because the arguments are torch tensors, Python libraries such as joblib do not work. I am new to pycuda and would appreciate help parallelizing this code. At the moment the computation is slow and takes more than 1.5 seconds. Here is the relevant part of my code:
import time

import cv2
import numpy as np
import torch


def mul_val(a, b, ax=None):
    # Mean squared relative difference between two patches (ax is unused).
    return torch.mean(((b - a) / (b + 0.01)) ** 2)


def np_to_torch(img_np):
    # Add batch and channel dimensions: (H, W) -> (1, 1, H, W).
    return torch.from_numpy(img_np)[None, None, :]


def main_fn(a, alpha=0.25):
    start = time.time()
    final_u = torch.zeros(w, h)
    frame1 = a
    for y1 in range(w):
        i = y1 * grid_size
        for x1 in range(h):
            j = x1 * grid_size  # layout: batch, channel, width, height
            block1 = frame1[:, :, i:i + grid_size, j:j + grid_size]
            corr_list = []
            for y2 in range(y1 - radius, y1 + radius + 1):
                i2 = y2 * grid_size
                if not (0 <= y2 < w):  # y2 runs along the same axis as y1
                    continue
                for x2 in range(x1 - radius, x1 + radius + 1):
                    j2 = x2 * grid_size
                    if not (0 <= x2 < h):  # x2 runs along the same axis as x1
                        continue
                    block2 = frame1[:, :, i2:i2 + grid_size, j2:j2 + grid_size]
                    if not (block1.shape == block2.shape):
                        continue
                    corr = mul_val(block1, block2)
                    corr_list.append(corr)
            # Keep the 20 largest scores and average them.
            corr_list = sorted(corr_list, reverse=True)
            del corr_list[20:]
            uncorr = 1.0 - (sum(corr_list) / 20.0)
            final_u[y1, x1] = torch.mul(a[:, :, i, j], uncorr)
    new_val = torch.norm(torch.sub(1.0, final_u))
    final_value = alpha * new_val
    print("Time taken is ..", time.time() - start)
    return final_value


grid_size = 9
radius = 3
im1 = cv2.imread("1.png", 0)
img = np.asarray(im1)
img = img.astype(np.float64)
img_torch = np_to_torch(img)  # made this a torch tensor intentionally
frame_width = img.shape[0]
frame_height = img.shape[1]
h = int(frame_height // grid_size)
w = int(frame_width // grid_size)
main_fn(img_torch)
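One possible way to avoid the four nested loops is to keep everything as tensor operations, so that no joblib or pycuda is needed. The following is a minimal sketch, not a definitive implementation. The function name `main_fn_vectorized` and the `top_k` parameter are hypothetical. It assumes a floating-point input of shape (1, 1, H, W), and it assumes the intended bounds check compares each neighbour index against the number of patches along its own axis. `Tensor.unfold` cuts the image into patches once, and a short loop over the (2 * radius + 1)^2 neighbour offsets replaces the per-patch Python loops.

```python
import torch


def main_fn_vectorized(img, grid_size=9, radius=3, alpha=0.25, top_k=20):
    """Hypothetical vectorized counterpart of main_fn for a (1, 1, H, W) tensor."""
    _, _, height, width = img.shape
    n_rows, n_cols = height // grid_size, width // grid_size

    # Cut the image into non-overlapping patches, one flattened patch per cell:
    # (1, 1, H, W) -> (1, 1, n_rows, n_cols, g, g) -> (n_rows, n_cols, g * g).
    patches = img.unfold(2, grid_size, grid_size).unfold(3, grid_size, grid_size)
    patches = patches.reshape(n_rows, n_cols, grid_size * grid_size)

    offsets = [(dy, dx)
               for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)]
    # One score map per offset. Missing neighbours keep a score of 0, which
    # cannot change a top-k sum because every real score is non-negative.
    corr = torch.zeros(len(offsets), n_rows, n_cols,
                       dtype=img.dtype, device=img.device)

    for k, (dy, dx) in enumerate(offsets):
        # Centre patches whose neighbour at offset (dy, dx) lies inside the grid.
        r0, r1 = max(0, -dy), min(n_rows, n_rows - dy)
        c0, c1 = max(0, -dx), min(n_cols, n_cols - dx)
        if r1 <= r0 or c1 <= c0:
            continue
        a = patches[r0:r1, c0:c1]                      # centre patches
        b = patches[r0 + dy:r1 + dy, c0 + dx:c1 + dx]  # shifted neighbours
        corr[k, r0:r1, c0:c1] = (((b - a) / (b + 0.01)) ** 2).mean(dim=-1)

    # Sum of the top_k largest scores per patch, divided by top_k as in main_fn.
    top = torch.topk(corr, k=min(top_k, len(offsets)), dim=0).values
    uncorr = 1.0 - top.sum(dim=0) / float(top_k)

    # Top-left pixel of every patch, matching a[:, :, i, j] in the loop version.
    corner = img[0, 0, ::grid_size, ::grid_size][:n_rows, :n_cols]
    final_u = corner * uncorr
    return alpha * torch.norm(1.0 - final_u)
```

Calling `main_fn_vectorized(img_torch)` should play the same role as `main_fn(img_torch)`. The sketch keeps the input dtype throughout, whereas the loop version stores intermediate results in a float32 tensor, so small numerical differences are to be expected. If the input tensor is first moved to a GPU with `img_torch.cuda()`, the same code runs there unchanged.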