How to convert CPU code for the dot product of two matrices to GPU code in MATLAB

Tags: matlab, neural-network, gpu, matrix-multiplication

I want to compute the weighted sum of two matrices faster using gpuArray. For example, my code on the CPU looks like this:

mat1 = rand(19,19);
mat2 = rand(19,19);
Receptive_fieldsize = [4,3];
overlap = 1;
Output = GetweightedSum(mat1, mat2, Receptive_fieldsize, overlap); %// this will output a 6x6 matrix
where the body of my function is:

function Output = GetweightedSum(mat1, mat2, RF, overlap)

gap = RF(1) - overlap;
output_size = [6,6];
Output = zeros(output_size); %// preallocate the result
for u = 1:output_size(1)
    for v = 1:output_size(2)
        %// bounds of the current receptive-field block
        min_u = (u - 1) * gap + 1;
        max_u = (u - 1) * gap + RF(1);
        min_v = (v - 1) * gap + 1;
        max_v = (v - 1) * gap + RF(2);

        input1 = mat1(min_u:max_u, min_v:max_v);
        input2 = mat2(min_u:max_u, min_v:max_v);
        Output(u,v) = sum(sum(input1 .* input2)); %// weighted sum of the block
    end
end
How can I convert this into a GPU function? Can I do it directly, or do I have to keep the for loops in the GPU code? I know nothing about GPU programming. I would appreciate it if someone could guide me, or rewrite the above code as a reference GPU function, so that I can learn from it.
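For reference, the "direct" route the question asks about does run: gpuArray supports indexing and element-wise arithmetic, so the loop can be kept as-is on GPU data. Below is a minimal sketch of that naive port (assuming the Parallel Computing Toolbox is available); the per-iteration kernel-launch overhead typically makes it slower than the CPU loop, which is why the vectorized answer below exists:

%// Naive port: same loop, but operating on gpuArray data
gmat1 = gpuArray(mat1); %// copy inputs to the GPU once
gmat2 = gpuArray(mat2);
gap = Receptive_fieldsize(1) - overlap;
Output_naive = zeros(6, 6, 'gpuArray'); %// preallocate on the GPU
for u = 1:6
    for v = 1:6
        ru = (u-1)*gap + (1:Receptive_fieldsize(1)); %// block row indices
        cv = (v-1)*gap + (1:Receptive_fieldsize(2)); %// block column indices
        Output_naive(u,v) = sum(sum(gmat1(ru,cv) .* gmat2(ru,cv)));
    end
end
Output_naive = gather(Output_naive); %// bring the 6x6 result back to the CPU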

See if this code and the comments alongside it make sense to you:

function Output = GetweightedSumGPU(mat1,mat2, RF,overlap)

%// Create parameters
gap = RF(1) - overlap;
output_size = [6,6];
sz1 = output_size(1);
sz2 = output_size(2);

nrows = size(mat1,1); %// get number of rows in mat1

%// Copy data to GPU
gmat1 = gpuArray(mat1);
gmat2 = gpuArray(mat2);

start_row_ind = gpuArray([1:RF(1)]'); %// starting row indices for each block
col_offset = gpuArray([0:RF(2)-1]*nrows); %// column offset for each block

%// Linear indices for each block
ind = bsxfun(@plus,start_row_ind,col_offset);

%// Linear indices along rows and columns respectively
ind_rows = bsxfun(@plus,ind(:),[0:sz1-1]*gap);
ind_rows_cols = bsxfun(@plus,ind_rows,permute([0:sz2-1]*gap*nrows,[1 3 2]));

%// Elementwise multiplication, summing and gathering back result to CPU
Output = gather(reshape(sum(gmat1(ind_rows_cols).*gmat2(ind_rows_cols),1),sz1,sz2));

return;
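A quick way to check the GPU version against the CPU one and time both (a sketch; timeit and gputimeit are the standard benchmarking helpers in the base product and the Parallel Computing Toolbox, and gputimeit synchronizes the device so the GPU timing is honest):

mat1 = rand(19,19);
mat2 = rand(19,19);
Receptive_fieldsize = [4,3];
overlap = 1;

%// The two results should agree to floating-point round-off
Output_cpu = GetweightedSum(mat1, mat2, Receptive_fieldsize, overlap);
Output_gpu = GetweightedSumGPU(mat1, mat2, Receptive_fieldsize, overlap);
max_abs_err = max(abs(Output_cpu(:) - Output_gpu(:)))

%// Timings
t_cpu = timeit(@() GetweightedSum(mat1, mat2, Receptive_fieldsize, overlap))
t_gpu = gputimeit(@() GetweightedSumGPU(mat1, mat2, Receptive_fieldsize, overlap))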

@Divakar got it, that fixed it. Not right away though, let me try it out; I will get back to you in a while.

@khan I tested it on my system with a GPU and it doesn't seem to beat the CPU code, but I guess it could be a learning experience for you. I think the problem is that you are not giving the GPU enough work.

I agree; gpuendtime = 0.0083, CPUtimetaken = 0.0578 is how my results look... but yes, I am trying to build on it. There are more and more images; don't you think that with 1000 images this 0.05 would affect my final result? Or should I restructure my code? By the way, thanks a lot.

@khan Is that image data in mat1 and mat2? One big overhead is copying the data to the GPU. If so, I would think you only need to copy it once? Also, try benchmarking by doing the copy before calling the GPU function and feeding gpuArray data into the function.

@khan Yes. That is, do gmat1 = gpuArray(mat1); gmat2 = gpuArray(mat2); and then call the function like this: OutputGPU = GetweightedSumGPU(gmat1, gmat2, Receptive_fieldsize, overlap). Also, comment out the lines gmat1 = gpuArray(mat1); gmat2 = gpuArray(mat2); inside the GPU function, edit that line to nrows = size(gmat1,1), and edit the function signature to function Output = GetweightedSumGPU(gmat1, gmat2, RF, overlap).
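Putting the edits from that last comment together, the restructured version might look like this (a sketch assembled from the thread, nothing beyond what the comments describe):

function Output = GetweightedSumGPU(gmat1, gmat2, RF, overlap)

%// Create parameters; inputs are assumed to be gpuArrays already, so no copies here
gap = RF(1) - overlap;
output_size = [6,6];
sz1 = output_size(1);
sz2 = output_size(2);
nrows = size(gmat1,1); %// get number of rows in gmat1

start_row_ind = gpuArray([1:RF(1)]'); %// starting row indices for each block
col_offset = gpuArray([0:RF(2)-1]*nrows); %// column offset for each block

%// Linear indices for each block, then extended along rows and columns
ind = bsxfun(@plus,start_row_ind,col_offset);
ind_rows = bsxfun(@plus,ind(:),[0:sz1-1]*gap);
ind_rows_cols = bsxfun(@plus,ind_rows,permute([0:sz2-1]*gap*nrows,[1 3 2]));

%// Elementwise multiplication, summing and gathering the result back to the CPU
Output = gather(reshape(sum(gmat1(ind_rows_cols).*gmat2(ind_rows_cols),1),sz1,sz2));

and the caller does the host-to-GPU copies once, outside the function:

gmat1 = gpuArray(mat1);
gmat2 = gpuArray(mat2);
OutputGPU = GetweightedSumGPU(gmat1, gmat2, Receptive_fieldsize, overlap);

With many images, the point of this split is that the transfer cost is paid once per image upload rather than on every call, which should help amortize the per-call overhead seen in the timings above.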