2D arrays in CUDA

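I have read a lot about handling 2D arrays in CUDA, and I gather that it is necessary to flatten the array before sending it to the GPU. However, can I allocate a 1D array on the GPU and access it as a 2D array inside the kernel? I tried, but it failed. My code is shown below: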

__global__ void kernel( int **d_a )
{
   cuPrintf("%p",local_array[0][0]);
}

int main(){
    int **A;
    int i;

    cudaPrintfInit();
    cudaMalloc((void**)&A,16*sizeof(int));
    kernel<<<1,1>>>(A);
    cudaPrintfDisplay(stdout,true);
    cudaPrintfEnd();
}

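In fact, there is no need to "flatten" a 2D array before using it on the GPU (although flattening can speed up memory accesses). If you want a 2D array, you can use something like cudaMallocPitch, which is described in the CUDA C Programming Guide. I believe your code does not work because you only malloc'ed a 1D array, so A[0][0] does not exist. If you look at your code, you made a 1D array of ints, not of int*s. If you want to malloc a flattened 2D array, you could do something like this: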

int** A;
cudaMalloc(&A, 16*length*sizeof(int*)); //where length is the number of rows/cols you want

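Then, in your kernel, use the following (to print the pointer to any element):

__global__ void kernel( int **d_a, int row, int col, int stride )
{ 
  printf("%p", d_a[ col + row*stride ]);
}

This is how I solved the problem: I call cudaMalloc in the usual way, but when passing the pointer to the kernel I cast it to int (*)[col], and that works for me.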
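The two approaches in the answers (manual row-major indexing and casting to a pointer-to-row type) can be compared in one small program. This is only a minimal sketch under some assumptions that are not in the original posts: the array holds plain ints rather than int*s, the names ROWS, COLS, flat_index, fill_flat and check_2d are made up for illustration, and COLS is a compile-time constant, because standard C++ does not allow a run-time column count in a type such as int (*)[col].

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative sizes (assumption, not taken from the original question).
constexpr int ROWS = 4;
constexpr int COLS = 4;

// Row-major offset of element (row, col); stride is the row length in elements.
__host__ __device__ inline int flat_index(int row, int col, int stride)
{
    return col + row * stride;
}

// Approach 1: treat the buffer as 1D and compute the offset by hand.
__global__ void fill_flat(int *d_a, int rows, int cols)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < rows && col < cols)
        d_a[flat_index(row, col, cols)] = row * 10 + col;
}

// Approach 2: view the same buffer as rows of COLS ints, so d_a[row][col] works.
// Counts the elements that differ from what fill_flat wrote.
__global__ void check_2d(int (*d_a)[COLS], int rows, int *mismatches)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < rows && col < COLS && d_a[row][col] != row * 10 + col)
        atomicAdd(mismatches, 1);
}

int main()
{
    int *d_a = nullptr;
    int *d_mismatches = nullptr;

    // One contiguous 1D allocation of ints holds the whole 2D array.
    cudaMalloc(&d_a, ROWS * COLS * sizeof(int));
    cudaMalloc(&d_mismatches, sizeof(int));
    cudaMemset(d_mismatches, 0, sizeof(int));

    dim3 block(COLS, ROWS);  // one thread per element
    fill_flat<<<1, block>>>(d_a, ROWS, COLS);

    // The cast changes only the pointer's type, not the memory layout.
    check_2d<<<1, block>>>(reinterpret_cast<int (*)[COLS]>(d_a), ROWS, d_mismatches);

    int mismatches = -1;
    cudaError_t err = cudaMemcpy(&mismatches, d_mismatches, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("mismatches: %d\n", mismatches);

    cudaFree(d_a);
    cudaFree(d_mismatches);
    return 0;
}
```

Both kernels read and write the same contiguous allocation, so neither needs an array of row pointers on the device. If the column count is only known at run time, the manual flat_index form is probably the simpler choice.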