OpenCL kernel fails to compile
Tags: c, multithreading, opencl, parallel-processing, gpu

I have been reading this matrix multiplication kernel code, and I don't understand why calling clBuildProgram returns CL_BUILD_PROGRAM_FAILURE. Here is my kernel code:
__kernel void MatMulKernel(__global const float* A,
__global const float* B,
float* C,
const int size1,
const int size2,
const int size3)
{
int k = get_global_id(0);
int i;
int line = k / size3;
int column = k % size3;
float partial = 0;
for(i = 0; i < size2; i++)
{
partial += A[line * size2 + i] * B[i * size3 + column];
}
C[k] = partial;
}
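All of the error values returned by the clSetKernelArg calls are CL_SUCCESS. When the program reaches clEnqueueNDRangeKernel, it crashes. This is the error I get:
Error: kernel pointer arguments must point to addrSpace global, local, or constant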
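Answer:
The float* C parameter should probably be __global. All kernel pointer arguments need an address space qualifier.

Comments:
D'oh, how did I miss that? Thank you very much.
Don't worry, I fixed it. My localWorkSize argument was 0... so many silly mistakes.
local_work_size can be NULL, in which case the OpenCL implementation decides how to break the global work-items into appropriate work-group instances. So passing 0 would not cause a crash.
@vocaro You're right. I did more work on the code, and the actual problem was that globalWorkSize was not a multiple of localWorkSize.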
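As a minimal sketch of the fix suggested in the answer, the kernel below is the one from the question with __global added to the C parameter. So that the same source can be checked on the CPU as plain C, the block also defines empty __kernel and __global macros, a stub get_global_id, and a hypothetical driver named run_matmul_on_cpu. Those shims are assumptions of this sketch, not part of OpenCL, and they would be removed before handing the kernel source to clBuildProgram.

```c
#include <stddef.h>

/* CPU-only shims (assumptions of this sketch, NOT part of OpenCL).
   They let the kernel source below compile as ordinary C. */
#define __kernel
#define __global
static size_t fake_global_id; /* work-item index set by the driver loop */
static size_t get_global_id(unsigned int dim) { (void)dim; return fake_global_id; }

/* Kernel from the question; the only change is __global on C. */
__kernel void MatMulKernel(__global const float* A,
                           __global const float* B,
                           __global float* C,
                           const int size1,
                           const int size2,
                           const int size3)
{
    int k = get_global_id(0);
    int i;
    int line = k / size3;
    int column = k % size3;
    float partial = 0;
    for (i = 0; i < size2; i++)
    {
        partial += A[line * size2 + i] * B[i * size3 + column];
    }
    C[k] = partial;
}

/* Hypothetical driver: runs one "work-item" per output element,
   i.e. size1 * size3 sequential calls, to mimic a 1-D NDRange. */
void run_matmul_on_cpu(const float* A, const float* B, float* C,
                       int size1, int size2, int size3)
{
    size_t total = (size_t)size1 * (size_t)size3;
    for (fake_global_id = 0; fake_global_id < total; fake_global_id++)
    {
        MatMulKernel(A, B, C, size1, size2, size3);
    }
}
```

The driver assumes A is size1 x size2, B is size2 x size3, and C is size1 x size3, all stored row-major, which is what the kernel's index arithmetic implies.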
Host code from the question (setting the kernel arguments and launching the kernel):
err = clSetKernelArg(hKernel, 0, sizeof(cl_mem), (void *)&hDeviceMemA);
err = clSetKernelArg(hKernel, 1, sizeof(cl_mem), (void *)&hDeviceMemB);
err = clSetKernelArg(hKernel, 2, sizeof(cl_mem), (void *)&hDeviceMemC);
err = clSetKernelArg(hKernel, 3, sizeof(cl_int), (void *)&s1);
err = clSetKernelArg(hKernel, 4, sizeof(cl_int), (void *)&s2);
err = clSetKernelArg(hKernel, 5, sizeof(cl_int), (void *)&s3);
cl_event events[1];
// execute kernel
start = clock();
err = clEnqueueNDRangeKernel(hCmdQueue, hKernel, 1, 0, (const size_t *)BENCH_SIZE_COMP, 0, 0, 0, &events[0]);
clWaitForEvents(1, events);
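The comment thread concludes that the real crash came from a global work size that was not a multiple of the local work size. One common way to avoid that, when an explicit local size is passed, is to round the global size up before calling clEnqueueNDRangeKernel. The helper below is a sketch of that arithmetic only; the name round_up_global_size is hypothetical, and treating a local size of 0 as "no explicit local size" is an assumption made for this example.

```c
#include <stddef.h>

/* Round global_size up to the next multiple of local_size.
   A local_size of 0 is treated here as "no explicit local size"
   (an assumption of this sketch), so global_size is returned unchanged. */
size_t round_up_global_size(size_t global_size, size_t local_size)
{
    size_t remainder;
    if (local_size == 0)
    {
        return global_size;
    }
    remainder = global_size % local_size;
    if (remainder == 0)
    {
        return global_size; /* already an exact multiple */
    }
    return global_size + (local_size - remainder);
}
```

If the global size is padded this way, the extra work-items have no output element to write, so the kernel would also need a bounds check (for example, returning early when k is not less than size1 * size3) before it touches C[k].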