Could this OpenCL kernel cause error CL_INVALID_COMMAND_QUEUE?


neural-network opencl


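I ran into a problem implementing a feed-forward multilayer perceptron with back-prop learning in OpenCL, from Java via JOCL. Here is the kernel code for the calculation phase: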

    #pragma OPENCL EXTENSION cl_khr_fp64 : enable
    __kernel void Neuron(__global const double *inputPatterns,
                           __global double *weights,
                           __global const int *numInputs,
                           __global const int *activation,
                           __global const double *bias,
                           __global const int *usingBias,
                           __global double *values,
                           __global const int *maxNumFloats,
                           __global const int *patternIndex,
                           __global const int *inputPatternSize,
                           __global const int *indexOffset,
                           __global const int *isInputNeuron,
                           __global const int *inputs)
    {
       int gid = get_global_id(0);
       double sum = 0.0;
       for(int i = 0; i < numInputs[gid+indexOffset[0]]; i++)
       {
           sum += values[inputs[(gid+indexOffset[0]) * maxNumFloats[0] + i]] *
                   weights[(gid+indexOffset[0]) * maxNumFloats[0] + i];
       }
       if(usingBias[gid+indexOffset[0]])
           sum += bias[gid+indexOffset[0]];
       if(isInputNeuron[gid+indexOffset[0]])
           sum += inputPatterns[gid+indexOffset[0]+(patternIndex[0] * inputPatternSize[0])];
       if(activation[gid+indexOffset[0]] == 1)
           sum = 1.0 / (1.0 + exp(-sum));
       values[gid + indexOffset[0]] = sum;
    }
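Written out, each work-item of the kernel above computes the following. This is a direct transcription of the code, writing $n = \text{gid} + \text{indexOffset}[0]$, $M = \text{maxNumFloats}[0]$, $p = \text{patternIndex}[0]$ and $P = \text{inputPatternSize}[0]$, and treating the integer flags as 0/1. It makes the access pattern visible: each neuron reads one row of $M$ entries from `inputs` and `weights` plus the values of earlier neurons, writes only $\text{values}_n$, and every work-item in the launch reads the same four single-element buffers.

$$
s_n = \sum_{i=0}^{\text{numInputs}_n - 1} \text{values}_{\text{inputs}_{nM+i}} \, \text{weights}_{nM+i}
      + \text{usingBias}_n \, \text{bias}_n
      + \text{isInputNeuron}_n \, \text{inputPatterns}_{n + pP}
$$

$$
\text{values}_n =
\begin{cases}
\dfrac{1}{1 + e^{-s_n}} & \text{if } \text{activation}_n = 1,\\[4pt]
s_n & \text{otherwise.}
\end{cases}
$$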



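Basically, I run this kernel once for each layer in the network. For the first layer there are no "inputs", so the loop doesn't execute. However, since the first layer is a layer of input nodes, it does add the relevant value from the input pattern. This executes fine, and at this point I can read the values back.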
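To make the indexing concrete: the kernel expects neurons numbered layer by layer, with neuron `n`'s source neurons listed in row `n` of `inputs` (row stride `maxNumFloats`) and the row length in `numInputs[n]`. The C sketch below fills both tables for a net laid out that way. The helper name `build_fully_connected` and the assumption that consecutive layers are fully connected are illustrative only; they are not taken from the poster's code.

```c
/* Hypothetical helper: fill the flat `inputs` and `numInputs` tables for a
 * fully connected feed-forward net, using the layout the kernel reads.
 * Neurons are numbered layer by layer; neuron n's i-th source neuron is
 * stored at inputs[n * maxNumFloats + i]. Layer 0 (the input layer) has no
 * sources, so its loop in the kernel never runs.
 * Unused row entries are left untouched (caller should zero them).
 * Returns 0 on success, -1 if a layer is wider than maxNumFloats. */
int build_fully_connected(const int *layerWidths, int numLayers,
                          int maxNumFloats, int *inputs, int *numInputs)
{
    int first = 0;      /* global index of the first neuron in layer k   */
    int prevFirst = 0;  /* global index of the first neuron in layer k-1 */
    for (int k = 0; k < numLayers; k++) {
        int prevWidth = (k == 0) ? 0 : layerWidths[k - 1];
        if (prevWidth > maxNumFloats)
            return -1;  /* row would overflow its maxNumFloats slots */
        for (int j = 0; j < layerWidths[k]; j++) {
            int n = first + j;
            numInputs[n] = prevWidth;
            for (int i = 0; i < prevWidth; i++)
                inputs[n * maxNumFloats + i] = prevFirst + i;
        }
        prevFirst = first;
        first += layerWidths[k];
    }
    return 0;
}
```

For layer widths {2, 3, 1}, neurons 0 and 1 get no inputs, neurons 2 to 4 each read neurons 0 and 1, and neuron 5 reads neurons 2, 3 and 4.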
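However, when I try to run the second layer (where every node has inputs, namely each node of the first layer), the call to clFinish() returns the error CL_INVALID_COMMAND_QUEUE. Sometimes this error is accompanied by a driver crash and recovery. I've read a lot of posts saying this could be a TDR timeout issue and have tried raising the limit, but I'm not sure whether that makes any difference.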
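One thing worth ruling out when the driver resets is an out-of-bounds read in the kernel, since a GPU often does not trap those the way a CPU process would. A minimal host-side check, assuming the same flat layout the kernel uses (the function name and parameter list are hypothetical), can verify every index a single layer launch would touch before it is enqueued:

```c
/* Hypothetical host-side sanity check, run before enqueuing one layer.
 * Mirrors the kernel's reads for gid in [0, layerWidth):
 *   n = gid + indexOffset, numInputs[n], inputs[n * maxNumFloats + i],
 *   and values[inputs[...]] (which must be < totalNeurons).
 * Returns -1 if every index is in range, otherwise the first bad gid. */
int check_layer_bounds(int indexOffset, int layerWidth, int totalNeurons,
                       int maxNumFloats, const int *numInputs,
                       const int *inputs)
{
    for (int gid = 0; gid < layerWidth; gid++) {
        int n = gid + indexOffset;
        if (n < 0 || n >= totalNeurons)
            return gid;             /* neuron index itself out of range */
        if (numInputs[n] < 0 || numInputs[n] > maxNumFloats)
            return gid;             /* loop would run past its row      */
        for (int i = 0; i < numInputs[n]; i++) {
            int src = inputs[n * maxNumFloats + i];
            if (src < 0 || src >= totalNeurons)
                return gid;         /* values[src] would be out of range */
        }
    }
    return -1;
}
```

If this returns a gid for the second layer, the tables or the global work size are wrong, and no TDR setting will fix that.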
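I'm going through my calls to clSetKernelArg() to check for anything silly, but can anyone spot anything obviously wrong in the code? The error seems to be introduced in the second layer, because that is where the for loop actually runs... I can clarify any of these arguments if needed, but that seemed like overkill for an initial post.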
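When auditing the `ret` values, it helps to log names rather than raw numbers. Below is a small C lookup for a few common codes; the numeric values are the ones defined in the Khronos `cl.h` header, while the function name is made up for this sketch. (On the JOCL side, calling `CL.setExceptionsEnabled(true)` makes a failing call throw a `CLException`, so an unchecked `ret` cannot slip by.)

```c
/* Hypothetical helper: map a few OpenCL status codes (values as defined in
 * the Khronos cl.h header) to their names for readable logging. */
const char *cl_status_name(int status)
{
    switch (status) {
    case 0:   return "CL_SUCCESS";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -30: return "CL_INVALID_VALUE";
    case -36: return "CL_INVALID_COMMAND_QUEUE";
    case -38: return "CL_INVALID_MEM_OBJECT";
    case -48: return "CL_INVALID_KERNEL";
    case -49: return "CL_INVALID_ARG_INDEX";
    case -50: return "CL_INVALID_ARG_VALUE";
    case -51: return "CL_INVALID_ARG_SIZE";
    case -52: return "CL_INVALID_KERNEL_ARGS";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    default:  return "unknown OpenCL status";
    }
}
```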
Also, I'm fully aware this code may offend competent coders everywhere, but feel free to use it :P
Edit: host code:
    //Calc
    for(int k = 0; k < GPUTickList.length; k++)
    {
        clFlush(clCommandQueue);
        clFinish(clCommandQueue);
        //If input nodes
        if(k == 0)
            //Set index offset to 0
            GPUMapIndexOffset.asIntBuffer().put(0, 0);
        else
            //Update index offset
            GPUMapIndexOffset.asIntBuffer().put(0,
                GPUMapIndexOffset.asIntBuffer().get(0) + GPUTickList[k-1]);
        //Write index offset to GPU buffer
        ret = clEnqueueWriteBuffer(clCommandQueue, memObjects[12], CL_TRUE, 0,
                Sizeof.cl_int, Pointer.to(GPUMapIndexOffset.position(0)), 0, null, null);             
        //Set work size (width of layer)
        global_work_size[0] = GPUTickList[k];
        ret = clEnqueueNDRangeKernel(clCommandQueue, kernel_iterate, 1,
            global_work_offset, global_work_size, local_work_size,
            0, null, null);
    }
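The offset bookkeeping in this loop is an exclusive prefix sum over `GPUTickList`: layer k starts at the total width of layers 0..k-1. A sketch that precomputes all offsets once, using a hypothetical helper name:

```c
/* Hypothetical helper reproducing the host loop's index-offset logic:
 * offsets[k] = tickList[0] + ... + tickList[k-1] (exclusive prefix sum),
 * i.e. the global index of the first neuron in layer k. */
void layer_offsets(const int *tickList, int numLayers, int *offsets)
{
    int running = 0;
    for (int k = 0; k < numLayers; k++) {
        offsets[k] = running;    /* first neuron of layer k */
        running += tickList[k];  /* advance past layer k    */
    }
}
```

For layer widths {3, 4, 2} this yields offsets {0, 3, 7}, matching what the loop writes into `GPUMapIndexOffset` on each iteration.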


Edit 2: I have uploaded the full code to.
I'm not sure what comes before the loop... are you using the queue anywhere other than this loop? Here is something you might want to try:
    // Flush + finish before the loop if you need to; otherwise remove these lines
    clFlush(clCommandQueue);
    clFinish(clCommandQueue);

    //Calc
    for(int k = 0; k < GPUTickList.length; k++)
    {
        //If input nodes
        if(k == 0)
            //Set index offset to 0
            GPUMapIndexOffset.asIntBuffer().put(0, 0);
        else
            //Update index offset
            GPUMapIndexOffset.asIntBuffer().put(0,
                GPUMapIndexOffset.asIntBuffer().get(0) + GPUTickList[k-1]);
        //Write index offset to GPU buffer
        ret = clEnqueueWriteBuffer(clCommandQueue, memObjects[12], CL_TRUE, 0,
                Sizeof.cl_int, Pointer.to(GPUMapIndexOffset.position(0)), 0, null, null);
        //Set work size (width of layer)
        global_work_size[0] = GPUTickList[k];
        // JOCL fills in the cl_event object passed as the last argument
        cl_event latestEvent = new cl_event();
        ret = clEnqueueNDRangeKernel(clCommandQueue, kernel_iterate, 1,
            global_work_offset, global_work_size, local_work_size,
            0, null, latestEvent);
        // Wait for this layer's kernel before enqueuing the next one, then free the event
        clWaitForEvents(1, new cl_event[]{latestEvent});
        clReleaseEvent(latestEvent);
    }

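Solved. I fixed the error by turning everything that was indexed with [0] into a direct kernel argument instead of a buffer. Evidently the hardware doesn't like many work-items accessing one particular element of a buffer at the same time.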
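Do you have some of the host code to share? If the kernel runs for the first layer of neurons, I doubt the kernel itself is the problem. Also, have you tried using clWaitForEvents() instead of clFinish()?

Sure, I've added the part where this kernel is enqueued in the loop, and I can post more if needed, just say what. I did try events, but they gave me trouble, so I stuck with blocking calls and clFinish(); I'll try again now and see how it goes. I also have the whole file; the GPUTrain function is the important one (it's a bit monolithic). Thanks for your input!

You're welcome. I didn't realise you were using Java + OpenCL; I hope the pointer stuff still works. I'll take another look at this.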