C++ / CUDA: Why does the CUDA kernel give different results than the original code?

I ported this code:

    if(_layersCount > 1)
    {
        for(int i=_layersCount-2;i>=0;i--)
        {
            for(int j=0;j<_neuronsPerLayerCount[i];j++) // cuda kernel
            {
                localGradients[indexByLayerAndNeuron(i, j)] = 0;

                for(int k=0;k<_neuronsPerLayerCount[i+1];k++)
                {
                    localGradients[indexByLayerAndNeuron(i, j)] += _neuronsInputsWeights[indexByLayerNeuronAndInput(i+1, k, j)]
                                                                    * localGradients[indexByLayerAndNeuron(i+1, k)];
                }

                localGradients[indexByLayerAndNeuron(i, j)] *= derivatives[indexByLayerAndNeuron(i, j)];
            }
        }
    }
to this CUDA host code:

    if(_layersCount > 1)
    {
        for(int i=_layersCount-2;i>=0;i--)
        {
            // calculate local gradients for the other layers
            blocksCount = floor((double)_neuronsPerLayerCount[i] / threads.x) + 1;
            blocks = dim3(blocksCount, 1);
            calculateLocalGradientsForAnotherLayers <<<blocks, threads>>> (deviceLocalGradients, _neuronsInputsWeights, deviceDerivatives, _neuronsPerLayerCount[i], _neuronsInPreviousLayers[i], _neuronsInPreviousLayers[i+1], _neuronsPerLayerCount[i+1], _inputsInPreviousLayers[i], _inputsInCurrentLayer[i]);
        }
    }
The calculateLocalGradientsForAnotherLayers kernel:

__global__ void calculateLocalGradientsForAnotherLayers(double * localGradients, double * neuronsInputsWeights, double * derivatives, int neuronsCount, int neuronsInPreviousLayers, int neuronsInPreviousLayersWithCurrent, int neuronsInNextLayer, int inputsInPreviousLayers, int inputsInCurrentLayer)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    if(idx < neuronsCount)
    {
        int neuron = neuronsInPreviousLayers + idx;

        localGradients[neuron] = 0;

        // this to Kernel, then reduce localGradients.
        for(int k=0;k<neuronsInNextLayer;k++)
        {
            localGradients[neuron] += neuronsInputsWeights[inputsInPreviousLayers + k*inputsInCurrentLayer + idx]
                                                            * localGradients[neuronsInPreviousLayersWithCurrent + k];
        }

        localGradients[neuron] *= derivatives[neuron];
    }
}
For the for(int k=0;k<neuronsInNextLayer;k++) loop in the kernel body, some kind of inter-block synchronization through the localGradients array would be needed:

for(int k=0;k<neuronsInNextLayer;k++)
        {
            localGradients[neuron] += neuronsInputsWeights[inputsInPreviousLayers + k*inputsInCurrentLayer + idx]
                                                            * localGradients[neuronsInPreviousLayersWithCurrent + k];
        }

I found the problem. Instead of the line:

calculateLocalGradientsForAnotherLayers <<<blocks, threads>>> (deviceLocalGradients, _neuronsInputsWeights, deviceDerivatives, _neuronsPerLayerCount[i], _neuronsInPreviousLayers[i], _neuronsInPreviousLayers[i+1], _neuronsPerLayerCount[i+1], _inputsInPreviousLayers[i], _inputsInCurrentLayer[i]);
it should be:

calculateLocalGradientsForAnotherLayers <<<blocks, threads>>> (deviceLocalGradients, _neuronsInputsWeights, deviceDerivatives, _neuronsPerLayerCount[i], _neuronsInPreviousLayers[i], _neuronsInPreviousLayers[i+1], _neuronsPerLayerCount[i+1], _inputsInPreviousLayers[i+1], _inputsInCurrentLayer[i+1]);

Comments:

How are you calling the kernel? What grid/block sizes? (Asker: see the second code block; threads.x is 512.)

How would I add the synchronization? What if I accumulate into a temp variable first and only store it to localGradients[neuron] at the end?

Let's start with the serial code shown in the question body (code block 1). Which of the loops (i, j, and k) have independent iterations? Clearly the iterations of the k loop are dependent. What about the i and j loops?

Thank you very much!