Neural network: building a non-layered LSTM network from scratch, how do I do the forward and backward passes?


Based on my understanding of how an LSTM cell works, I am building an LSTM network from scratch.

There are no layers, so I am trying to implement the non-vectorized forms of the equations I have seen in tutorials. I am also using peepholes from the cell state.

So far, I have it looking something like this:

From that, I came up with the following equations for each of the gates in the forward pass:

i_t = sigmoid( i_w * (x_t + c_t) + i_b )
f_t = sigmoid( f_w * (x_t + c_t) + f_b )

cell_gate = tanh( c_w * x_t + c_b )

c_t = (f_t * c_t) + (i_t * cell_gate)

o_t = sigmoid( o_w * (x_t + c_t) + o_b )

h_t = o_t * tanh(c_t)

where the _w's are the respective weights for each gate and the _b's are the biases. Also, I have named the first, left-most sigmoid "cell_gate".
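
For concreteness, here is a minimal sketch of that forward pass for a single scalar cell, transcribed directly from the equations above. The names (xT, cPrev, iW, iB, ...) are mine, and I read the c_t in the input/forget gates as the previous cell state and the one in the output gate as the freshly updated state, since c_t is recomputed in between:

    // Minimal single-cell forward pass, transcribed from the equations above.
    // All names are illustrative, not from any library.
    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // Returns {h_t, c_t} given the input x_t and the previous cell state c_{t-1}.
    static double[] forward(double xT, double cPrev,
                            double iW, double iB, double fW, double fB,
                            double cW, double cB, double oW, double oB) {
        double iT = sigmoid(iW * (xT + cPrev) + iB);   // input gate (peeks at the old cell state)
        double fT = sigmoid(fW * (xT + cPrev) + fB);   // forget gate
        double cellGate = Math.tanh(cW * xT + cB);     // candidate cell input ("cell_gate")
        double cT = fT * cPrev + iT * cellGate;        // new cell state
        double oT = sigmoid(oW * (xT + cT) + oB);      // output gate (peeks at the new cell state)
        double hT = oT * Math.tanh(cT);                // cell output h_t
        return new double[]{hT, cT};
    }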


The backward pass is where things get fuzzy for me; I am not sure how to derive these equations correctly.

I know that, in general, the formula for computing the error is: error = f'(x_t) * (received_error), where f'(x_t) is the first derivative of the activation function, and received_error is either (target - output) for an output neuron or ∑(o_e * w_io) for a hidden neuron,

where o_e is the error of one of the cells that the current cell outputs to, and w_io is the weight connecting them.
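
To make that rule concrete, here is a small, hypothetical sketch of the two cases (the helper names are my own):

    // Hypothetical sketch of the stated rule: delta = f'(net) * received_error,
    // where received_error is (target - output) for an output neuron,
    // or the sum of downstream deltas times the connecting weights for a hidden neuron.
    static double sigmoidDerivative(double net) {
        double s = 1.0 / (1.0 + Math.exp(-net));
        return s * (1.0 - s);
    }

    static double outputDelta(double net, double target, double output) {
        return sigmoidDerivative(net) * (target - output);
    }

    static double hiddenDelta(double net, double[] downstreamDeltas, double[] downstreamWeights) {
        double received = 0.0;
        for (int k = 0; k < downstreamDeltas.length; k++) {
            received += downstreamDeltas[k] * downstreamWeights[k];  // o_e * w_io
        }
        return sigmoidDerivative(net) * received;
    }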

I am not sure whether the LSTM cell as a whole is treated as a single neuron, so I treated each gate as a neuron and tried to calculate an error signal for each one. Then I used only the error signal from the cell gate to pass back up the network...:

o_e = sigmoid'(o_w * (x_t + c_t) + o_b) * (received_error)
o_w += o_l * x_t * o_e
o_b += o_l * sigmoid(o_b) * o_e
...and the rest of the gates follow the same format.

Then the error for the whole LSTM cell is equal to o_e.

Then, for the LSTM cell above the current one, the error it receives is equal to:

tanh'(x_t) * ∑(o_e * w_io)

Is any of this correct? Am I doing something completely wrong?

I'll take a crack at this; I believe your approach is correct:

Some nice work by Thomas Lahore:

    ////////////////////////////////////////////////////////////// 
    ////////////////////////////////////////////////////////////// 
    //BACKPROP 
    ////////////////////////////////////////////////////////////// 
    ////////////////////////////////////////////////////////////// 

    //scale partials 
    for (int c = 0; c < cell_blocks; c++) { 
        for (int i = 0; i < full_input_dimension; i++) { 
            this.dSdwWeightsInputGate[c][i] *= ForgetGateAct[c]; 
            this.dSdwWeightsForgetGate[c][i] *= ForgetGateAct[c]; 
            this.dSdwWeightsNetInput[c][i] *= ForgetGateAct[c]; 

            dSdwWeightsInputGate[c][i] += full_input[i] * neuronInputGate.Derivative(InputGateSum[c]) * NetInputAct[c]; 
            dSdwWeightsForgetGate[c][i] += full_input[i] * neuronForgetGate.Derivative(ForgetGateSum[c]) * CEC1[c]; 
            dSdwWeightsNetInput[c][i] += full_input[i] * neuronNetInput.Derivative(NetInputSum[c]) * InputGateAct[c]; 
        } 
    } 

    if (target_output != null) { 
        double[] deltaGlobalOutputPre = new double[output_dimension]; 
        for (int k = 0; k < output_dimension; k++) { 
            deltaGlobalOutputPre[k] = target_output[k] - output[k]; 
        } 

        //output to hidden 
        double[] deltaNetOutput = new double[cell_blocks]; 
        for (int k = 0; k < output_dimension; k++) { 
            //links 
            for (int c = 0; c < cell_blocks; c++) { 
                deltaNetOutput[c] += deltaGlobalOutputPre[k] * weightsGlobalOutput[k][c]; 
                weightsGlobalOutput[k][c] += deltaGlobalOutputPre[k] * NetOutputAct[c] * learningRate; 
            } 
            //bias 
            weightsGlobalOutput[k][cell_blocks] += deltaGlobalOutputPre[k] * 1.0 * learningRate; 
        } 

        for (int c = 0; c < cell_blocks; c++) { 

            //update output gates 
            double deltaOutputGatePost = deltaNetOutput[c] * CECSquashAct[c]; 
            double deltaOutputGatePre = neuronOutputGate.Derivative(OutputGateSum[c]) * deltaOutputGatePost; 
            for (int i = 0; i < full_input_dimension; i++) { 
                weightsOutputGate[c][i] += full_input[i] * deltaOutputGatePre * learningRate; 
            } 
            peepOutputGate[c] += CEC3[c] * deltaOutputGatePre * learningRate; 

            //before outgate 
            double deltaCEC3 = deltaNetOutput[c] * OutputGateAct[c] * neuronCECSquash.Derivative(CEC3[c]); 

            //update input gates 
            double deltaInputGatePost = deltaCEC3 * NetInputAct[c]; 
            double deltaInputGatePre = neuronInputGate.Derivative(InputGateSum[c]) * deltaInputGatePost; 
            for (int i = 0; i < full_input_dimension; i++) { 
                weightsInputGate[c][i] += dSdwWeightsInputGate[c][i] * deltaCEC3 * learningRate; 
            } 
            peepInputGate[c] += CEC2[c] * deltaInputGatePre * learningRate; 

            //before ingate 
            double deltaCEC2 = deltaCEC3; 

            //update forget gates 
            double deltaForgetGatePost = deltaCEC2 * CEC1[c]; 
            double deltaForgetGatePre = neuronForgetGate.Derivative(ForgetGateSum[c]) * deltaForgetGatePost; 
            for (int i = 0; i < full_input_dimension; i++) { 
                weightsForgetGate[c][i] += dSdwWeightsForgetGate[c][i] * deltaCEC2 * learningRate; 
            } 
            peepForgetGate[c] += CEC1[c] * deltaForgetGatePre * learningRate; 

            //update cell inputs 
            for (int i = 0; i < full_input_dimension; i++) { 
                weightsNetInput[c][i] += dSdwWeightsNetInput[c][i] * deltaCEC3 * learningRate; 
            } 
            //no peeps for cell inputs 
        } 
    } 

    ////////////////////////////////////////////////////////////// 

    //roll-over context to next time step 
    for (int j = 0; j < cell_blocks; j++) { 
        context[j] = NetOutputAct[j]; 
        CEC[j] = CEC3[j]; 
    } 
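
The backprop code above refers to activations (NetInputAct, InputGateAct, CEC1..CEC3, CECSquashAct, NetOutputAct) produced by a forward pass that is not shown here. Below is a rough reconstruction of how those arrays relate, consistent with the gradients above; this is my own sketch of the usual peephole arrangement, not the original code, so the exact peephole and bias handling may differ:

    // Rough reconstruction of the per-cell forward pass implied by the backprop above
    // (an assumption for illustration; not the original code).
    for (int c = 0; c < cell_blocks; c++) {
        CEC1[c] = CEC[c];                                                   // cell state carried over from t-1

        NetInputAct[c] = neuronNetInput.Activate(NetInputSum[c]);          // candidate cell input

        ForgetGateSum[c] += peepForgetGate[c] * CEC1[c];                    // forget gate peeks at the old state
        ForgetGateAct[c] = neuronForgetGate.Activate(ForgetGateSum[c]);
        CEC2[c] = CEC1[c] * ForgetGateAct[c];                               // state after forgetting

        InputGateSum[c] += peepInputGate[c] * CEC2[c];                      // input gate peeks at CEC2
        InputGateAct[c] = neuronInputGate.Activate(InputGateSum[c]);
        CEC3[c] = CEC2[c] + NetInputAct[c] * InputGateAct[c];               // state after writing

        OutputGateSum[c] += peepOutputGate[c] * CEC3[c];                    // output gate peeks at the new state
        OutputGateAct[c] = neuronOutputGate.Activate(OutputGateSum[c]);

        CECSquashAct[c] = neuronCECSquash.Activate(CEC3[c]);                // squashed cell state
        NetOutputAct[c] = CECSquashAct[c] * OutputGateAct[c];               // block output fed to the output layer
    }

This ordering also explains why the peephole gradients above use CEC1 for the forget gate, CEC2 for the input gate, and CEC3 for the output gate.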
Also, and perhaps even more interesting, Andrej Karpathy's lecture and lecture notes: