Neural network: building a non-layered LSTM network from scratch, how do I do the forward and backward passes?


Based on my understanding of how an LSTM cell works, I am building an LSTM network from scratch.

There are no layers, so I am trying to implement the non-vectorized forms of the equations I have seen in tutorials. I am also using peepholes from the cell state.

So far, I have it looking something like this:

From that, I came up with the following equations for each of the gates in the forward pass:

i_t = sigmoid( i_w * (x_t + c_t) + i_b )
f_t = sigmoid( f_w * (x_t + c_t) + f_b )

cell_gate = tanh( c_w * x_t + c_b )

c_t = (f_t * c_t) + (i_t * cell_gate)

o_t = sigmoid( o_w * (x_t + c_t) + o_b )

h_t = o_t * tanh(c_t)

where the _w's are the respective weights for each gate and the _b's are the biases. Also, I have named the first, left-most sigmoid "cell_gate".
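
For concreteness, here is a minimal sketch of that forward pass for a single scalar cell, transcribed directly from the equations above. The names (xT, cPrev, iW, iB, ...) are mine, and I read the c_t in the input/forget gates as the previous cell state and the one in the output gate as the freshly updated state, since c_t is recomputed in between:

    // Minimal single-cell forward pass, transcribed from the equations above.
    // All names are illustrative, not from any library.
    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // Returns {h_t, c_t} given the input x_t and the previous cell state c_{t-1}.
    static double[] forward(double xT, double cPrev,
                            double iW, double iB, double fW, double fB,
                            double cW, double cB, double oW, double oB) {
        double iT = sigmoid(iW * (xT + cPrev) + iB);   // input gate (peeks at the old cell state)
        double fT = sigmoid(fW * (xT + cPrev) + fB);   // forget gate
        double cellGate = Math.tanh(cW * xT + cB);     // candidate cell input ("cell_gate")
        double cT = fT * cPrev + iT * cellGate;        // new cell state
        double oT = sigmoid(oW * (xT + cT) + oB);      // output gate (peeks at the new cell state)
        double hT = oT * Math.tanh(cT);                // cell output h_t
        return new double[]{hT, cT};
    }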


The backward pass is where things get fuzzy for me; I am not sure how to derive these equations correctly.

I know that, in general, the formula for computing the error is: error = f'(x_t) * (received_error), where f'(x_t) is the first derivative of the activation function, and received_error is either (target - output) for an output neuron or ∑(o_e * w_io) for a hidden neuron,

where o_e is the error of one of the cells that the current cell outputs to, and w_io is the weight connecting them.
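
To make that rule concrete, here is a small, hypothetical sketch of the two cases (the helper names are my own):

    // Hypothetical sketch of the stated rule: delta = f'(net) * received_error,
    // where received_error is (target - output) for an output neuron,
    // or the sum of downstream deltas times the connecting weights for a hidden neuron.
    static double sigmoidDerivative(double net) {
        double s = 1.0 / (1.0 + Math.exp(-net));
        return s * (1.0 - s);
    }

    static double outputDelta(double net, double target, double output) {
        return sigmoidDerivative(net) * (target - output);
    }

    static double hiddenDelta(double net, double[] downstreamDeltas, double[] downstreamWeights) {
        double received = 0.0;
        for (int k = 0; k < downstreamDeltas.length; k++) {
            received += downstreamDeltas[k] * downstreamWeights[k];  // o_e * w_io
        }
        return sigmoidDerivative(net) * received;
    }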

I am not sure whether the LSTM cell as a whole is treated as a single neuron, so I treated each gate as a neuron and tried to calculate an error signal for each one. Then I used only the error signal from the cell gate to pass back up the network...:

o_e = sigmoid'(o_w * (x_t + c_t) + o_b) * (received_error)
o_w += o_l * x_t * o_e
o_b += o_l * sigmoid(o_b) * o_e
...and the rest of the gates follow the same format.

Then the error for the whole LSTM cell is equal to o_e.

Then, for the LSTM cell above the current one, the error it receives is equal to:

tanh'(x_t) * ∑(o_e * w_io)

Is any of this correct? Am I doing something completely wrong?

I'll take a crack at this; I believe your approach is correct:

Some nice work by Thomas Lahore:

    ////////////////////////////////////////////////////////////// 
    ////////////////////////////////////////////////////////////// 
    //BACKPROP 
    ////////////////////////////////////////////////////////////// 
    ////////////////////////////////////////////////////////////// 

    //scale partials 
    for (int c = 0; c < cell_blocks; c++) { 
        for (int i = 0; i < full_input_dimension; i++) { 
            this.dSdwWeightsInputGate[c][i] *= ForgetGateAct[c]; 
            this.dSdwWeightsForgetGate[c][i] *= ForgetGateAct[c]; 
            this.dSdwWeightsNetInput[c][i] *= ForgetGateAct[c]; 

            dSdwWeightsInputGate[c][i] += full_input[i] * neuronInputGate.Derivative(InputGateSum[c]) * NetInputAct[c]; 
            dSdwWeightsForgetGate[c][i] += full_input[i] * neuronForgetGate.Derivative(ForgetGateSum[c]) * CEC1[c]; 
            dSdwWeightsNetInput[c][i] += full_input[i] * neuronNetInput.Derivative(NetInputSum[c]) * InputGateAct[c]; 
        } 
    } 

    if (target_output != null) { 
        double[] deltaGlobalOutputPre = new double[output_dimension]; 
        for (int k = 0; k < output_dimension; k++) { 
            deltaGlobalOutputPre[k] = target_output[k] - output[k]; 
        } 

        //output to hidden 
        double[] deltaNetOutput = new double[cell_blocks]; 
        for (int k = 0; k < output_dimension; k++) { 
            //links 
            for (int c = 0; c < cell_blocks; c++) { 
                deltaNetOutput[c] += deltaGlobalOutputPre[k] * weightsGlobalOutput[k][c]; 
                weightsGlobalOutput[k][c] += deltaGlobalOutputPre[k] * NetOutputAct[c] * learningRate; 
            } 
            //bias 
            weightsGlobalOutput[k][cell_blocks] += deltaGlobalOutputPre[k] * 1.0 * learningRate; 
        } 

        for (int c = 0; c < cell_blocks; c++) { 

            //update output gates 
            double deltaOutputGatePost = deltaNetOutput[c] * CECSquashAct[c]; 
            double deltaOutputGatePre = neuronOutputGate.Derivative(OutputGateSum[c]) * deltaOutputGatePost; 
            for (int i = 0; i < full_input_dimension; i++) { 
                weightsOutputGate[c][i] += full_input[i] * deltaOutputGatePre * learningRate; 
            } 
            peepOutputGate[c] += CEC3[c] * deltaOutputGatePre * learningRate; 

            //before outgate 
            double deltaCEC3 = deltaNetOutput[c] * OutputGateAct[c] * neuronCECSquash.Derivative(CEC3[c]); 

            //update input gates 
            double deltaInputGatePost = deltaCEC3 * NetInputAct[c]; 
            double deltaInputGatePre = neuronInputGate.Derivative(InputGateSum[c]) * deltaInputGatePost; 
            for (int i = 0; i < full_input_dimension; i++) { 
                weightsInputGate[c][i] += dSdwWeightsInputGate[c][i] * deltaCEC3 * learningRate; 
            } 
            peepInputGate[c] += CEC2[c] * deltaInputGatePre * learningRate; 

            //before ingate 
            double deltaCEC2 = deltaCEC3; 

            //update forget gates 
            double deltaForgetGatePost = deltaCEC2 * CEC1[c]; 
            double deltaForgetGatePre = neuronForgetGate.Derivative(ForgetGateSum[c]) * deltaForgetGatePost; 
            for (int i = 0; i < full_input_dimension; i++) { 
                weightsForgetGate[c][i] += dSdwWeightsForgetGate[c][i] * deltaCEC2 * learningRate; 
            } 
            peepForgetGate[c] += CEC1[c] * deltaForgetGatePre * learningRate; 

            //update cell inputs 
            for (int i = 0; i < full_input_dimension; i++) { 
                weightsNetInput[c][i] += dSdwWeightsNetInput[c][i] * deltaCEC3 * learningRate; 
            } 
            //no peeps for cell inputs 
        } 
    } 

    ////////////////////////////////////////////////////////////// 

    //roll-over context to next time step 
    for (int j = 0; j < cell_blocks; j++) { 
        context[j] = NetOutputAct[j]; 
        CEC[j] = CEC3[j]; 
    } 
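
The backprop code above refers to activations (NetInputAct, InputGateAct, CEC1..CEC3, CECSquashAct, NetOutputAct) produced by a forward pass that is not shown here. Below is a rough reconstruction of how those arrays relate, consistent with the gradients above; this is my own sketch of the usual peephole arrangement, not the original code, so the exact peephole and bias handling may differ:

    // Rough reconstruction of the per-cell forward pass implied by the backprop above
    // (an assumption for illustration; not the original code).
    for (int c = 0; c < cell_blocks; c++) {
        CEC1[c] = CEC[c];                                                   // cell state carried over from t-1

        NetInputAct[c] = neuronNetInput.Activate(NetInputSum[c]);          // candidate cell input

        ForgetGateSum[c] += peepForgetGate[c] * CEC1[c];                    // forget gate peeks at the old state
        ForgetGateAct[c] = neuronForgetGate.Activate(ForgetGateSum[c]);
        CEC2[c] = CEC1[c] * ForgetGateAct[c];                               // state after forgetting

        InputGateSum[c] += peepInputGate[c] * CEC2[c];                      // input gate peeks at CEC2
        InputGateAct[c] = neuronInputGate.Activate(InputGateSum[c]);
        CEC3[c] = CEC2[c] + NetInputAct[c] * InputGateAct[c];               // state after writing

        OutputGateSum[c] += peepOutputGate[c] * CEC3[c];                    // output gate peeks at the new state
        OutputGateAct[c] = neuronOutputGate.Activate(OutputGateSum[c]);

        CECSquashAct[c] = neuronCECSquash.Activate(CEC3[c]);                // squashed cell state
        NetOutputAct[c] = CECSquashAct[c] * OutputGateAct[c];               // block output fed to the output layer
    }

This ordering also explains why the peephole gradients above use CEC1 for the forget gate, CEC2 for the input gate, and CEC3 for the output gate.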
Also, and perhaps even more interesting, Andrej Karpathy's lecture and lecture notes: