Neural network: building a non-layered LSTM network from scratch, how to do the forward pass and backward pass?
Based on my understanding of how an LSTM cell works, I'm building an LSTM network from scratch, with no layers, so I'm implementing the non-vectorized form of the equations I've seen in tutorials. I'm also using peepholes from the cell state. So far, I understand that it looks like this:

[diagram of an LSTM cell with peephole connections]

From that, I've come up with the following equations for each of the gates for the forward pass:
i_t = sigmoid( i_w * (x_t + c_t) + i_b )
f_t = sigmoid( f_w * (x_t + c_t) + f_b )
cell_gate = tanh( c_w * x_t + c_b )
c_t = (f_t * c_t) + (i_t * cell_gate)
o_t = sigmoid( o_w * (x_t + c_t) + o_b )
h_t = o_t * tanh(c_t)
Where _w means the weights for the respective gate and _b the biases. Also, I'm naming that left-most, first activation "cell_gate" (the tanh that produces the candidate cell value).
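In case it helps to see the forward pass end to end, here is a minimal single-cell Java sketch of the question's equations (all names are illustrative, weights are scalar). One detail worth making explicit: the c_t that feeds the input and forget gates can only be the previous step's cell state, since the new c_t hasn't been computed yet, so the sketch names it c_prev:

public class LstmCellSketch {
    // illustrative scalar weights and biases for each gate (hypothetical names)
    double i_w, i_b, f_w, f_b, c_w, c_b, o_w, o_b;
    double c_prev = 0.0; // previous cell state, c_{t-1}

    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // one forward step for a single input value x_t; returns h_t
    double forward(double x_t) {
        // input and forget gates peep at the previous cell state
        double i_t = sigmoid(i_w * (x_t + c_prev) + i_b);
        double f_t = sigmoid(f_w * (x_t + c_prev) + f_b);
        double cell_gate = Math.tanh(c_w * x_t + c_b);
        double c_t = f_t * c_prev + i_t * cell_gate;   // new cell state
        double o_t = sigmoid(o_w * (x_t + c_t) + o_b); // output gate peeps at the new state
        double h_t = o_t * Math.tanh(c_t);
        c_prev = c_t; // roll the state over to the next time step
        return h_t;
    }
}

Note the sketch keeps the question's simplification of one weight per gate applied to (x_t + c_t); a standard peephole LSTM gives x_t, h_{t-1}, and the peephole input each their own weight.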
The backward pass is where things get fuzzy for me; I'm not sure how to derive these equations correctly.

I know the general formula for computing the error is: error = f'(x_t) * (received_error), where f'(x_t) is the first derivative of the activation function, and received_error is either (target - output) for output neurons, or ∑(o_e * w_io) for hidden neurons, where o_e is the error of one of the cells the current cell feeds into and w_io is the weight connecting them.

I'm not sure whether the LSTM cell as a whole is treated as a single neuron, so I treated each of the gates as a neuron and tried to compute an error signal for each gate. Then I used just the error signal from the cell gate to pass back through the network...:
o_e = sigmoid'(o_w * (x_t + c_t) + o_b) * (received_error)
o_w += o_l * x_t * o_e
o_b += o_l * sigmoid(o_b) * o_e
...with the rest of the gates following the same format.
The error for the LSTM cell as a whole then equals o_e.
Then, for the LSTM cell upstream of the current one, the error it receives equals:
tanh'(x_t) * ∑(o_e * w_io)
Is any of this correct? Am I doing something wrong?

I took on this task myself, and I believe your approach is correct. Some good work from Thomas Lahore is quoted below, after a quick sketch of how the standard deltas line up with yours.
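For comparison with the question's per-gate errors: differentiating h_t = o_t * tanh(c_t) and c_t = f_t * c_tminus1 + i_t * cell_gate with the chain rule gives, in the question's notation, roughly the following (a sketch only: it ignores the error that also flows into c_t from the next time step and from the peepholes; *_sum is a gate's pre-activation, e.g. o_sum = o_w * (x_t + c_t) + o_b, and c_tminus1 is the previous cell state):

o_e = sigmoid'(o_sum) * tanh(c_t) * received_error
c_e = o_t * tanh'(c_t) * received_error
i_e = sigmoid'(i_sum) * cell_gate * c_e
f_e = sigmoid'(f_sum) * c_tminus1 * c_e
g_e = tanh'(c_sum) * i_t * c_e

In the code below, deltaOutputGatePre plays the role of o_e and deltaCEC3 the role of c_e; the input, forget, and cell-input weights are then updated through the accumulated partials dSdw* rather than plain deltas, an RTRL-style accumulation in the spirit of the original LSTM training algorithm.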
//////////////////////////////////////////////////////////////
//////////////////////////////////////////////////////////////
//BACKPROP
//////////////////////////////////////////////////////////////
//////////////////////////////////////////////////////////////
//scale partials
for (int c = 0; c < cell_blocks; c++) {
    for (int i = 0; i < full_input_dimension; i++) {
        this.dSdwWeightsInputGate[c][i] *= ForgetGateAct[c];
        this.dSdwWeightsForgetGate[c][i] *= ForgetGateAct[c];
        this.dSdwWeightsNetInput[c][i] *= ForgetGateAct[c];

        dSdwWeightsInputGate[c][i] += full_input[i] * neuronInputGate.Derivative(InputGateSum[c]) * NetInputAct[c];
        dSdwWeightsForgetGate[c][i] += full_input[i] * neuronForgetGate.Derivative(ForgetGateSum[c]) * CEC1[c];
        dSdwWeightsNetInput[c][i] += full_input[i] * neuronNetInput.Derivative(NetInputSum[c]) * InputGateAct[c];
    }
}

if (target_output != null) {
    double[] deltaGlobalOutputPre = new double[output_dimension];
    for (int k = 0; k < output_dimension; k++) {
        deltaGlobalOutputPre[k] = target_output[k] - output[k];
    }

    //output to hidden
    double[] deltaNetOutput = new double[cell_blocks];
    for (int k = 0; k < output_dimension; k++) {
        //links
        for (int c = 0; c < cell_blocks; c++) {
            deltaNetOutput[c] += deltaGlobalOutputPre[k] * weightsGlobalOutput[k][c];
            weightsGlobalOutput[k][c] += deltaGlobalOutputPre[k] * NetOutputAct[c] * learningRate;
        }
        //bias
        weightsGlobalOutput[k][cell_blocks] += deltaGlobalOutputPre[k] * 1.0 * learningRate;
    }

    for (int c = 0; c < cell_blocks; c++) {
        //update output gates
        double deltaOutputGatePost = deltaNetOutput[c] * CECSquashAct[c];
        double deltaOutputGatePre = neuronOutputGate.Derivative(OutputGateSum[c]) * deltaOutputGatePost;
        for (int i = 0; i < full_input_dimension; i++) {
            weightsOutputGate[c][i] += full_input[i] * deltaOutputGatePre * learningRate;
        }
        peepOutputGate[c] += CEC3[c] * deltaOutputGatePre * learningRate;

        //before outgate
        double deltaCEC3 = deltaNetOutput[c] * OutputGateAct[c] * neuronCECSquash.Derivative(CEC3[c]);

        //update input gates
        double deltaInputGatePost = deltaCEC3 * NetInputAct[c];
        double deltaInputGatePre = neuronInputGate.Derivative(InputGateSum[c]) * deltaInputGatePost;
        for (int i = 0; i < full_input_dimension; i++) {
            weightsInputGate[c][i] += dSdwWeightsInputGate[c][i] * deltaCEC3 * learningRate;
        }
        peepInputGate[c] += CEC2[c] * deltaInputGatePre * learningRate;

        //before ingate
        double deltaCEC2 = deltaCEC3;

        //update forget gates
        double deltaForgetGatePost = deltaCEC2 * CEC1[c];
        double deltaForgetGatePre = neuronForgetGate.Derivative(ForgetGateSum[c]) * deltaForgetGatePost;
        for (int i = 0; i < full_input_dimension; i++) {
            weightsForgetGate[c][i] += dSdwWeightsForgetGate[c][i] * deltaCEC2 * learningRate;
        }
        peepForgetGate[c] += CEC1[c] * deltaForgetGatePre * learningRate;

        //update cell inputs
        for (int i = 0; i < full_input_dimension; i++) {
            weightsNetInput[c][i] += dSdwWeightsNetInput[c][i] * deltaCEC3 * learningRate;
        }
        //no peeps for cell inputs
    }
}

//////////////////////////////////////////////////////////////

//roll-over context to next time step
for (int j = 0; j < cell_blocks; j++) {
    context[j] = NetOutputAct[j];
    CEC[j] = CEC3[j];
}
//////////////////////////////////
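One practical way to settle "are my backward equations right?" is a finite-difference gradient check: nudge each weight by ±eps, recompute the loss, and compare the numeric slope against what backprop produced. A minimal, self-contained Java sketch (lossAt is a hypothetical stand-in for one forward pass plus loss of your network; only a toy w^2 example is wired up here):

import java.util.function.DoubleUnaryOperator;

// Finite-difference gradient check: compares an analytic gradient against
// (L(w+eps) - L(w-eps)) / (2*eps) for a single weight.
public class GradCheck {
    // lossAt maps a trial value of one weight to the loss (runs a forward pass);
    // analytic is the backprop-computed dL/dw being verified
    static boolean check(DoubleUnaryOperator lossAt, double w, double analytic) {
        double eps = 1e-5;
        double numeric = (lossAt.applyAsDouble(w + eps) - lossAt.applyAsDouble(w - eps)) / (2 * eps);
        double denom = Math.max(1e-12, Math.abs(numeric) + Math.abs(analytic));
        double relErr = Math.abs(numeric - analytic) / denom;
        return relErr < 1e-4; // loose threshold; tighten for double precision
    }

    public static void main(String[] args) {
        // toy example: L(w) = w^2, so dL/dw = 2w
        double w = 3.0;
        System.out.println(check(x -> x * x, w, 2 * w)); // prints true
    }
}

Run the same check over every gate weight and peephole after a single forward/backward pass on a short sequence; a relative error much above ~1e-4 usually points at a missing chain-rule term.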
Also, and maybe even more interesting, the lecture and lecture notes by Andrej Karpathy: