What's the correct way to do backpropagation in a deep fully-connected neural network for binary classification?
I tried to implement a deep fully-connected neural network for binary classification using Python and NumPy, with gradient descent as the optimization algorithm.

It turns out my model is badly underfitting, even after 1000 epochs: the loss never improves beyond 0.69321. I inspected the derivatives of my weights and immediately noticed they are tiny (as small as 1e-7); gradients that small mean my gradient-descent updates never get any larger and the model never approaches the global minimum. Below I detail the math/pseudocode of my forward and backward propagation; please let me know whether I'm on the right track. I follow the naming conventions Andrew Ng uses in DeepLearning.ai.
Suppose we have a 4-layer neural network with a single node in the output layer that classifies between 0 and 1.
X->Z1->A1->Z2->A2->Z3->A3->Z4->A4
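To make the setup concrete, this is roughly how I create the parameters for that architecture (the input size and hidden-layer sizes below are illustrative placeholders, not my real ones):

import numpy as np

# illustrative layer sizes: 12 input features, three tanh hidden layers, 1 sigmoid output unit
layer_dims = [12, 16, 8, 4, 1]

params = {}
for l in range(1, len(layer_dims)):
    # small random weights and zero biases, following the DeepLearning.ai naming
    params["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
    params["B" + str(l)] = np.zeros((layer_dims[l], 1))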
Forward propagation
Z1 = W1 dot_product X + B1
A1 = tanh_activation(Z1)
Z2 = W2 dot_product A1 + B2
A2 = tanh_activation(Z2)
Z3 = W3 dot_product A2 + B3
A3 = tanh_activation(Z3)
Z4 = W4 dot_product A3 + B4
A4 = sigmoid_activation(Z4)
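In NumPy, my understanding of that forward pass is roughly the following sketch (X has shape (n_features, m) with one column per sample, and params is the dict from the sketch above):

import numpy as np

def sigmoid_activation(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(X, params):
    cache = {"A0": X}
    A = X
    # three tanh hidden layers
    for l in range(1, 4):
        Z = params["W" + str(l)] @ A + params["B" + str(l)]
        A = np.tanh(Z)
        cache["Z" + str(l)] = Z
        cache["A" + str(l)] = A
    # sigmoid output layer
    Z4 = params["W4"] @ A + params["B4"]
    A4 = sigmoid_activation(Z4)
    cache["Z4"] = Z4
    cache["A4"] = A4
    return A4, cache

The @ operator here is NumPy's matrix multiplication, i.e. the dot_product in my pseudocode.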
Backward propagation
DA4 = -( Y / A4 ) + ( (1 - Y) / (1 - A4) )   (derivative of the loss w.r.t. the output activations)
DZ4 = DA4 * derivative_tanh(Z4)   (derivative of the tanh activation, which I assume is 1 - Z4^2)
DW4 = ( DZ4 dot_product A3.T ) / total_number_of_samples
DB4 = np.sum(DZ4, axis=1, keepdims=True) / total_number_of_samples
DA3 = W4.T dot_product DZ4
DZ3 = DA3 * derivative_tanh(Z3)
DW3 = ( DZ3 dot_product A2.T ) / total_number_of_samples
DB3 = np.sum(DZ3, axis=1, keepdims=True) / total_number_of_samples
DA2 = W3.T dot_product DZ3
DZ2 = DA2 * derivative_tanh(Z2)
DW2 = ( DZ2 dot_product A1.T ) / total_number_of_samples
DB2 = np.sum(DZ2, axis=1, keepdims=True) / total_number_of_samples
DA1 = W2.T dot_product DZ2
DZ1 = DA1 * derivative_tanh(Z1)
DW1 = ( DZ1 dot_product X.T ) / total_number_of_samples
DB1 = np.sum(DZ1, axis=1, keepdims=True) / total_number_of_samples
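For reference, my loss is the binary cross-entropy L = -( Y * log(A4) + (1 - Y) * log(1 - A4) ), which is where DA4 above comes from. (Incidentally, 0.69321 is essentially ln 2 ≈ 0.6931, the loss of always predicting 0.5, which fits the underfitting symptom.) With a sigmoid output layer, this derivative is usually collapsed directly into DZ4, as in this sketch:

import numpy as np

def output_layer_grads(A4, A3, Y):
    m = Y.shape[1]  # total number of samples
    # sigmoid output + binary cross-entropy: dL/dZ4 simplifies to A4 - Y,
    # so no separate activation derivative is multiplied in here
    DZ4 = A4 - Y
    DW4 = (DZ4 @ A3.T) / m
    DB4 = np.sum(DZ4, axis=1, keepdims=True) / m
    return DZ4, DW4, DB4

The simplification follows from sigmoid'(Z4) = A4 * (1 - A4), which cancels against the denominators in DA4.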
Here is my tanh implementation:
def tanh_activation(x):
    return np.tanh(x)
And my tanh derivative implementation:
def derivative_tanh(x):
    return 1 - np.power(x, 2)
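One thing I'm second-guessing: mathematically d/dx tanh(x) = 1 - tanh(x)^2, so 1 - x^2 is only correct when x is already the tanh activation A, not the pre-activation Z. A version that is safe to call on Z directly would be:

import numpy as np

def derivative_tanh(x):
    # 1 - tanh(x)^2, so the caller can pass the pre-activation Z directly
    return 1.0 - np.tanh(x) ** 2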
After the backpropagation steps above, I update the weights and biases with gradient descent using their respective derivatives. But no matter how many times I run the algorithm, the model never improves its loss beyond 0.69, and the derivatives of the output weights (dW4 in my case) are very small, around 1e-7.
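For completeness, the update step I'm describing is just plain gradient descent; assuming the gradients are collected in a grads dict keyed like params (DW1, DB1, ...), it looks like this sketch (the learning rate is an illustrative value):

learning_rate = 0.01  # illustrative value

for l in range(1, 5):
    params["W" + str(l)] -= learning_rate * grads["DW" + str(l)]
    params["B" + str(l)] -= learning_rate * grads["DB" + str(l)]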
I assume that either my derivative_tanh function or my dZ calculations are really off, and that this propagates bad values back through the network. Please share your thoughts on whether my backprop implementation is valid. TIA. I went through , , and many other blogs, but couldn't find what I was looking for.

I found the solution to my problem and answered it here: . I recommend closing this thread.