
What's the correct way to do backpropagation in a deep fully connected neural network for binary classification?


I am trying to implement a deep fully connected neural network for binary classification using Python and NumPy, with gradient descent as the optimization algorithm.

It turns out that my model is badly underfitting, even after 1000 epochs. My loss never improves beyond 0.69321. I checked my weight derivatives and immediately realized they are very small (as small as 1e-7); gradients this small mean the model never takes larger gradient-descent updates and never reaches the global minimum. Below is the math/pseudocode for the forward and backward propagation; please let me know whether I am on the right track. I follow the naming convention Andrew Ng uses in DeepLearning.ai.

Suppose we have a 4-layer neural network with only one node in the output layer, classifying between 0 and 1.

X->Z1->A1->Z2->A2->Z3->A3->Z4->A4

Forward propagation

Z1 = W1 dot_product X + B1
A1 = tanh_activation(Z1)

Z2 = W2 dot_product A1 + B2
A2 = tanh_activation(Z2)

Z3 = W3 dot_product A2 + B3
A3 = tanh_activation(Z3)

Z4 = W4 dot_product A3 + B4
A4 = sigmoid_activation(Z4)
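
For concreteness, here is a minimal runnable NumPy sketch of what this forward pass could look like. The layer sizes, the small random initialization, and the column-oriented data layout (one sample per column) are placeholder assumptions for illustration, not my actual training setup:

import numpy as np

def tanh_activation(x):
    return np.tanh(x)

def sigmoid_activation(x):
    return 1 / (1 + np.exp(-x))

# placeholder layer sizes: 8 input features, three hidden layers, 1 output node
layer_dims = [8, 16, 8, 4, 1]
rng = np.random.default_rng(0)
params = {}
for l in range(1, len(layer_dims)):
    params["W" + str(l)] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01
    params["B" + str(l)] = np.zeros((layer_dims[l], 1))

def forward(X, params):
    # X has shape (n_x, m): one column per training sample
    A = X
    cache = {"A0": X}
    for l in range(1, 5):
        Z = np.dot(params["W" + str(l)], A) + params["B" + str(l)]
        A = sigmoid_activation(Z) if l == 4 else tanh_activation(Z)
        cache["Z" + str(l)] = Z
        cache["A" + str(l)] = A
    return A, cache

X = rng.standard_normal((8, 32))   # 32 placeholder samples
A4, cache = forward(X, params)
print(A4.shape)                    # (1, 32)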

Backward propagation

DA4 = -( Y / A4 + (1 - Y /  1 - A4 ) ) ( derivative of the loss w.r.t. the output activations / logits )

DZ4 = DA4 * derivative_tanh(Z4) ( derivative of tanh activation, which I assume is ( 1 - (Z4)^2 ) )
DW4 = ( DZ4 dot_product A3.T ) / total_number_of_samples
DB4 = np.sum(DZ4, axis = 1, keepdims = True ... ) / total_number_of_samples
DA3 = W4.T dot_product(DZ4)


DZ3 = DA3 * derivative_tanh( Z3 )
DW3 = ( DZ3 dot_product A2.T ) / total_number_of_samples
DB3 = np.sum( DZ3, .. ) / total_number_of_samples
DA2 = W3.T dot_product(DZ3)


DZ2 = DA2 * derivative_tanh( Z2 )
DW2 = ( DZ2 dot_product A1.T ) / total_number_of_samples
DB2 = np.sum( DZ2, .. ) / total_number_of_samples
DA1 = W2.T dot_product(DZ2)



DZ1 = DA1 * derivative_tanh( Z1 )
DW1 = ( DZ1 dot_product X.T ) / total_number_of_samples
DB1 = np.sum( DZ1, .. ) / total_number_of_samples
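
And here is a direct NumPy transcription of the backward-pass pseudocode above (dot_product becomes np.dot, the per-sample sums use axis=1, keepdims=True), reusing the cache and params from the forward sketch. I have kept it faithful to the formulas as written, and whether each per-layer derivative is itself right is exactly what I am asking; the DA4 line is transcribed as the usual binary cross-entropy gradient, which is what my parenthesization above is meant to say:

def derivative_tanh(x):
    # same helper as defined further down
    return 1 - np.power(x, 2)

def backward(X, Y, params, cache):
    # Y has shape (1, m); m = total_number_of_samples
    m = X.shape[1]
    A1, A2, A3, A4 = cache["A1"], cache["A2"], cache["A3"], cache["A4"]
    Z1, Z2, Z3, Z4 = cache["Z1"], cache["Z2"], cache["Z3"], cache["Z4"]

    # binary cross-entropy gradient w.r.t. A4
    DA4 = -(np.divide(Y, A4) - np.divide(1 - Y, 1 - A4))

    DZ4 = DA4 * derivative_tanh(Z4)            # exactly as in the pseudocode above
    DW4 = np.dot(DZ4, A3.T) / m
    DB4 = np.sum(DZ4, axis=1, keepdims=True) / m
    DA3 = np.dot(params["W4"].T, DZ4)

    DZ3 = DA3 * derivative_tanh(Z3)
    DW3 = np.dot(DZ3, A2.T) / m
    DB3 = np.sum(DZ3, axis=1, keepdims=True) / m
    DA2 = np.dot(params["W3"].T, DZ3)

    DZ2 = DA2 * derivative_tanh(Z2)
    DW2 = np.dot(DZ2, A1.T) / m
    DB2 = np.sum(DZ2, axis=1, keepdims=True) / m
    DA1 = np.dot(params["W2"].T, DZ2)

    DZ1 = DA1 * derivative_tanh(Z1)
    DW1 = np.dot(DZ1, X.T) / m
    DB1 = np.sum(DZ1, axis=1, keepdims=True) / m

    return {"W1": DW1, "B1": DB1, "W2": DW2, "B2": DB2,
            "W3": DW3, "B3": DB3, "W4": DW4, "B4": DB4}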

Here is my tanh implementation

import numpy as np

def tanh_activation(x):
    return np.tanh(x)


And my tanh derivative implementation

def derivative_tanh(x):
    return 1 - np.power(x, 2)
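
As a sanity check on this helper, a centered finite-difference estimate of d/dz tanh(z) can be compared against it; a minimal sketch using the helper defined just above (the test points and epsilon are arbitrary):

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
eps = 1e-6
numeric = (np.tanh(z + eps) - np.tanh(z - eps)) / (2 * eps)   # d/dz tanh(z)

# 1 - x**2 reproduces the numeric derivative when x is the tanh *output* ...
print(np.max(np.abs(numeric - derivative_tanh(np.tanh(z)))))  # tiny, ~1e-10
# ... and not when x is the pre-activation z itself
print(np.max(np.abs(numeric - derivative_tanh(z))))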
After the backpropagation steps above, I update the weights and biases with their respective derivatives using gradient descent. But no matter how many times I run the algorithm, the model never improves its loss beyond 0.69, and the derivatives of the output weights (dW4 in my case) stay very low, around 1e-7. I assume that either my derivative_tanh function or my dZ calculations are really off, which propagates bad gradient values back through the network. Please share your thoughts on whether my backprop implementation is valid. TIA. I have gone through … and many other blogs, but could not find what I was looking for.
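
For reference, the update step I am using is the plain gradient-descent rule, roughly like the sketch below; the learning rate is just a placeholder value, and grads stands for the dictionary of derivatives returned by the backward pass:

def gradient_descent_update(params, grads, learning_rate=0.01):
    # one in-place gradient descent step over the four layers
    for l in range(1, 5):
        params["W" + str(l)] -= learning_rate * grads["W" + str(l)]
        params["B" + str(l)] -= learning_rate * grads["B" + str(l)]
    return params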

I found the solution to my problem and have answered it here: . I suggest closing this thread.
