Machine learning gradient descent: should the delta value be a scalar or a vector?

When calculating the delta values of a neural network after running backpropagation:

the value of delta(1) comes out as a scalar. Shouldn't it be a vector?

Update:

Taken from:

To be specific:

First, as you probably know, each layer has parameters (or weights) to learn, and they form a two-dimensional n x m matrix, where

n is the number of nodes in the current layer
m is the number of nodes in the previous layer plus 1 (for the bias unit).

We have n x m parameters because every node in the previous layer is connected to every node in the current layer.
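A minimal sketch of the shape bookkeeping, in plain Python. It assumes hypothetical layer sizes (3 nodes in the previous layer, 4 in the current one) and treats the bias as an extra activation a_0 = 1 in the previous layer, so the bias adds one column. The helper names `weight_matrix_shape` and `zeros` are illustrative only.

```python
def weight_matrix_shape(nodes_current, nodes_previous):
    """Return (n, m) for the weights mapping the previous layer to the current one.

    Assumption: the bias is modelled as an extra activation a_0 = 1 in the
    previous layer, so it contributes one extra column.
    """
    n = nodes_current          # one row per node in the current layer
    m = nodes_previous + 1     # one column per previous-layer node, plus bias
    return n, m


def zeros(n, m):
    """Build an n x m matrix of zeros as a list of rows."""
    return [[0.0] * m for _ in range(n)]


# Hypothetical layer sizes: 3 previous-layer nodes feeding 4 current-layer nodes.
n, m = weight_matrix_shape(nodes_current=4, nodes_previous=3)
theta = zeros(n, m)
print(n, m, n * m)  # 4 4 16
```

Every (current node, previous node) pair gets its own weight, which is why the count is simply n * m.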

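I'm fairly sure that the Delta of layer L (capital Delta) is used to accumulate the partial-derivative terms for every parameter in layer L. So each layer has a two-dimensional Delta matrix. To update the entry in row i (the i-th node of the current layer) and column j (the j-th node of the previous layer) of that matrix: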

D_(i,j) = D_(i,j) + a_j * delta_i
note: a_j is the activation of the j-th node in the previous layer,
      delta_i is the error of the i-th node in the current layer,
so each error term is accumulated weighted by the corresponding activation.
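The element-wise rule can be sketched directly in plain Python. The activations, errors, and the helper name `accumulate` are hypothetical; the loop applies D_(i,j) = D_(i,j) + a_j * delta_i over two made-up training examples, and what comes out is an n x m matrix, not a scalar.

```python
def accumulate(D, a, delta):
    """Apply D[i][j] += a[j] * delta[i] to every entry of D, in place."""
    for i in range(len(delta)):        # i: node in the current layer
        for j in range(len(a)):        # j: node in the previous layer
            D[i][j] += a[j] * delta[i]
    return D


# Hypothetical values: 2 nodes in the current layer; the previous layer has
# 3 activations, the first being the bias unit a_0 = 1.
examples = [
    ([1.0, 0.5, -1.0], [0.2, -0.1]),   # (a, delta) for training example 1
    ([1.0, 2.0, 0.0], [0.1, 0.3]),     # (a, delta) for training example 2
]

D = [[0.0] * 3 for _ in range(2)]      # one accumulator per weight: n x m
for a, delta in examples:
    accumulate(D, a, delta)

print(len(D), len(D[0]))  # 2 3
```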

So, to answer your question, Delta should be a matrix.
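In vector form, the element-wise rule is an outer product. A sketch using the answer's own symbols, with delta the n x 1 column of errors delta_i and a the m x 1 column of previous-layer activations a_j:

$$
D \leftarrow D + \delta\, a^{T}, \qquad
\underbrace{\delta}_{n \times 1}\;\underbrace{a^{T}}_{1 \times m} = \underbrace{\delta\, a^{T}}_{n \times m}, \qquad
\left(\delta\, a^{T}\right)_{ij} = \delta_i\, a_j
$$

Entry (i, j) of the update is exactly the a_j * delta_i term added to D_(i,j).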

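Is there a reference for the formula?

@greeness Please see the update.

Thanks, but my question is why the output is a scalar rather than a matrix, since error * (a) transpose is a scalar. Maybe the link I referred to is incorrect?

The error is n x 1 and the transpose of a is 1 x m, so the product is n x m. You are probably computing (1 x n) x (n x 1), which is why it becomes a scalar.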
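The dimension argument in the last comment can be checked with a small list-based matrix multiply. The helpers `matmul` and `shape` and all the numbers are hypothetical, written only for this illustration; the second product uses the error vector against itself simply as an example of a (1 x n) x (n x 1) multiplication.

```python
def matmul(A, B):
    """Multiply matrices given as lists of rows; raise if shapes don't match."""
    if len(A[0]) != len(B):
        raise ValueError("inner dimensions must agree")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def shape(M):
    """Return (rows, columns) of a list-of-rows matrix."""
    return (len(M), len(M[0]))


# Hypothetical sizes: n = 2 error terms, m = 3 activations.
delta_col = [[0.2], [-0.1]]         # error as an n x 1 column
a_row = [[1.0, 0.5, -1.0]]          # transpose of a: a 1 x m row

outer = matmul(delta_col, a_row)    # (n x 1)(1 x m) -> n x m
print(shape(outer))                 # (2, 3)

delta_row = [[0.2, -0.1]]           # the same errors laid out as 1 x n
inner = matmul(delta_row, delta_col)  # (1 x n)(n x 1) -> 1 x 1
print(shape(inner))                 # (1, 1)
```

The first product has one entry per weight, matching the Delta matrix; the second collapses to a single number, which is the scalar the question describes.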