Is there an error in the way the parameters are updated in the following Theano method?


I was reading an online tutorial on learning with momentum and came across this method written in Theano:

import theano
import theano.tensor as T

def gradient_updates_momentum(cost, params, learning_rate, momentum):
    '''
    Compute updates for gradient descent with momentum

    :parameters:
        - cost : theano.tensor.var.TensorVariable
            Theano cost function to minimize
        - params : list of theano.tensor.var.TensorVariable
            Parameters to compute gradient against
        - learning_rate : float
            Gradient descent learning rate
        - momentum : float
            Momentum parameter, should be at least 0 (standard gradient descent) and less than 1

    :returns:
        updates : list
            List of updates, one for each parameter
    '''
    # Make sure momentum is a sane value
    assert momentum < 1 and momentum >= 0
    # List of update steps for each parameter
    updates = []
    # Just gradient descent on cost
    for param in params:
        # For each parameter, we'll create a param_update shared variable.
        # This variable will keep track of the parameter's update step across iterations.
        # We initialize it to 0
        param_update = theano.shared(param.get_value()*0., broadcastable=param.broadcastable)
        # Each parameter is updated by taking a step in the direction of the gradient.
        # However, we also "mix in" the previous step according to the given momentum value.
        # Note that when updating param_update, we are using its old value and also the new gradient step.
        updates.append((param, param - learning_rate*param_update))
        # Note that we don't need to derive backpropagation to compute updates - just use T.grad!
        updates.append((param_update, momentum*param_update + (1. - momentum)*T.grad(cost, param)))
    return updates

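As I understand it, the updates are run only after the train method has been executed and the cost has been computed. Is that correct?

Doesn't that mean we should take the current cost and the existing param_update value (from the previous iteration), compute the new param_update from them, and then use that to update the current param value?

Why does the code do it the other way around, appending the update for param before the update for param_update? And why is that correct?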

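The order of the updates in the list passed to theano.function is ignored. Updates are always computed from the old values of the shared variables.

This snippet shows that the update order is ignored: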
import theano
import theano.tensor

p = 0.5
param = theano.shared(1.)
param_update = theano.shared(2.)
cost = 3 * param * param
update_a = (param, param - param_update)
update_b = (param_update, p * param_update + (1 - p) * theano.grad(cost, param))
# Same two updates, listed in opposite orders
updates1 = [update_a, update_b]
updates2 = [update_b, update_a]
f1 = theano.function([], outputs=[param, param_update], updates=updates1)
f2 = theano.function([], outputs=[param, param_update], updates=updates2)
print(f1(), f1())
# Reset the shared variables so f2 starts from the same state as f1 did
param.set_value(1.)
param_update.set_value(2.)
# Prints the same values as the f1 calls above
print(f2(), f2())
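
The same behaviour can be mimicked without Theano. The sketch below is only an illustration in plain Python, not Theano's actual implementation: the helper apply_updates, the dictionary-based state, and the hand-written gradient are all hypothetical names introduced here. Every rule is evaluated against the old state before anything is written back, so the order of the list cannot matter.

```python
def apply_updates(state, updates):
    """Evaluate every rule against the OLD state, then commit all results at once."""
    new_values = {name: rule(state) for name, rule in updates}
    merged = dict(state)
    merged.update(new_values)
    return merged


p = 0.5


def grad(state):
    # Hand-written derivative of the toy cost 3 * param**2 (stands in for theano.grad)
    return 6.0 * state["param"]


# Hypothetical stand-ins for update_a and update_b from the snippet above
update_a = ("param", lambda s: s["param"] - s["param_update"])
update_b = ("param_update", lambda s: p * s["param_update"] + (1 - p) * grad(s))

start = {"param": 1.0, "param_update": 2.0}
result_ab = apply_updates(start, [update_a, update_b])
result_ba = apply_updates(start, [update_b, update_a])
print(result_ab == result_ba)
```

Because new_values is built entirely from the unmodified state, swapping update_a and update_b produces the same result.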

Logically, if you want
new_a = old_a + a_update
new_b = new_a + b_update

then you need to provide the updates like this, spelling out the new value of a inside the expression for b:
new_a = old_a + a_update
new_b = old_a + a_update + b_update
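
Applied to the momentum rule from the question, the two options can be compared with a small plain-Python sketch. It assumes the toy cost 3 * param**2 from the snippet, and the function names step_old_velocity and step_new_velocity are made up for illustration. The first function follows the question's code, where param moves with the previous step's param_update. The second substitutes the new param_update expression into the param update.

```python
def step_old_velocity(param, velocity, lr, momentum):
    """As in the question's code: param moves with the velocity from the previous step."""
    grad = 6.0 * param  # gradient of the toy cost 3 * param**2
    new_param = param - lr * velocity
    new_velocity = momentum * velocity + (1.0 - momentum) * grad
    return new_param, new_velocity


def step_new_velocity(param, velocity, lr, momentum):
    """Variant: the new velocity expression is substituted into the param update."""
    grad = 6.0 * param
    new_velocity = momentum * velocity + (1.0 - momentum) * grad
    new_param = param - lr * new_velocity
    return new_param, new_velocity


old_style = step_old_velocity(1.0, 0.0, lr=0.1, momentum=0.5)
new_style = step_new_velocity(1.0, 0.0, lr=0.1, momentum=0.5)
print(old_style, new_style)
```

Starting from a velocity of zero, the first variant leaves param unchanged on its first step, while the second variant moves it immediately. Both compute the same new velocity from the same starting state.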

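Thank you, Daniel. That is a nice piece of code for explaining the concept. I think the author wrote it that way because using the previous value probably does not change the algorithm much.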