Does Python's tf.gradients() sum over the ys?
In the documentation for tf.gradients(ys, xs) it states that it "constructs symbolic derivatives of sum of ys w.r.t. x in xs". I am confused about the summing part: I have read elsewhere that this sums the derivatives dy/dx over the batch, for every x in the batch. However, whenever I use it, I never see that happening. Take the following simple example:
import numpy as np
import tensorflow as tf

x_dims = 3
batch_size = 4
x = tf.placeholder(tf.float32, (None, x_dims))
y = 2*(x**2)
grads = tf.gradients(y,x)
sess = tf.Session()
x_val = np.random.randint(0, 10, (batch_size, x_dims))
y_val, grads_val = sess.run([y, grads], {x:x_val})
print('x = \n', x_val)
print('y = \n', y_val)
print('dy/dx = \n', grads_val[0])
This gives the following output:
x =
[[5 3 7]
[2 2 5]
[7 5 0]
[3 7 6]]
y =
[[50. 18. 98.]
[ 8. 8. 50.]
[98. 50. 0.]
[18. 98. 72.]]
dy/dx =
[[20. 12. 28.]
[ 8. 8. 20.]
[28. 20. 0.]
[12. 28. 24.]]
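Since y = 2*x**2, the elementwise derivative is dy/dx = 4*x, and a quick check (reusing x_val and grads_val from the script above) confirms that the printed gradient is exactly that:

print(np.allclose(grads_val[0], 4 * x_val))  # True: just the elementwise 4*x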
This is exactly the output I would expect: just the derivative dy/dx for each element of the batch. I don't see any summing happening. Yet I have seen other examples where this operation is followed by a division by the batch size, to account for tf.gradients() summing the gradients over the batch (see here). Why is that division necessary?
I am using TensorFlow 1.6 and Python 3.

If y and x have the same shape, then the "sum" over dy/dx is a sum over exactly one value, so nothing visibly changes. If, however, there are multiple y's for each x, then the gradients really are summed:
import numpy as np
import tensorflow as tf
x_dims = 3
batch_size = 4
x = tf.placeholder(tf.float32, (None, x_dims))
y = 2*(x**2)
z = tf.stack([y, y]) # There are twice as many z's as x's
dy_dx = tf.gradients(y,x)
dz_dx = tf.gradients(z,x)
sess = tf.Session()
x_val = np.random.randint(0, 10, (batch_size, x_dims))
y_val, z_val, dy_dx_val, dz_dx_val = sess.run([y, z, dy_dx, dz_dx], {x:x_val})
print('x.shape =', x_val.shape)
print('x = \n', x_val)
print('y.shape = ', y_val.shape)
print('y = \n', y_val)
print('z.shape = ', z_val.shape)
print('z = \n', z_val)
print('dy/dx = \n', dy_dx_val[0])
print('dz/dx = \n', dz_dx_val[0])
This produces the following output:
x.shape = (4, 3)
x =
[[1 4 8]
[0 2 8]
[2 8 1]
[4 5 2]]
y.shape = (4, 3)
y =
[[ 2. 32. 128.]
[ 0. 8. 128.]
[ 8. 128. 2.]
[ 32. 50. 8.]]
z.shape = (2, 4, 3)
z =
[[[ 2. 32. 128.]
[ 0. 8. 128.]
[ 8. 128. 2.]
[ 32. 50. 8.]]
[[ 2. 32. 128.]
[ 0. 8. 128.]
[ 8. 128. 2.]
[ 32. 50. 8.]]]
dy/dx =
[[ 4. 16. 32.]
[ 0. 8. 32.]
[ 8. 32. 4.]
[16. 20. 8.]]
dz/dx =
[[ 8. 32. 64.]
[ 0. 16. 64.]
[16. 64. 8.]
[32. 40. 16.]]
In particular, note that the values of dz/dx are exactly twice those of dy/dx, because the two copies of y in the stack are summed before differentiation. This also gives some deeper insight into why you see gradients divided by the batch size in methods such as DDPG: there, the gradients are not computed through a loss function that already accounts for the batch dimension (such as tf.reduce_mean(...)). Because tf.gradients has already summed the gradients, they are divided by the batch size before apply_gradients is applied, which recovers the mean gradient over the batch.
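To make this concrete, here is a small self-contained sketch (an illustrative addition using only the toy example from above, not code from the original post). It checks that tf.gradients(y, x) behaves exactly like the gradient of tf.reduce_sum(y), and that dividing the summed gradient by the number of summed terms recovers the gradient of a tf.reduce_mean(...) loss:

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, (None, 3))
y = 2 * (x ** 2)

# tf.gradients(y, x) differentiates the *sum* of all entries of y:
g_raw = tf.gradients(y, x)[0]
g_sum = tf.gradients(tf.reduce_sum(y), x)[0]

# A tf.reduce_mean(...) loss already carries the 1/N factor, so its
# gradient equals the summed gradient divided by the number of terms:
g_mean = tf.gradients(tf.reduce_mean(y), x)[0]
n_terms = tf.cast(tf.size(y), tf.float32)

sess = tf.Session()
x_val = np.random.randint(0, 10, (4, 3))
raw, summed, mean, n = sess.run([g_raw, g_sum, g_mean, n_terms], {x: x_val})
print(np.allclose(raw, summed))    # True: same gradient either way
print(np.allclose(raw / n, mean))  # True: mean = sum / N

In the DDPG case the divisor is the batch size rather than the total element count, because the quantity being differentiated there is one scalar per batch element, but the principle is the same.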