Python 3.x: How can I use TensorFlow to compute a gradient for each sample, and reduce them with an arbitrary function?
Given a (mini)batch of M samples, I would like to obtain M gradients (one per sample) and reduce them to a single gradient with an arbitrary function (rather than the usual mean). I then want to train the network with that final gradient.

I have the example below, which works to some extent. The problem is that, as the iterations progress, it not only gets slower and slower, but the program also keeps using more and more memory. As far as I can tell, this is because I am adding new operations to TensorFlow's computation graph on every iteration, but I lack the TensorFlow knowledge to avoid doing so while still achieving my goal.
import sys
import numpy as np
import tensorflow as tf

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

# Parameters
learning_rate = 0.0001
n_iterations = 1000000
batch_size = 2
display_step = 1000

# Network Parameters
n_hidden_1 = 256
n_hidden_2 = 256
n_input = 784
n_output = 10

# Create model
def build_network(x, weights, biases):
    assert sys.version_info[:2] >= (3,6)  # otherwise iteration on dict is ill-defined
    layer = x
    for (w_str,w),(b_str,b) in zip(weights.items(), biases.items()):
        print(w_str,b_str)
        layer = tf.add(tf.matmul(layer, w), b)
    return layer

def max_gradient(gradients):
    assert len(gradients) >= 1
    # compute the resulting gradient, by choosing the (abs) max component wise
    tgv = gradients[0]
    for gv in gradients[1:]:
        for (tg,tv),(g,v) in zip(tgv,gv):
            assert (tv == v).all()
            np.copyto(tg,g,where=abs(g) > abs(tg))
    return tgv

def main(sess):
    # tf Graph input/output
    X = tf.placeholder('float', [None, n_input])
    Y = tf.placeholder('float', [None, n_output])

    # Store layers weight & bias
    weights = {
        'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
        'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
        'out': tf.Variable(tf.random_normal([n_hidden_2, n_output]))
    }
    biases = {
        'b1': tf.Variable(tf.random_normal([n_hidden_1])),
        'b2': tf.Variable(tf.random_normal([n_hidden_2])),
        'out': tf.Variable(tf.random_normal([n_output]))
    }

    # Construct model
    model = build_network(X, weights, biases)

    # Loss
    loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=model, labels=Y))
    # Optimizer
    opt = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
    # Gradient
    gv_op = opt.compute_gradients(loss_op)
    # Trainable variables
    t_vars = tf.trainable_variables()

    # Initializing the variables
    sess.run(tf.global_variables_initializer())

    # Training cycle
    for i in range(n_iterations):
        # Get new training data
        X_train, Y_train = mnist.train.next_batch(batch_size)
        # Run the cost operation (to get the loss value)
        c = sess.run(loss_op, feed_dict={X: X_train, Y: Y_train})
        print(i, c)
        # get the gradients for each batch sample
        gradients = [sess.run(gv_op,
                              feed_dict={X: X_train[[j]], Y: Y_train[[j]]})
                     for j in range(batch_size)]
        # compute the resulting gradient
        tgv = max_gradient(gradients)
        # assert that all the variables match
        for (_,v1),v2 in zip(tgv,t_vars):
            assert (v1 == sess.run(v2)).all()
        # place the actual variables for the variable slots
        tgv = [(g,v) for (g,_),v in zip(tgv, t_vars)]
        # apply the transformation
        sess.run(opt.apply_gradients(tgv))

if __name__ == '__main__':
    with tf.Session() as sess:
        main(sess)
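For reference, the component-wise abs-max reduction performed by `max_gradient` can be checked in isolation with plain NumPy. This is a small sketch with made-up gradient values and a single dummy shared variable, not data from the actual network:

```python
import numpy as np

# Two fake (gradient, variable) lists for the same single variable,
# mimicking what sess.run(gv_op) returns for two different samples.
var = np.zeros(3)
g1 = [(np.array([ 1.0, -5.0,  2.0]), var)]
g2 = [(np.array([-3.0,  4.0,  1.0]), var)]

# Reduce by keeping, per component, the gradient with the larger magnitude.
tgv = g1
for gv in [g2]:
    for (tg, tv), (g, v) in zip(tgv, gv):
        assert (tv == v).all()
        np.copyto(tg, g, where=abs(g) > abs(tg))

print(tgv[0][0])  # [-3. -5.  2.]
```

Component 0 takes -3 (|−3| > |1|), component 1 keeps -5 (|−5| > |4|), and component 2 keeps 2 (|2| > |1|).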
How can I compute per-sample gradients and reduce them with an arbitrary function, without continually adding new operations to TensorFlow's graph?
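One pattern that should avoid the per-iteration graph growth is to create one gradient placeholder per variable and a single `apply_gradients` op *before* the training loop, then feed the reduced gradients into those placeholders each iteration, so `sess.run` never adds ops. Below is a minimal sketch of that pattern with a tiny linear model and made-up data (the names `grad_phs` and `apply_op` are mine, not from the question); it uses `tf.compat.v1` so it also runs under TF2:

```python
import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Tiny stand-in model, built ONCE: one weight matrix, squared loss.
x = tf.placeholder(tf.float32, [None, 3])
y = tf.placeholder(tf.float32, [None, 1])
w = tf.Variable(tf.zeros([3, 1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))

opt = tf.train.GradientDescentOptimizer(0.1)
gv_op = opt.compute_gradients(loss)  # [(grad_tensor, variable), ...]

# One placeholder per variable's gradient, and ONE apply op,
# both created before the training loop.
grad_phs = [tf.placeholder(tf.float32, v.get_shape()) for _, v in gv_op]
apply_op = opt.apply_gradients(
    [(ph, v) for ph, (_, v) in zip(grad_phs, gv_op)])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    X = np.random.randn(2, 3).astype(np.float32)
    Y = np.random.randn(2, 1).astype(np.float32)
    n_ops_before = len(tf.get_default_graph().get_operations())
    for _ in range(5):
        # Per-sample gradients via separate runs of the SAME op.
        grads = [sess.run(gv_op, feed_dict={x: X[[j]], y: Y[[j]]})
                 for j in range(2)]
        # Arbitrary reduction: component-wise abs-max, as in the question.
        red = grads[0]
        for gv in grads[1:]:
            for (tg, _), (g, _) in zip(red, gv):
                np.copyto(tg, g, where=abs(g) > abs(tg))
        # Feed the reduced gradients into the pre-built apply op.
        sess.run(apply_op,
                 feed_dict={ph: g for ph, (g, _) in zip(grad_phs, red)})
    n_ops_after = len(tf.get_default_graph().get_operations())
    assert n_ops_before == n_ops_after  # graph did not grow
```

The graph size stays constant because every op (`gv_op`, `grad_phs`, `apply_op`) is created before training starts; the loop only runs existing ops and feeds NumPy arrays.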