What is a clean way to get validation loss in TensorFlow with tf.train.MonitoredTrainingSession?
I am building a distributed TensorFlow model, and I am a bit confused about how to use tf.train.MonitoredTrainingSession in a clean way. Here is my training code:
# Define number of training steps
hooks = [tf.train.StopAtStepHook(last_step=FLAGS.nb_train_step)]

with tf.train.MonitoredTrainingSession(master=target,
                                       is_chief=(FLAGS.task_index == 0),
                                       checkpoint_dir=FLAGS.logs_dir,
                                       hooks=hooks) as sess:
    while not sess.should_stop():
        batch_train = gen_train.next()  # training data generator
        feed_dict = {X: batch_train[0],
                     Y: batch_train[1]}
        variables = [loss, merged_summary, train_step]
        current_loss, summary, _ = sess.run(variables, feed_dict)
        print("Batch loss: %s" % current_loss)
Now, if I want to get my model's validation loss every n training steps, I can add a block to be evaluated every n steps:
batch_val = gen_val.next()  # validation data generator
feed_dict = {X: batch_val[0],
             Y: batch_val[1]}
val_loss = sess.run([loss], feed_dict)
But this will increase the step count in my hook, meaning the validation-loss computation will be counted as a training step. Is there a clean way to do this? Or am I misunderstanding what hooks do?
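For what it's worth, assuming the global step is advanced only by the training op (the usual setup when the optimizer is created with minimize(..., global_step=...)), an extra sess.run([loss], ...) should not move StopAtStepHook forward, since that hook watches the global step rather than the number of sess.run calls. The intended control flow can be sketched in plain Python, with stand-in callables instead of TensorFlow ops (training_loop, eval_every_n, and the lambdas are illustrative names, not TensorFlow APIs):

```python
def training_loop(train_step, eval_step, last_step, eval_every_n):
    """Run train_step until the step counter reaches last_step,
    evaluating with eval_step every eval_every_n steps.

    Only train_step advances the counter, mirroring how a
    MonitoredTrainingSession's global step is incremented by the
    training op alone, not by extra loss-only session runs.
    """
    step = 0
    val_losses = []
    while step < last_step:
        train_step()   # training run: advances the step counter
        step += 1
        if step % eval_every_n == 0:
            # validation run: records a loss, does NOT advance the step
            val_losses.append(eval_step())
    return step, val_losses
```

Here a validation pass at steps 3, 6, 9, ... never changes when the loop stops, which is the behavior one would hope for from the snippet above.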