Using TensorFlow batch normalization stalls the loss decrease [tensorflow, conv-neural-network]

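I am using TensorFlow version r0.11. I am trying to use batch normalization (tf.contrib.layers.batch_norm()) in a conv net. As a starting point, I followed the discussion in a GitHub issue. The "is_training", "reuse" and "updates_collections" flags still seem confusing in use, partly because good use cases are lacking. My problem, however, is that once I add the batch norm layers, the loss stops decreasing.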

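I structured the code following the CIFAR example, and I run training in a multi-GPU fashion. I have one script for training (similar to cifar10_multigpu.py) and one for testing (similar to cifar10_eval.py).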

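The inference/model building happens in the function "inference", which is called from "_tower_loss". (An example of the function is given below; in reality I use more layers and neurons.)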

I want to perform batch normalization, so I pass an additional placeholder input argument, "is_training", to both "_tower_loss" and "inference":

import tensorflow as tf  # r0.11

def inference(inputs, is_training):
    # BN1: normalize the raw inputs before the first convolution
    with tf.variable_scope('norm0') as scope:
        # Note that I'm using the default for 'updates_collections'
        # which is None
        norm0 = tf.contrib.layers.batch_norm(inputs, is_training=is_training,
                                             scope=scope, reuse=None)

    # conv1
    with tf.variable_scope('conv1') as scope:
        kernel = ...  # define kernel (elided in the question)
        conv = tf.nn.conv2d(norm0, kernel, strides=[1, 1, 1, 1], padding='SAME')
        # Rest is same
I also added normalization layers in the two fc layers.
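To make the role of "is_training" concrete, here is a minimal pure-Python sketch of the usual batch normalization behaviour: in training mode the layer normalizes with the statistics of the current batch and moves its stored averages toward them, while in inference mode it normalizes with the stored averages. The class name ToyBatchNorm and its single-feature layout are made up for illustration. This is not the tf.contrib.layers.batch_norm implementation, and the decay and epsilon defaults are only assumed to resemble that layer's.

```python
import math


class ToyBatchNorm(object):
    """Hypothetical single-feature batch norm, for illustration only."""

    def __init__(self, decay=0.999, epsilon=0.001):
        self.decay = decay
        self.epsilon = epsilon
        self.moving_mean = 0.0  # assumed initial value
        self.moving_var = 1.0   # assumed initial value

    def __call__(self, batch, is_training):
        if is_training:
            # Training: use the statistics of the current mini-batch ...
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            # ... and move the stored statistics toward them.
            self.moving_mean = (self.decay * self.moving_mean
                                + (1.0 - self.decay) * mean)
            self.moving_var = (self.decay * self.moving_var
                               + (1.0 - self.decay) * var)
        else:
            # Inference: use only the stored moving statistics.
            mean, var = self.moving_mean, self.moving_var
        return [(x - mean) / math.sqrt(var + self.epsilon) for x in batch]


# Usage: the same batch gives different outputs in the two modes
# until the moving statistics have caught up with the data.
bn = ToyBatchNorm(decay=0.9)
batch = [4.0, 6.0, 8.0]
train_out = bn(batch, is_training=True)
eval_out = bn(batch, is_training=False)
```

With a decay close to 1, the stored statistics trail the batch statistics for many steps, so the outputs of the two modes can differ noticeably early in training.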

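The relevant statements in the training code are listed further down, with two lines marked Line A and Line B.

When batch normalization is not used, Line A is absent and "update_ops" is not run in the session in Line B.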

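What I see is that without batch normalization the loss starts at around 6.5 and keeps decreasing to nearly 0, but with batch normalization the loss stops decreasing after 200 or 300 (mini-batch) iterations and stays at around 5.5. Speed-wise, I think the performance is the same. I am not sure what the problem is. I tried different learning rates (I am using the Adam optimizer), but to no effect. I am not sure whether "variable_averages" and "update_ops" are messing things up. Any help would be appreciated.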

Hope this helps:

import tensorflow as tf  # r0.11

def inference(inputs):  # This gets called from _tower_loss()
    # conv1
    with tf.variable_scope('conv1') as scope:
        kernel = ...  # define kernel (elided in the question)
        conv = tf.nn.conv2d(inputs, kernel, strides=[1, 1, 1, 1], padding='SAME')
        biases = _variable_on_gpu('biases', [64], tf.constant_initializer(0.0))
        preactivation = tf.nn.bias_add(conv, biases)
        # ReLU.
        conv1 = tf.nn.relu(preactivation, name=scope.name)

    # pool1
    pool1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1],
                           strides=[1, 2, 2, 1], padding='SAME', name='pool1')

    # Similarly more conv+pool and then fcs and finally logits
    return logits

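# Track moving averages of all trainable variables.
variable_averages = tf.train.ExponentialMovingAverage(0.9999, global_step)
variables_averages_op = variable_averages.apply(tf.trainable_variables())

# One training step: apply gradients and update the variable averages.
train_op = tf.group(apply_gradient_op, variables_averages_op)
# Collect the ops registered in the UPDATE_OPS collection.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)  # Line A

sess.run([train_op, loss, update_ops], feed_dict={is_training: True})  # Line B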
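
On the "variable_averages" part: as far as I know, when tf.train.ExponentialMovingAverage is given a num_updates argument (global_step in the code above), it uses min(decay, (1 + num_updates) / (10 + num_updates)) as the actual decay, so the shadow values follow the variables closely during the first steps. Below is a minimal pure-Python sketch of that rule. The helper names effective_decay and ema_update are hypothetical and are not TensorFlow functions.

```python
def effective_decay(decay, num_updates):
    """Decay actually applied at a given step (assumed TensorFlow rule)."""
    return min(decay, (1.0 + num_updates) / (10.0 + num_updates))


def ema_update(shadow, value, decay):
    """One moving-average step: move the shadow value toward the variable."""
    return shadow - (1.0 - decay) * (shadow - value)


# Usage: effective decay at the first step, around the step where the
# loss stalls in the question, and late in training.
early = effective_decay(0.9999, 0)
around_stall = effective_decay(0.9999, 300)
late = effective_decay(0.9999, 10 ** 6)

# One update of a shadow value of 1.0 toward a variable value of 2.0.
shadow = ema_update(1.0, 2.0, 0.9)
```

In the CIFAR-10 example the averaged values are only read back at evaluation time. Assuming the same holds for this code, the averaging op would not change the training loss itself.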