Correct way to collect batch normalization statistics in a multi-GPU setting in TensorFlow?

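I am using the `tf.contrib.layers.batch_norm` function to train BatchNorm layers on multiple GPUs. During training, we have to collect the moving mean and moving variance using

```python
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
```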
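The ops in `UPDATE_OPS` are the assignments that move `moving_mean` and `moving_variance` toward each batch's statistics; if they never run, the moving statistics keep their initial values. A pure-Python sketch of one such exponential-moving-average step, where the function name, decay, and batch mean are illustrative assumptions rather than TensorFlow's own code:

```python
def update_moving_stat(moving, batch_stat, decay=0.999):
    """One exponential-moving-average step, the form used for BN moving stats."""
    # Equivalent to: moving -= (1 - decay) * (moving - batch_stat)
    return decay * moving + (1.0 - decay) * batch_stat


# Hypothetical run: the moving mean starts at 0.0 and every batch has mean 2.0.
moving_mean = 0.0
for step in range(5):
    moving_mean = update_moving_stat(moving_mean, 2.0, decay=0.9)
print(round(moving_mean, 5))  # → 0.81902
```

The real layer's default `decay` is 0.999, so the moving statistics need far more steps to settle than in this toy run.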

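However, I found several ways of calling this function:

1. Inside the per-GPU loop
2. Outside the per-GPU loop
3. Both inside and outside the per-GPU loop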
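Where the call sits matters because `tf.get_collection` returns a copy of the collection as it is at the moment of the call. A pure-Python simulation of that behavior, where the list, the op names, and `build_tower` are hypothetical stand-ins for the graph collection and the tower-building code:

```python
# Hypothetical stand-in for the UPDATE_OPS graph collection; a real
# batch_norm layer appends its moving-mean/variance update ops to it.
update_ops_collection = []


def get_collection():
    """Like tf.get_collection(key): returns a snapshot copy of the collection."""
    return list(update_ops_collection)


def build_tower(i):
    # Hypothetical: building one tower adds two BN update ops.
    update_ops_collection.append('tower_%d/bn/AssignMovingAvg' % i)
    update_ops_collection.append('tower_%d/bn/AssignMovingAvg_1' % i)


num_gpus = 3
sizes_inside_loop = []
for i in range(num_gpus):
    build_tower(i)
    update_ops = get_collection()  # approach 1: called inside the loop
    sizes_inside_loop.append(len(update_ops))

update_ops_after_loop = get_collection()  # approach 2: called after the loop
print(sizes_inside_loop, len(update_ops_after_loop))  # [2, 4, 6] 6
```

Without a scope filter, only the value from the last iteration is complete; if that value is the one used after the loop, approaches 1 and 2 pick up the same ops.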

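Which is the correct way? In my opinion, it may be the second one.

The code snippet you provided for the first approach does not match the cifar_10 example. The approach taken in that example collects and applies only the update ops originating from the first tower, as a heuristic code optimization. Here is the relevant code snippet: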

```python
with tf.variable_scope('resnet', reuse=bool(i != 0)):
  with tf.name_scope('tower_%d' % i) as name_scope:
    with tf.device(device_setter):
      loss, gradvars, preds = _tower_fn(
          is_training, weight_decay, tower_features[i], tower_labels[i],
          data_format, params.num_layers, params.batch_norm_decay,
          params.batch_norm_epsilon)
      tower_losses.append(loss)
      tower_gradvars.append(gradvars)
      tower_preds.append(preds)
      if i == 0:
        # Only trigger batch_norm moving mean and variance update from
        # the 1st tower. Ideally, we should grab the updates from all
        # towers but these stats accumulate extremely fast so we can
        # ignore the other stats from the other towers without
        # significant detriment.
        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS,
                                       name_scope)
```

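Note that in the snippet above, the update ops are restricted to those originating from the first tower through `name_scope`, which is passed as an argument to `tf.get_collection`.

The second approach applies all update ops from all towers.

The third approach, as you have written it, is a variation of the first. However, the linked inception_v3 file actually does something similar to the cifar10_main example.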
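The `scope` argument of `tf.get_collection` keeps only items whose `name` matches the scope string with `re.match`, which is a prefix match. A pure-Python sketch of that filter, using hypothetical op names, also shows why the trailing `/` in the string yielded by `tf.name_scope(...) as name_scope` matters:

```python
import re


def filter_by_scope(names, scope):
    """Mimics the scope filter of tf.get_collection: keep names that re.match scope."""
    return [n for n in names if re.match(scope, n)]


# Hypothetical op names from 12 towers, one BN update op each.
names = ['tower_%d/bn/AssignMovingAvg' % i for i in range(12)]

# tf.name_scope(...) yields a scope string ending in '/', e.g. 'tower_1/',
# so the filter keeps only that tower's ops.
with_slash = filter_by_scope(names, 'tower_1/')
# Without the trailing '/', re.match also accepts 'tower_10' and 'tower_11'.
without_slash = filter_by_scope(names, 'tower_1')
print(len(with_slash), len(without_slash))  # 1 3
```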

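As for which approach is correct: it depends. Selectively applying update ops may reduce the time per training step at the cost of (some definitions of) correctness, whereas applying all update ops may increase the time per training step. In practice, try both and see which works better for you.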
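The cifar10_main comment argues that the statistics accumulate fast enough for one tower to suffice. A toy simulation (not a benchmark) of that argument, assuming every tower's batch means come from the same distribution and that the all-tower update ops run one after another; the seed, decay, tower count, and distribution are hypothetical:

```python
import random


def ema(moving, value, decay):
    """One exponential-moving-average step."""
    return decay * moving + (1.0 - decay) * value


rng = random.Random(0)       # fixed seed so the toy run is repeatable
num_towers, decay = 4, 0.99  # hypothetical settings
true_mean = 3.0              # hypothetical mean of every tower's batch means

first_tower_only = 0.0
all_towers = 0.0
for step in range(2000):
    # Hypothetical per-tower batch means, all drawn from the same distribution.
    batch_means = [rng.gauss(true_mean, 0.5) for _ in range(num_towers)]
    # cifar10_main-style: only tower 0's update op runs.
    first_tower_only = ema(first_tower_only, batch_means[0], decay)
    # All towers' update ops, assumed to run sequentially.
    for m in batch_means:
        all_towers = ema(all_towers, m, decay)

print(round(first_tower_only, 2), round(all_towers, 2))
```

Both estimates end up close to 3.0 here. Running N towers' update ops per step is N moving-average steps, so the old moving value is scaled by `decay ** N` per training step rather than by `decay`.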
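Thanks for the clear explanation. Let me summarize your answer: Method 1: the BN parameters are taken only from the first tower. Methods 2 and 3: the BN parameters are computed from the BN statistics of all towers. My task is segmentation, so a large batch would be useful. Which method (2 or 3) can be used to update the final BN parameters from all towers? That is, the final BN parameters would be computed from the moving mean and variance averaged over all towers.

If you want to collect the update ops from all towers, the second approach should work.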
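If the goal is statistics of the whole multi-GPU batch, note that averaging per-tower variances does not give the variance of the combined batch. A pure-Python sketch of pooling per-tower statistics, assuming equal-size tower batches; this pooling is the idea behind synchronized (cross-GPU) batch norm, a different technique from approach 2, where each tower's update op simply runs against the shared moving variables. The activation values are hypothetical:

```python
def mean_var(xs):
    """Mean and population (biased) variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / len(xs)


def pooled_mean_var(means, variances):
    """Combine per-tower (mean, variance) pairs from equal-size batches into
    the mean and variance of the concatenated batch."""
    n = len(means)
    mean = sum(means) / n
    # Law of total variance: average within-tower variance plus the
    # spread of the tower means around the pooled mean.
    var = sum(variances) / n + sum((m - mean) ** 2 for m in means) / n
    return mean, var


# Hypothetical activations of one channel on two towers.
tower_0 = [1.0, 3.0]
tower_1 = [5.0, 7.0]
(m0, v0), (m1, v1) = mean_var(tower_0), mean_var(tower_1)

pooled = pooled_mean_var([m0, m1], [v0, v1])
naive_var = (v0 + v1) / 2  # averaging variances ignores between-tower spread
print(pooled, mean_var(tower_0 + tower_1), naive_var)  # (4.0, 5.0) (4.0, 5.0) 1.0
```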
```python
with tf.device('/cpu:0'):
    with tf.variable_scope(tf.get_variable_scope()):
        for i in range(self.conf.num_gpus):
            with tf.device('/gpu:%d' % i):
                with tf.name_scope('device_%d' % i):
                    # Ignore the line update_ops
                    ...  # per-GPU tower construction not shown
    variable_averages = tf.train.ExponentialMovingAverage(
        self.conf.MOVING_AVERAGE_DECAY, global_step)
    variables_averages_op = variable_averages.apply(tf.trainable_variables())
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        self.train_op = tf.group(train_op_conv, variables_averages_op)
```

```python
with tf.device('/cpu:0'):
    with tf.variable_scope(tf.get_variable_scope()):
        for i in range(self.conf.num_gpus):
            with tf.device('/gpu:%d' % i):
                with tf.name_scope('device_%d' % i):
                    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    variable_averages = tf.train.ExponentialMovingAverage(
        self.conf.MOVING_AVERAGE_DECAY, global_step)
    variables_averages_op = variable_averages.apply(tf.trainable_variables())
    batchnorm_updates_op = tf.group(*update_ops)
    self.train_op = tf.group(train_op_conv, train_op_fc,
                             variables_averages_op, batchnorm_updates_op)
```
