How to use feed_dict with multiple GPUs in TensorFlow

Tags: python, tensorflow, distributed

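Recently I have been trying to learn how to use TensorFlow on multiple GPUs to speed up training. I found an official tutorial that trains a classification model on the CIFAR-10 dataset. However, that tutorial reads images through a queue. Out of curiosity, how can I use multiple GPUs when feeding values into the session with feed_dict? I am having trouble working out how to feed different values from the same dataset to different GPUs. Thanks, everyone! The code below is part of the official tutorial.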

images, labels = cifar10.distorted_inputs()
batch_queue = tf.contrib.slim.prefetch_queue.prefetch_queue(
      [images, labels], capacity=2 * FLAGS.num_gpus)
# Calculate the gradients for each model tower.
tower_grads = []
with tf.variable_scope(tf.get_variable_scope()):
  for i in xrange(FLAGS.num_gpus):
    with tf.device('/gpu:%d' % i):
      with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope:
        # Dequeues one batch for the GPU
        image_batch, label_batch = batch_queue.dequeue()
        # Calculate the loss for one tower of the CIFAR model. This function
        # constructs the entire CIFAR model but shares the variables across
        # all towers.
        loss = tower_loss(scope, image_batch, label_batch)

        # Reuse variables for the next tower.
        tf.get_variable_scope().reuse_variables()

        # Retain the summaries from the final tower.
        summaries = tf.get_collection(tf.GraphKeys.SUMMARIES, scope)

        # Calculate the gradients for the batch of data on this CIFAR tower.
        grads = opt.compute_gradients(loss)

        # Keep track of the gradients across all towers.
        tower_grads.append(grads)

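The queue-based API is relatively outdated, and the TensorFlow documentation says so explicitly:

"Input pipelines using the queue-based APIs can be cleanly replaced by the tf.data API."

It is therefore recommended to use the tf.data API, which is optimized for multi-GPU and TPU use.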

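How to use it

You can create multiple iterators, one for each GPU, or more easily use the Estimator API.

For a complete tutorial, see the tutorial linked from the original answer.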
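As a rough illustration of the "one iterator per GPU" idea, the sketch below gives each GPU its own disjoint slice of a dataset with `Dataset.shard`. This is only one possible way to do it and is not necessarily what the answer originally linked to. The names `NUM_GPUS`, `BATCH_SIZE` and `build_tower_inputs` are hypothetical, the `tf.identity` call is a stand-in for the real tower model, and `tensorflow.compat.v1` is used on the assumption that the TF 1.x graph API should also run on newer installations.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # TF 1.x-style graph API

NUM_GPUS = 2    # hypothetical; the tutorial reads this from FLAGS.num_gpus
BATCH_SIZE = 2  # hypothetical per-GPU batch size


def build_tower_inputs(features, num_gpus, batch_size):
    """Return one input tensor per GPU, each reading a disjoint shard."""
    dataset = tf.data.Dataset.from_tensor_slices(features)
    tower_inputs = []
    for i in range(num_gpus):
        # shard(n, i) keeps every n-th element, starting at element i.
        shard = dataset.shard(num_gpus, i).batch(batch_size)
        batch = tf.data.make_one_shot_iterator(shard).get_next()
        with tf.device('/gpu:%d' % i):
            # Stand-in for the tower model (e.g. tower_loss in the tutorial).
            tower_inputs.append(tf.identity(batch))
    return tower_inputs


graph = tf.Graph()
with graph.as_default():
    towers = build_tower_inputs(
        np.arange(8, dtype=np.float32), NUM_GPUS, BATCH_SIZE)

# Soft placement lets the sketch fall back to the CPU if fewer GPUs exist.
config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(graph=graph, config=config) as sess:
    first_batches = sess.run(towers)  # one batch per GPU, no feed_dict needed
```

With this layout no feed_dict is needed: each tower pulls its own batch from its own iterator every time the training op runs.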

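The core idea of the multi-GPU example is that you explicitly assign operations to a device with tf.device. The example loops over FLAGS.num_gpus devices and creates a replica for each GPU.

If you create the placeholder ops inside the for loop, they will be assigned to their respective devices. All you need to do is keep handles to the created placeholders and then feed them all independently in a single session.run call.
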
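import tensorflow as tf  # TensorFlow 1.x API

placeholders = []
for i in range(FLAGS.num_gpus):
    with tf.device('/gpu:%d' % i):
        # One placeholder per GPU; keep a handle so it can be fed later.
        plc = tf.placeholder(tf.int32)
        placeholders.append(plc)

with tf.Session() as sess:
    # Feed a different value to each GPU's placeholder in a single run call.
    fd = {plc: i for i, plc in enumerate(placeholders)}
    # This should give you the sum of all numbers from 0 to FLAGS.num_gpus - 1.
    result = sess.run(sum(placeholders), feed_dict=fd)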

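To address your specific example, it is enough to replace the batch_queue.dequeue() call with two placeholders (one for the image_batch tensor and one for the label_batch tensor), store those placeholders somewhere, and then feed them the values you need.

Another (somewhat hacky) approach is to override the image_batch and label_batch tensors directly in the session.run call, since you can feed any tensor, not just placeholders. You still need to store the tensors somewhere so that you can refer to them from the run call.
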
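The feeding step described above could look roughly like the following sketch, which splits one large batch into per-GPU shards and maps each shard to that tower's placeholders. The helper name `make_tower_feed_dict` is hypothetical, and strings stand in for the placeholder handles so that the sketch runs without TensorFlow; in a real graph the keys would be the `tf.placeholder` objects saved inside the tower loop.

```python
import numpy as np


def make_tower_feed_dict(tower_placeholders, images, labels):
    """Split one batch across towers and build a single feed_dict.

    tower_placeholders: list of (image_placeholder, label_placeholder)
    pairs, one pair per GPU, collected while building the towers.
    """
    if len(images) != len(labels):
        raise ValueError('images and labels must have the same length')
    num_towers = len(tower_placeholders)
    # array_split divides as evenly as possible, even for odd batch sizes.
    image_shards = np.array_split(images, num_towers)
    label_shards = np.array_split(labels, num_towers)
    feed_dict = {}
    for (image_ph, label_ph), image_shard, label_shard in zip(
            tower_placeholders, image_shards, label_shards):
        feed_dict[image_ph] = image_shard
        feed_dict[label_ph] = label_shard
    return feed_dict


# Stand-in keys (assumption): replace with the real placeholder handles.
placeholders = [('images_0', 'labels_0'), ('images_1', 'labels_1')]
images = np.zeros((8, 24, 24, 3), dtype=np.float32)
labels = np.arange(8, dtype=np.int32)
feed_dict = make_tower_feed_dict(placeholders, images, labels)
```

The resulting dictionary would then be passed as the feed_dict argument of a single session.run call on the training op, so every GPU receives a different slice of the same batch.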
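Comments:

First of all, thank you for your patient explanation. But I am still confused by the code after the session starts. What is the meaning of the sum of the placeholders?

It is just an example of how to refer to the placeholders. In your case you would use session.run to fetch different values (e.g. the training op), but provide the feed dict in the way shown above.

Thanks for the detailed explanation. I think I understand what you mean. I have another question. Sometimes the inputs to a neural network are not limited to training and test data. For example, in a generative adversarial network we also need to feed a different Z (Gaussian noise) to each of the GPUs. Can I use the tf.data API for this as well, or should I write the iterator myself?

Yes, you can: tf.data.Dataset.from_tensor_slices(tf.random_uniform([total_training_samples, seq_length, z_dim], minval=0, maxval=1, dtype=tf.float32))

Thank you very much! By the way, if I train my model on multiple GPUs, do I need to set up several input iterators?

Take a look at the answer by @mrry from Google. It was my pleasure.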