TensorFlow Slim multi-GPU training

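I'm using TensorFlow Slim. My goal is to run the standard scripts it ships with (in /models/slim/scripts) in multi-GPU mode. I tested this with the finetune_resnet_v1_50_on_flowers.sh script (cloned on April 12, 2017). I simply added --num_clones=2 at the end of the training section (inspired by /slim/deployment/model_deploy_test.py and a previous StackOverflow answer).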

Code from deployment/model_deploy_test.py:

def testMultiGPU(self):
    deploy_config = model_deploy.DeploymentConfig(num_clones=2)
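For context, with num_clones=2 model_deploy builds one copy ("clone") of the model per GPU, which is where the clone_0/clone_1 prefixes in the log at the end of this post come from. Below is a rough, TensorFlow-free sketch of the device strings involved, assuming a single machine with no parameter-server tasks. The helper device_plan is hypothetical; it only mirrors what, as I understand it, DeploymentConfig.clone_device(), inputs_device() and variables_device() return in that case.

```python
def device_plan(num_clones, clone_on_cpu=False):
    """Hypothetical helper: device used by each part of a model_deploy-style setup.

    Assumes the single-machine case (no parameter servers): clone i runs its ops
    on GPU i, while the input pipeline and the variables stay on the CPU.
    """
    if num_clones < 1:
        raise ValueError("num_clones must be >= 1")
    plan = {}
    for i in range(num_clones):
        # Each clone gets its own scope ("clone_0", "clone_1", ...) and device.
        plan["clone_%d" % i] = "/device:CPU:0" if clone_on_cpu else "/device:GPU:%d" % i
    plan["inputs"] = "/device:CPU:0"     # e.g. the shared prefetch queue
    plan["variables"] = "/device:CPU:0"  # one copy of the weights shared by all clones
    return plan


for scope, device in device_plan(2).items():
    print(scope, "->", device)
```

Because the input queue lives on the CPU, each clone's dequeue op is also kept there. That is what the "Ignoring device specification" lines in the log report, and by itself it is not an error.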

I get a warning ("Ignoring device specification"); the log is reproduced at the end of this post.

The GPUs are active (memory usage and GPU utilization both look normal), but training is no faster than on a single GPU.

This issue may be related to:

I'd appreciate any answers, comments, or concrete suggestions on this issue.

CUDA version: 8.0, V8.0.53

TensorFlow installed from binary; tested versions: 1.0.1 and 1.1.0rc

GPU: NVIDIA Tesla P100 (SXM2)

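Even if this answer comes late: training is not supposed to be faster in terms of seconds per step. An additional copy of the model is now created, so with your parameters the effective batch size is 64, which means you can halve the maximum number of steps.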
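To make the arithmetic concrete: each clone dequeues its own batch (the log shows one fifo_queue_Dequeue per clone), so one global step consumes batch_size × num_clones examples. A minimal sketch, assuming a per-clone batch size of 32 and a hypothetical single-GPU budget of 1000 steps (the 1000 is an illustration, not from the thread), keeps the total number of training examples fixed:

```python
import math


def multi_gpu_schedule(per_clone_batch, num_clones, single_gpu_steps):
    """Hypothetical helper: steps needed with num_clones GPUs to see the same
    number of examples as single_gpu_steps steps on one GPU."""
    effective_batch = per_clone_batch * num_clones  # examples per global step
    total_examples = per_clone_batch * single_gpu_steps
    steps = math.ceil(total_examples / effective_batch)
    return effective_batch, steps


effective_batch, steps = multi_gpu_schedule(32, 2, 1000)
print(effective_batch, steps)
```

The seconds per step stay roughly the same (or grow slightly because of the extra synchronization); the wall-clock saving comes from needing fewer steps.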

Please follow this document. To make sure the variables are stored on the CPU, we need to use the context manager:

with slim.arg_scope([slim.model_variable, slim.variable], device='/cpu:0'):
    ...  # build the model here; variables created via slim go on /cpu:0

It solved my problem.
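To illustrate what that slim.arg_scope call does, here is a toy, TensorFlow-free reimplementation of the idea: every listed function receives device='/cpu:0' as a default keyword argument unless the caller passes its own. The names scoped, arg_scope and model_variable below are hypothetical stand-ins, not Slim's actual code.

```python
import contextlib
import functools

# Hypothetical stand-in for Slim's arg-scope stack: function name -> default kwargs.
_ARG_SCOPE = {}


def scoped(fn):
    """Let fn pick up defaults registered by arg_scope (similar in spirit to add_arg_scope)."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        merged = dict(_ARG_SCOPE.get(fn.__name__, {}))
        merged.update(kwargs)  # explicitly passed arguments win over scope defaults
        return fn(*args, **merged)
    return wrapper


@contextlib.contextmanager
def arg_scope(fns, **defaults):
    """Temporarily register defaults for every function in fns."""
    saved = {name: dict(kw) for name, kw in _ARG_SCOPE.items()}
    for f in fns:
        _ARG_SCOPE.setdefault(f.__name__, {}).update(defaults)
    try:
        yield
    finally:
        _ARG_SCOPE.clear()
        _ARG_SCOPE.update(saved)


@scoped
def model_variable(name, device=None):
    # A real implementation would create a variable; here we only report placement.
    return name, device


with arg_scope([model_variable], device="/cpu:0"):
    print(model_variable("weights"))                # picks up the CPU default
    print(model_variable("bias", device="/gpu:0"))  # explicit override wins
print(model_variable("weights"))                    # scope has ended, no default
```

The point of the design is that model code does not need to know about placement: the same layer definitions run unchanged, and only the surrounding scope decides where the shared variables live.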

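Is using two GPUs with a batch size of 32 each the same as using one GPU with a batch size of 64? No, because you need to adjust the learning rate and the learning-rate decay. You also need to adapt the number of epochs per decay (num_epochs_per_decay).

I meant only in terms of batch size. I have also found that training is slower with multiple GPUs. Until now I assumed the batch size would be divided by the number of GPUs, making training faster. As I understand your comment, the batch size is multiplied instead. Am I right?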
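A sketch of the adjustment described in the first comment, under two assumptions that are mine rather than the thread's: the learning rate is scaled linearly with the effective batch size (a common heuristic, not a guarantee), and the decay interval is counted in global steps computed from the per-clone batch size (samples_per_epoch × num_epochs_per_decay / batch_size). Under that second assumption each step covers num_clones batches, so num_epochs_per_decay must be divided by num_clones to keep the same schedule in epochs. All function names and the example numbers (3200 samples, base rate 0.01) are hypothetical.

```python
def adjust_for_clones(base_lr, num_epochs_per_decay, num_clones):
    """Hypothetical helper: adapt single-GPU hyperparameters to num_clones GPUs.

    Assumes linear learning-rate scaling and a decay interval measured in
    global steps derived from the per-clone batch size.
    """
    return base_lr * num_clones, num_epochs_per_decay / num_clones


def decay_steps(samples_per_epoch, num_epochs_per_decay, per_clone_batch):
    # Global steps between decays when steps are computed from the per-clone batch.
    return int(samples_per_epoch * num_epochs_per_decay / per_clone_batch)


def epochs_between_decays(steps, per_clone_batch, num_clones, samples_per_epoch):
    # Each global step consumes one batch per clone.
    return steps * per_clone_batch * num_clones / samples_per_epoch


samples_per_epoch, batch = 3200, 32  # example values, not taken from the thread
lr, epochs_per_decay = adjust_for_clones(0.01, 2.0, num_clones=2)
steps = decay_steps(samples_per_epoch, epochs_per_decay, batch)
print(lr, epochs_per_decay, steps, epochs_between_decays(steps, batch, 2, samples_per_epoch))
```

With the adjustment, the decay still happens every 2 epochs on two GPUs; leaving num_epochs_per_decay at 2.0 would double the interval to 4 epochs.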

I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P100-SXM2-16GB, pci bus id: 0000:85:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla P100-SXM2-16GB, pci bus id: 0000:86:00.0)
I tensorflow/core/common_runtime/simple_placer.cc:669] Ignoring device specification /GPU:1 for node 'clone_1/fifo_queue_Dequeue' because the input edge from 'prefetch_queue/fifo_queue' is a reference connection and already has a device field set to /CPU:0
I tensorflow/core/common_runtime/simple_placer.cc:669] Ignoring device specification /GPU:0 for node 'clone_0/fifo_queue_Dequeue' because the input edge from 'prefetch_queue/fifo_queue' is a reference connection and already has a device field set to /CPU:0