Does tf.contrib.eager.list_devices() work on AI Platform machines?


Tags: python, tensorflow, gpu, google-cloud-ml


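I am trying to set up a TensorFlow training job on AI Platform and want to configure the number of GPUs to use dynamically. At the moment, I get the number of GPUs like this: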

import logging
import math

import tensorflow as tf  # TF 1.x: tf.contrib is not available in TF 2.x

# `args` is the parsed command-line namespace (e.g. from argparse).
distribution_strategy = None
# Get the available GPU devices
num_gpus = len([device_name
                for device_name in tf.contrib.eager.list_devices()
                if '/device:GPU' in device_name])
logging.info('%d GPUs are available.', num_gpus)
if num_gpus > 1:
  distribution_strategy = tf.distribute.MirroredStrategy()
  logging.info('MirroredStrategy will be used for training.')
  # Update the batch size so that each GPU gets an equal share
  args.batch_size = int(math.ceil(args.batch_size / num_gpus))

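However, this only seems to work when running locally. When I run the job on AI Platform with workerType complex_model_m_gpu, why does this not work? What alternatives can I use to achieve the desired effect?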

More information: my configuration file looks like this:

trainingInput:
  scaleTier: CUSTOM
  masterType: large_model
  workerType: complex_model_m_gpu
  parameterServerType: large_model
  workerCount: 3
  parameterServerCount: 1
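
As one possible alternative for the GPU count, here is a minimal sketch, not a confirmed fix. It assumes TensorFlow 1.14 or later, where `tf.config.experimental.list_physical_devices` is available. The helper names `count_gpus`, `local_gpu_count` and `per_replica_batch_size` are hypothetical and not part of TensorFlow. Any such call only reports the GPUs attached to the machine the code is running on, so it would still return 0 on a machine type that has no GPUs.

```python
import math


def count_gpus(device_names):
    """Count device names that refer to a GPU (same filter as in the question)."""
    return sum(1 for name in device_names if '/device:GPU' in name)


def local_gpu_count():
    """Number of GPUs visible to this process.

    Assumes TensorFlow >= 1.14; imported lazily so the other helpers
    in this sketch can be used without TensorFlow installed.
    """
    import tensorflow as tf
    return len(tf.config.experimental.list_physical_devices('GPU'))


def per_replica_batch_size(global_batch_size, num_replicas):
    """Split a global batch across replicas, rounding up.

    Falls back to the global batch size when no replicas (GPUs) are found.
    """
    return int(math.ceil(global_batch_size / max(num_replicas, 1)))


# Example with hand-written device names (hypothetical values).
names = [
    '/job:localhost/replica:0/task:0/device:CPU:0',
    '/job:localhost/replica:0/task:0/device:GPU:0',
    '/job:localhost/replica:0/task:0/device:GPU:1',
]
print(count_gpus(names))               # 2
print(per_replica_batch_size(64, 4))   # 16
```

Keeping the counting and batch-size logic in plain functions makes it possible to unit test them locally and to log the detected count on each machine of the job.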

It looks like you are not using a ParameterServer strategy, so why define a parameter-server architecture? What do you mean by "it doesn't work"? What do you see?

MirroredStrategy is normally used for multiple GPUs on the same host. Have you tried adding GPUs via "parameterServerType: complex_model_m_v100", since the current machine type has none? In the old ParameterServer strategy, the master uses CPU and the workers use GPUs; depending on how your code is structured, it may be running only on the master.
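
To check the point raised in the comments about which machine the code is actually running on, one option is to log the task described in the `TF_CONFIG` environment variable, which AI Platform sets on each machine of a distributed job. The sketch below is a hedged example: the helper `task_from_tf_config` is a hypothetical name, and the sample JSON is hand-written to mirror the usual `cluster`/`task` layout rather than copied from a real job.

```python
import json
import os


def task_from_tf_config(tf_config_json=None):
    """Return (task_type, task_index) parsed from a TF_CONFIG JSON string.

    Reads the TF_CONFIG environment variable when no string is passed.
    Returns (None, None) when TF_CONFIG is empty or unset, e.g. in a local run.
    """
    if tf_config_json is None:
        tf_config_json = os.environ.get('TF_CONFIG', '')
    if not tf_config_json:
        return None, None
    task = json.loads(tf_config_json).get('task', {})
    return task.get('type'), task.get('index')


# Hand-written sample (hypothetical host names and ports).
sample = json.dumps({
    'cluster': {
        'master': ['master-0:2222'],
        'worker': ['worker-0:2222', 'worker-1:2222', 'worker-2:2222'],
        'ps': ['ps-0:2222'],
    },
    'task': {'type': 'worker', 'index': 1},
})
print(task_from_tf_config(sample))  # ('worker', 1)
```

Logging this pair next to the detected GPU count would show whether a count of zero comes from the master or parameter server rather than from the GPU workers.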