TensorFlow TPU Estimator on GCMLE: InternalError: Job "master" was not defined in cluster

I'm trying to use GCMLE with TF version 1.8 and am following the relevant instructions.

For TF 1.8, they say:

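import tensorflow as tf
from tensorflow.contrib.tpu.python.tpu import tpu_config
from tensorflow.contrib.tpu.python.tpu import tpu_estimator

# Look up the TPU by its name, zone and GCP project.
tpu_cluster_resolver = tf.contrib.cluster_resolver.TPUClusterResolver(
    FLAGS.tpu,
    zone=FLAGS.tpu_zone,
    project=FLAGS.gcp_project)

# TF 1.8 style: pass the resolver as `cluster` instead of a `master` address.
config = tpu_config.RunConfig(
    cluster=tpu_cluster_resolver,
    model_dir=FLAGS.model_dir,
    # Save a checkpoint every max(600, iterations_per_loop) steps.
    save_checkpoints_steps=max(600, FLAGS.iterations_per_loop),
    tpu_config=tpu_config.TPUConfig(
        iterations_per_loop=FLAGS.iterations_per_loop,
        num_shards=FLAGS.num_cores))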
I then pass this to TPUEstimator / train_and_evaluate() as follows:

# Use the RunConfig built above (named `config` in that snippet).
estimator = tpu_estimator.TPUEstimator(
    use_tpu=True,
    model_fn=model_fn,
    config=config,
    params=params,
    train_batch_size=params.train_batch_size,
    eval_batch_size=params.eval_batch_size)

tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
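train_and_evaluate picks the role to run from the estimator's config, which RunConfig fills in from the TF_CONFIG environment variable that Cloud ML Engine sets on each replica. The stdlib-only sketch below mimics that lookup in plain Python, consistent with the getattr(self, task_to_run)() and run_master frames in the traceback below. task_from_tf_config, method_to_run and the example address are hypothetical; the real parsing lives inside tf.estimator.RunConfig and may differ in detail.

```python
import json
import os


def task_from_tf_config(environ=os.environ):
    """Return (task_type, task_index, cluster) parsed from TF_CONFIG.

    Hypothetical sketch: assumes the JSON layout
    {"cluster": {job: [addresses]}, "task": {"type": ..., "index": ...}}.
    """
    tf_config = json.loads(environ.get("TF_CONFIG", "{}"))
    task = tf_config.get("task", {})
    cluster = tf_config.get("cluster", {})
    return task.get("type"), task.get("index", 0), cluster


def method_to_run(task_type):
    # Mirrors the dispatch seen in the traceback: task type "master"
    # maps to a method called "run_master".
    return "run_" + task_type


# Illustrative TF_CONFIG for a single chief replica (address is made up).
example_env = {
    "TF_CONFIG": json.dumps({
        "cluster": {"master": ["127.0.0.1:2222"]},
        "task": {"type": "master", "index": 0},
    })
}
task_type, task_index, cluster = task_from_tf_config(example_env)
print(method_to_run(task_type))  # run_master
```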
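For 1.7, they say to use a different config with a master flag instead. However, when I run the 1.8 instructions above on GCMLE with --runtime-version set to 1.8, I get the following traceback, which says InternalError: Job "master" was not defined in cluster: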

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 585, in <module>
    run_experiment(params)
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 127, in run_experiment
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 439, in train_and_evaluate
    executor.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 546, in run
    getattr(self, task_to_run)()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 601, in run_master
    self._start_distributed_training(saving_listeners=saving_listeners)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 719, in _start_distributed_training
    self._start_std_server(config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 707, in _start_std_server
    start=False)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/server_lib.py", line 147, in __init__
    self._server_def.SerializeToString(), status)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 519, in __exit__
    c_api.TF_GetCode(self.status.status))
InternalError: Job "master" was not defined in cluster
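The last frames show the failure happens while starting a server (server_lib.py) for the job named by this replica's task type. The stdlib-only sketch below illustrates that condition under an assumption not verified against the TF 1.8 source: the cluster spec taken from TPUClusterResolver lists only a "worker" job, while TF_CONFIG on Cloud ML Engine still labels the replica "master". check_job_defined, the dict layout and the address are hypothetical; this is not TensorFlow's own code.

```python
def check_job_defined(cluster_spec, job_name):
    """Return the addresses for job_name, or raise if the job is missing.

    Hypothetical helper: cluster_spec is a plain dict of job -> addresses.
    It only mimics the check behind 'Job "master" was not defined in cluster'.
    """
    if job_name not in cluster_spec:
        raise ValueError('Job "%s" was not defined in cluster' % job_name)
    return cluster_spec[job_name]


# Assumption: the resolver's cluster only has a "worker" job (placeholder address),
# while the replica's task type from TF_CONFIG is "master".
tpu_cluster = {"worker": ["10.240.1.2:8470"]}
task_type = "master"

try:
    check_job_defined(tpu_cluster, task_type)
except ValueError as err:
    print(err)  # Job "master" was not defined in cluster
```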

This is confusing, because the docs say not to use master, so I'm not sure what is going wrong.

The TPUEstimator is called with estimator.train(input_fn=train_input_fn, max_steps=next_checkpoint) and estimator.evaluate().


tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec) only works on CPU/GPU.
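A minimal sketch of the alternating pattern the answer describes, assuming a fixed step budget: train up to the next checkpoint via max_steps, then evaluate, until the budget is used up. It is duck-typed so it runs without TensorFlow. train_then_evaluate, train_steps, steps_per_eval, eval_steps and FakeEstimator are hypothetical names; with a real TPUEstimator you would pass it and your input functions instead of the fake and None.

```python
def train_then_evaluate(estimator, train_input_fn, eval_input_fn,
                        train_steps, steps_per_eval, eval_steps):
    """Alternate estimator.train() and estimator.evaluate() up to train_steps.

    Hypothetical helper; only the input_fn, max_steps and steps keyword
    arguments correspond to the Estimator methods named in the answer.
    """
    current_step = 0
    results = []
    while current_step < train_steps:
        # Train up to the next checkpoint boundary, never past train_steps.
        next_checkpoint = min(current_step + steps_per_eval, train_steps)
        estimator.train(input_fn=train_input_fn, max_steps=next_checkpoint)
        current_step = next_checkpoint
        # Evaluate the checkpoint that training just reached.
        results.append(estimator.evaluate(input_fn=eval_input_fn,
                                          steps=eval_steps))
    return results


class FakeEstimator(object):
    """Stand-in that records calls; replace with the real TPUEstimator."""

    def __init__(self):
        self.calls = []

    def train(self, input_fn, max_steps):
        self.calls.append(("train", max_steps))

    def evaluate(self, input_fn, steps):
        self.calls.append(("evaluate", steps))
        # Report the step of the preceding train() call.
        return {"global_step": self.calls[-2][1]}


demo = FakeEstimator()
print(train_then_evaluate(demo, None, None, train_steps=2500,
                          steps_per_eval=1000, eval_steps=10))
```

Because Estimator.train() returns immediately once global_step has reached max_steps, rerunning this loop after a restart skips training steps that were already done (evaluation still runs).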

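The guide has been updated [1]. Only Cloud ML Engine runtime versions 1.8 through 1.9 are available now. Do you still have this problem with these newer versions and the latest instructions?

[1]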