Tensorflow Estimator and data parallelism

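I'm having trouble understanding how to use distributed Tensorflow with Estimator (tensorflow/contrib/learn/python/learn/estimators/estimator.py).

My non-distributed implementation is based on learn_runner, Experiment, and Estimator (github.com/google/seq2seq). That is why I'm trying to use ClusterConfig and TF_CONFIG rather than ClusterSpec, which is what the official Tensorflow tutorial uses.

Before calling run_config.RunConfig(…), I added the following code: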

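import json
import os

# FLAGS is assumed to define ps_hosts, worker_hosts, job_name and task_index.
ps_hosts = FLAGS.ps_hosts.split(",")
worker_hosts = FLAGS.worker_hosts.split(",")
cluster = {"ps": ps_hosts, "worker": worker_hosts}
# RunConfig reads the cluster layout and this task's role from TF_CONFIG.
os.environ['TF_CONFIG'] = json.dumps({
    'cluster': cluster,
    'task': {
        'type': FLAGS.job_name,
        'index': FLAGS.task_index
    }
})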
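
To make the setup concrete, here is a minimal standalone sketch of the TF_CONFIG value each process would receive in a small cluster. The helper name build_tf_config and the localhost addresses are hypothetical and used only for illustration. The sketch needs nothing beyond the standard library.

```python
import json


def build_tf_config(ps_hosts, worker_hosts, job_name, task_index):
    """Return the JSON string to store in the TF_CONFIG environment variable."""
    return json.dumps({
        "cluster": {"ps": ps_hosts, "worker": worker_hosts},
        "task": {"type": job_name, "index": task_index},
    })


# Hypothetical addresses: one parameter server and two workers.
ps_hosts = ["localhost:2222"]
worker_hosts = ["localhost:2223", "localhost:2224"]

# Every process shares the same "cluster" entry; only "task" differs.
for job_name, hosts in (("ps", ps_hosts), ("worker", worker_hosts)):
    for task_index in range(len(hosts)):
        print(job_name, task_index,
              build_tf_config(ps_hosts, worker_hosts, job_name, task_index))
```

Each process would set os.environ['TF_CONFIG'] to its own string before the RunConfig is constructed.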

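My question is about the parameter servers. I noticed that the process I start with the job name "ps" also behaves as a worker and runs training. I looked at the code, and it seems that the Estimator implementation actually allows a parameter server to act as another worker.

Is that really the case, or am I missing something?

A second question: I'm trying to use SyncReplicasOptimizer for synchronous distributed training, and because of the Experiment + Estimator wrapping, it is also fairly hard to follow the Tensorflow guidance for it. Does anyone have a better idea for this?

I'd really appreciate your help.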

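I found the problem. The Tensorflow Experiment decides what to schedule based on the ClusterConfig. So when you run distributed Tensorflow, you should not pass the "--schedule" flag at all, neither as train nor as any other value. Forcing a schedule of train makes every process run training, including the one started with job name "ps".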
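
As a rough illustration of that behavior, the sketch below mimics how a default schedule could be derived from TF_CONFIG when no schedule is given. This is a simplified stand-in in plain Python, not the library's code. The function name default_schedule is hypothetical, and the mapping reflects my reading of learn_runner, so treat the exact schedule names as assumptions.

```python
import json


def default_schedule(tf_config_json):
    """Pick a schedule name from a TF_CONFIG JSON string (simplified sketch)."""
    tf_config = json.loads(tf_config_json or "{}")
    cluster = tf_config.get("cluster") or {}
    task_type = tf_config.get("task", {}).get("type")

    # No cluster described: treat this as a local, non-distributed run.
    if not cluster:
        return "train_and_evaluate"

    # Assumed mapping from task type to schedule; "ps" only serves variables.
    schedules = {
        "master": "train_and_evaluate",
        "ps": "run_std_server",
        "worker": "train",
    }
    if task_type not in schedules:
        raise ValueError("Unknown or missing task type: %r" % (task_type,))
    return schedules[task_type]


# Hypothetical cluster used only for this example.
cluster = {"ps": ["localhost:2222"], "worker": ["localhost:2223"]}
for job_name in ("ps", "worker"):
    config = json.dumps({"cluster": cluster,
                         "task": {"type": job_name, "index": 0}})
    print(job_name, "->", default_schedule(config))
```

Under these assumptions, the "ps" process is only given a training schedule when one is forced from the command line, which matches what I observed.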