Caching: how to understand the caching mechanism in TensorFlow

The paper "TensorFlow: A System for Large-Scale Machine Learning" says in Section 3.3:

We optimized TensorFlow for executing large subgraphs repeatedly with low latency. Once the graph for a step has been pruned, placed, and partitioned, its subgraphs are cached in their respective devices. A client session maintains the mapping from step definitions to cached subgraphs, so that a distributed step on a large graph can be initiated with one small message to each participating task. This model favours static, reusable graphs, but it can support dynamic computations using dynamic control flow, as the next subsection describes.
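
For intuition, here is a minimal TF 1.x sketch of the "repeated step" pattern the paper refers to; the graph contents are made up and only illustrate that the same fetches are run many times, so the pruned and partitioned subgraph for the step is built once and then reused instead of being re-planned on every run:

    import tensorflow as tf  # TF 1.x API

    # Toy stand-in for a training step: a variable plus an update op.
    x = tf.Variable(tf.zeros([1000]), name="x")
    step = tf.assign_add(x, tf.random_normal([1000]), name="step")

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # The first run of `step` prunes, places, and partitions the subgraph
        # needed for this fetch; later runs with the same fetches reuse the
        # cached subgraph, so each step only needs a small "run" message
        # per participating task.
        for _ in range(100):
            sess.run(step)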
  • How should we understand "cached in their respective devices" here? Also, many APIs have a "caching_device" parameter, but its default value is False — how should this caching behaviour be understood?

  • Generally speaking, a cache always comes with some invalidation policy — so what is the cache policy here?

  • If we create several clones of the model graph across multiple GPUs with between-graph parallelism, i.e. the clones all refer to shared variables on the ps tasks, how does each clone read the remote variables? Does it cache the variables on some local device by default to reduce network traffic?

  • More details:

    A Tour of TensorFlow
    https://arxiv.org/pdf/1610.01178.pdf
    
    Finally, an important optimization made by TensorFlow at this step is “canonicalization” of (send,receive) pairs. In the setup displayed in Figure 5b, the existence of each recv node on device B would imply allocation and management of a separate buffer to store ν’s output tensor, so that it may then be fed to nodes α and β, respectively. However, an equivalent and more efficient transformation places only one recv node on device B, streams all output from ν to this single node, and then to the two dependent nodes α and β. This last and final evolution is given in Figure 5c.
    
    
    The passage above describes how the optimization shown in Figure 5c is performed automatically to reduce implicit read operations. If the same happens in a distributed system, the amount of network communication is reduced automatically as needed.
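
    As a purely illustrative sketch of that situation (the device strings and names are assumptions), the snippet below produces one tensor on a ps device and consumes it in two ops on a worker device. With the canonicalization described above, the runtime should insert a single (send, recv) pair for `v`, so the tensor crosses the device boundary once and is then fanned out locally. This only builds the graph; no cluster is needed to run it:

      import tensorflow as tf

      # Producer lives on one device...
      with tf.device("/job:ps/task:0/cpu:0"):
          v = tf.get_variable("v", shape=[1024],
                              initializer=tf.zeros_initializer())

      # ...and is consumed twice on another device.  After canonicalization of
      # the (send, recv) pairs, `v` is transferred to the worker once and then
      # read locally by both consumers.
      with tf.device("/job:worker/task:0/gpu:0"):
          alpha = tf.reduce_sum(v)
          beta = tf.reduce_mean(v)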

    Alternatively, /model/slim/deployment/model_deploy.py tries to cache variables, as follows:

      def caching_device(self):
        """Returns the device to use for caching variables.

        Variables are cached on the worker CPU when using replicas.

        Returns:
          A device string or None if the variables do not need to be cached.
        """
        if self._num_ps_tasks > 0:
          return lambda op: op.device
        else:
          return None
    
    I suppose this is done to optimize network traffic.
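
    For reference, here is a hedged sketch of how a caching_device like the one above can be applied (the cluster layout and variable names are assumptions, not taken from model_deploy.py). Both `tf.variable_scope` and `tf.get_variable` accept a `caching_device` argument, and with between-graph replication the variables can be pinned to ps tasks via `tf.train.replica_device_setter` while their reads are cached on the local worker's CPU, so repeated reads inside a step do not each cross the network:

      import tensorflow as tf

      # Assumed cluster layout, for illustration only.
      cluster = tf.train.ClusterSpec({
          "ps": ["ps0:2222"],
          "worker": ["worker0:2222", "worker1:2222"],
      })
      worker_device = "/job:worker/task:0"

      with tf.device(tf.train.replica_device_setter(
              worker_device=worker_device, cluster=cluster)):
          # The variable itself lives on a ps task, but the value read by this
          # worker is cached on its local CPU via `caching_device`.
          with tf.variable_scope("model",
                                 caching_device=worker_device + "/cpu:0"):
              w = tf.get_variable("w", shape=[1024, 1024])
              logits = tf.matmul(tf.ones([8, 1024]), w)

    (As far as I can tell, the `lambda op: op.device` in the model_deploy.py snippet above means "keep the cached read on whatever device the reading op already has", which is consistent with the docstring's "cached on the worker CPU when using replicas".)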

    What is the right, or best, way to optimize communication in a distributed system?

    We would also appreciate a clearer explanation of this; if I get more experimental results, I will try to update this question.