TensorFlow on GKE with TPUs: "Error recorded from infeed: Socket closed"
Occasionally, our GKE-based TPUEstimator training jobs that run on TPUs fail with:
Error recorded from infeed: Socket closed
An error was raised. This may be due to a preemption in a connected worker or parameter server. The current session will be closed and a new session will be created. This error may also occur due to a gRPC failure caused by high memory or network bandwidth usage in the parameter servers. If this error occurs repeatedly, try increasing the number of parameter servers assigned to the job. Error: Socket closed
I have two questions about this: