TensorFlow: Why does my TensorFlow pod keep crashing when I run it on Kubernetes?

This is my pod.yaml configuration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tensorflow
  labels:
    app: tensorflow
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tensorflow
  template:
    metadata:
      labels:
        app: tensorflow
    spec:
      containers:
      - name: tensorflow
        image: tensorflow/tensorflow:latest
        ports:
        - containerPort: 8888

When I try to create it, the pod goes into CrashLoopBackOff. Can anyone help? Am I doing something wrong?

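If the pod is in a crash loop, it keeps starting and then exiting. It also means your configuration is valid from the k8s point of view: the pod was created and its container was started.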

It's hard to say without logs. Could you run kubectl describe and kubectl logs against the pod?
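A minimal sketch of those two checks, wrapped in a small shell function. The name diagnose_pod is hypothetical; the kubectl subcommands and the --previous flag are standard. --previous prints the logs of the last terminated instance of the container, which is usually the run you care about for a pod in CrashLoopBackOff.

```shell
#!/bin/sh
# diagnose_pod (hypothetical helper): show the pod's status and events, then
# the logs of the previous container instance, i.e. the run that just exited.
# Pod names can be listed with: kubectl get pods -l app=tensorflow
diagnose_pod() {
  pod="$1"
  # The resource type ("pod") must come before the pod name.
  kubectl describe pod "$pod"
  kubectl logs "$pod" --previous
}
```

For example: diagnose_pod tensorflow-858f474789-vfnf8. If the word pod is left out (kubectl describe tensorflow-858f474789-vfnf8), kubectl reads the pod name as a resource type and fails with an error of the form: the server doesn't have a resource type "tensorflow-858f474789-vfnf8".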


When you run TensorFlow, does your container need GPU support?

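If you check kubectl describe pod tensorflow-************-****, you will see that the last state is Terminated with Exit Code 0. That means your container started successfully, finished its task, and exited successfully.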
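The exit code can also be read directly instead of scanning the full describe output. This is a sketch that assumes the TensorFlow container is the first (and only) container in the pod, as in the Deployment above; last_exit_code is a hypothetical helper name, while the jsonpath fields come from the Pod status (status.containerStatuses[].lastState.terminated).

```shell
#!/bin/sh
# last_exit_code (hypothetical helper): print the exit code recorded for the
# previous run of the pod's first container ("Last State: Terminated").
last_exit_code() {
  kubectl get pod "$1" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
}
```

For a pod in the state described here, this should print 0, confirming that the container exited cleanly rather than crashing.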

In addition, a Deployment uses restartPolicy: Always by default, and you cannot set restartPolicy: Never for it. More information here:

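Always means that the container will be restarted even if it exited with a zero exit code (i.e. successfully). That is why you see the restarts and CrashLoopBackOff: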

    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 29 Apr 2021 23:44:03 +0000
      Finished:     Thu, 29 Apr 2021 23:44:03 +0000
    Ready:          False
    Restart Count:  2
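If the container is really meant to run once and exit, a Job fits better than a Deployment: it accepts restartPolicy: Never (or OnFailure) and treats exit code 0 as completion instead of restarting the container. Below is a minimal sketch applied from the shell with a heredoc; the Job name tensorflow-once, the helper name apply_tensorflow_job and the python3 -c command are hypothetical placeholders for whatever you actually want to run.

```shell
#!/bin/sh
# apply_tensorflow_job (hypothetical helper): create a run-to-completion Job
# instead of a Deployment. Unlike a Deployment's pod template, a Job's pod
# template may use restartPolicy: Never, so exit code 0 marks the Job complete.
apply_tensorflow_job() {
  kubectl apply -f - <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: tensorflow-once          # hypothetical name
spec:
  backoffLimit: 2                # retries before the Job is marked failed
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: tensorflow
        image: tensorflow/tensorflow:latest
        # Placeholder workload: print the TensorFlow version and exit.
        command: ["python3", "-c", "import tensorflow as tf; print(tf.__version__)"]
EOF
}
```

After apply_tensorflow_job, kubectl get jobs should show the Job as completed, and kubectl logs job/tensorflow-once prints the container's output.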

Alternatively, you can add a command to the Deployment so that the tensorflow container never finishes, for example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tensorflow
  labels:
    app: tensorflow
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tensorflow
  template:
    metadata:
      labels:
        app: tensorflow
    spec:
      containers:
      - name: tensorflow
        image: tensorflow/tensorflow:latest
        ports:
        - containerPort: 8888
        command: ["/bin/sh", "-ec", "while :; do echo '.'; sleep 5 ; done"]

Result:

kubectl get pod tensorflow-788846c588-p64rl
NAME                          READY   STATUS    RESTARTS   AGE
tensorflow-788846c588-p64rl   1/1     Running   0          4m23s

No GPU, I'm simply trying to run TensorFlow. I tried describe and logs and got nothing useful; with describe I got this error: the server doesn't have a resource type "tensorflow-858f474789-vfnf8"