Why does TensorFlow crash when I run it on Kubernetes?
This is my pod.yaml configuration:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tensorflow
  labels:
    app: tensorflow
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tensorflow
  template:
    metadata:
      labels:
        app: tensorflow
    spec:
      containers:
      - name: tensorflow
        image: tensorflow/tensorflow:latest
        ports:
        - containerPort: 8888
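When I try to create it, the pod ends up in CrashLoopBackOff.
Can anyone help?
Am I doing something wrong?

If the pod is in a crash loop, it keeps starting and dying. That means your configuration is valid from the Kubernetes point of view. It is hard to say more without logs. Could you run kubectl describe and kubectl logs on the pod?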
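When you run TensorFlow, does your container need GPU support?

If you check
kubectl describe pod tensorflow-************-****
you will see that the last state is Terminated with Exit Code 0, which means your container started successfully, finished its work, and exited cleanly.

Also, Deployments use restartPolicy: Always by default, and you cannot set restartPolicy: Never on a Deployment. The Kubernetes documentation on Deployments covers this in more detail.

Always means the container is restarted even when it exits with a zero exit code (that is, successfully). That is why you see the restarts and CrashLoopBackOff: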
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 29 Apr 2021 23:44:03 +0000
      Finished:     Thu, 29 Apr 2021 23:44:03 +0000
    Ready:          False
    Restart Count:  2
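
If the container really is meant to run once and exit, one option is to use a Job instead of a Deployment, since a Job's pod template accepts restartPolicy: Never. The manifest below is only a sketch under that assumption: the name tensorflow-job, the backoffLimit value, and the one-line Python command are placeholders chosen for illustration and do not come from the question.

```yaml
# Sketch only: a run-to-completion workload expressed as a Job.
# "tensorflow-job" and the command are hypothetical placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: tensorflow-job
spec:
  backoffLimit: 2              # retry a failed pod at most twice (illustrative value)
  template:
    spec:
      restartPolicy: Never     # allowed in a Job, not in a Deployment
      containers:
      - name: tensorflow
        image: tensorflow/tensorflow:latest
        # Replace with the real one-off task; this just prints the version.
        command: ["python3", "-c", "import tensorflow as tf; print(tf.__version__)"]
```

With a Job, a pod that exits with code 0 is reported as Completed and is not restarted, so no CrashLoopBackOff appears for a successful run.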
You can add a command to the Deployment so that the tensorflow pod never completes, for example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tensorflow
  labels:
    app: tensorflow
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tensorflow
  template:
    metadata:
      labels:
        app: tensorflow
    spec:
      containers:
      - name: tensorflow
        image: tensorflow/tensorflow:latest
        ports:
        - containerPort: 8888
        command: ["/bin/sh", "-ec", "while :; do echo '.'; sleep 5 ; done"]
Result:
kubectl get pod tensorflow-788846c588-p64rl
NAME                          READY   STATUS    RESTARTS   AGE
tensorflow-788846c588-p64rl   1/1     Running   0          4m23s
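
The sleep loop keeps the pod alive but does not serve anything on port 8888. Since 8888 is the port Jupyter conventionally uses, it may be that a notebook server was what you wanted. Assuming that is the case, and assuming the tensorflow/tensorflow:latest-jupyter tag is available and starts a notebook server by default (worth verifying on Docker Hub before relying on it), a variant of the Deployment could look like this:

```yaml
# Sketch only: assumes the "latest-jupyter" image tag exists and that its
# default command starts a long-running Jupyter server on port 8888.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tensorflow
  labels:
    app: tensorflow
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tensorflow
  template:
    metadata:
      labels:
        app: tensorflow
    spec:
      containers:
      - name: tensorflow
        image: tensorflow/tensorflow:latest-jupyter   # assumed tag, not from the question
        ports:
        - containerPort: 8888
```

Because the main process in that image is meant to keep running, no extra command should be needed to stop the pod from completing.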

No GPU; I'm simply trying to run TensorFlow. I tried describe and logs and didn't get anything. For describe I got an error like this: the server doesn't have a resource type "tensorflow-858f474789-vfnf8"