Kubernetes kube-system containers keep crashing
I initialized a new cluster on the master node with kubeadm init --pod-network-cidr=10.1.0.0/16 and then installed Calico. At first everything was fine:
sysadm@master$ sudo kubectl get pods --all-namespaces -o wide
[sudo] password for sysadm:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system calico-node-ntzn2 2/2 Running 0 4m9s 192.168.0.249 localhost.localdomain <none> <none>
kube-system coredns-fb8b8dccf-hqmn2 1/1 Running 0 4m9s 10.1.0.2 localhost.localdomain <none> <none>
kube-system coredns-fb8b8dccf-nfgr5 1/1 Running 0 4m9s 10.1.0.3 localhost.localdomain <none> <none>
kube-system etcd-localhost.localdomain 1/1 Running 0 3m4s 192.168.0.249 localhost.localdomain <none> <none>
kube-system kube-apiserver-localhost.localdomain 1/1 Running 0 3m18s 192.168.0.249 localhost.localdomain <none> <none>
kube-system kube-controller-manager-localhost.localdomain 1/1 Running 0 3m23s 192.168.0.249 localhost.localdomain <none> <none>
kube-system kube-proxy-xgnlb 1/1 Running 0 4m9s 192.168.0.249 localhost.localdomain <none> <none>
kube-system kube-scheduler-localhost.localdomain 1/1 Running 0 3m11s 192.168.0.249 localhost.localdomain <none> <none>
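Do you know what might be happening, and how I can troubleshoot it? I tried kubectl describe pod, but the pods keep crashing, and when I do manage to get some information I see nothing that tells me where to investigate next.
Sorry that the details are vague. If you can tell me where else to look, I can post more details or at least know what to investigate next.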
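Thanks for your time :)

The problem is the hostname. Check the NODE column: it shows the hostname as localhost.localdomain. Change the hostname to k8s-master or master and it should work. Each node should also have a unique hostname, such as node1, node2, node3, and so on.

The names of the control plane pods are very suspicious, such as kube-system etcd-localhost.localdomain, because that host's name is not actually localhost.localdomain; but without the logs of those crashes, no one has a prayer of helping you.

@MatthewLDaniel Thank you for pointing those things out. How can I get useful logs? I am trying to work out the next step toward solving this, but I am not familiar with the tooling around Kubernetes.

But honestly, if you are that new to Kubernetes and Docker, I would not actually try to save this cluster, because a node that thinks its name is localhost.localdomain is gravely ill. Start over, and use EKS, GKE, Rancher, or some other tool to create the cluster.

@MatthewLDaniel I have installed Kubernetes clusters on ECSs and vSphere before without running into this problem. The new requirement is to use ProxMox. I have blown these nodes away several times, but the same problem comes back. Apart from kubectl get and kubectl describe, I do not know what else I can use to get logs when the pods are destroyed and recreated this quickly. Your hint about etcd-localhost.localdomain is a very useful lead, and I will dig into it further. If etcd-localhost.localdomain is suspicious, what should it normally look like?

@MatthewLDaniel Is etcd-localhost.localdomain something that is picked up from the network or from a network DNS server? Sorry, I do not know the rest of the environment, so I am trying to come up with a decent enough question to put to the network people.

Thanks, I will try that first thing in the morning and report back.

Thank you for pointing out the hostname problem. That solved it.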
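The hostname check behind the accepted answer can be sketched as below. This is only a minimal sketch: the helper name is_placeholder_hostname and the target name k8s-master are illustrative, and the commented commands assume a systemd-based host where hostnamectl is available. They are one common way to apply the change, not necessarily what the asker ran. A cluster that was already initialized under the old name would most likely need kubeadm reset followed by a fresh kubeadm init.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when the given name is a default
# placeholder such as "localhost" or "localhost.localdomain".
is_placeholder_hostname() {
    case "$1" in
        localhost|localhost.*) return 0 ;;
        *) return 1 ;;
    esac
}

# uname -n prints the node name the kernel reports for this machine.
current="$(uname -n)"
if is_placeholder_hostname "$current"; then
    echo "hostname '$current' is a placeholder; give this node a unique name"
else
    echo "hostname '$current' looks usable"
fi

# Assumed follow-up, to be run by hand because it changes system state:
#   sudo hostnamectl set-hostname k8s-master
#   sudo kubeadm reset
#   sudo kubeadm init --pod-network-cidr=10.1.0.0/16
```

Running the check on every node before kubeadm init or kubeadm join helps catch nodes that would otherwise all register under the same name.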
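Repeated runs of the same command around the ten-minute mark show the pods crashing and restarting:
sysadm@master$ sudo kubectl get pods --all-namespaces -o wide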
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system calico-node-ntzn2 2/2 Running 0 10m 192.168.0.182 localhost.localdomain <none> <none>
kube-system coredns-fb8b8dccf-hqmn2 0/1 CrashLoopBackOff 2 10m 10.1.0.2 localhost.localdomain <none> <none>
kube-system coredns-fb8b8dccf-nfgr5 0/1 CrashLoopBackOff 1 10m 10.1.0.3 localhost.localdomain <none> <none>
kube-system kube-proxy-xgnlb 1/1 Running 0 10m 192.168.0.166 localhost.localdomain <none> <none>
sysadm@master$ sudo kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system calico-node-ntzn2 2/2 Running 0 11m 192.168.0.166 localhost.localdomain <none> <none>
kube-system coredns-fb8b8dccf-hqmn2 0/1 CrashLoopBackOff 2 11m 10.1.0.2 localhost.localdomain <none> <none>
kube-system coredns-fb8b8dccf-nfgr5 0/1 CrashLoopBackOff 2 11m 10.1.0.3 localhost.localdomain <none> <none>
kube-system etcd-localhost.localdomain 0/1 Pending 0 1s <none> localhost.localdomain <none> <none>
kube-system kube-apiserver-localhost.localdomain 0/1 Pending 0 1s <none> localhost.localdomain <none> <none>
kube-system kube-controller-manager-localhost.localdomain 0/1 Pending 0 1s <none> localhost.localdomain <none> <none>
kube-system kube-proxy-xgnlb 1/1 Running 0 11m 192.168.0.249 localhost.localdomain <none> <none>
sysadm@master$ sudo kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system calico-node-ntzn2 2/2 Running 0 11m 192.168.0.182 localhost.localdomain <none> <none>
kube-system coredns-fb8b8dccf-hqmn2 0/1 Running 3 11m 10.1.0.2 localhost.localdomain <none> <none>
kube-system coredns-fb8b8dccf-nfgr5 0/1 Running 2 11m 10.1.0.3 localhost.localdomain <none> <none>
kube-system kube-proxy-xgnlb 1/1 Running 0 11m 192.168.0.166 localhost.localdomain <none> <none>
kube-system kube-scheduler-localhost.localdomain 0/1 Pending 0 0s <none> localhost.localdomain <none> <none>
sysadm@master$ sudo kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system calico-node-ntzn2 2/2 Running 0 11m 192.168.0.182 localhost.localdomain <none> <none>
kube-system coredns-fb8b8dccf-hqmn2 1/1 Running 0 11m 10.1.0.2 localhost.localdomain <none> <none>
kube-system coredns-fb8b8dccf-nfgr5 1/1 Running 0 11m 10.1.0.3 localhost.localdomain <none> <none>
kube-system kube-proxy-xgnlb 1/1 Running 0 11m 192.168.0.166 localhost.localdomain <none> <none>
sysadm@master$ sudo kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system calico-node-ntzn2 2/2 Running 0 11m 192.168.0.166 localhost.localdomain <none> <none>
kube-system coredns-fb8b8dccf-hqmn2 0/1 Error 2 11m 10.1.0.2 localhost.localdomain <none> <none>
kube-system coredns-fb8b8dccf-nfgr5 1/1 Running 0 11m 10.1.0.3 localhost.localdomain <none> <none>
kube-system kube-proxy-xgnlb 1/1 Running 0 11m 192.168.0.249 localhost.localdomain <none> <none>
kube-system kube-scheduler-localhost.localdomain 0/1 Pending 0 0s <none> localhost.localdomain <none> <none>
sysadm@master$ sudo kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system calico-node-ntzn2 2/2 Running 0 11m 192.168.0.249 localhost.localdomain <none> <none>
kube-system coredns-fb8b8dccf-hqmn2 0/1 CrashLoopBackOff 2 11m 10.1.0.2 localhost.localdomain <none> <none>
kube-system coredns-fb8b8dccf-nfgr5 1/1 Running 0 11m 10.1.0.3 localhost.localdomain <none> <none>
kube-system etcd-localhost.localdomain 0/1 Pending 0 1s <none> localhost.localdomain <none> <none>
kube-system kube-proxy-xgnlb 1/1 Running 0 11m 192.168.0.166 localhost.localdomain <none> <none>
sysadm@master$ sudo kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system calico-node-ntzn2 2/2 Running 0 11m 192.168.0.182 localhost.localdomain <none> <none>
kube-system coredns-fb8b8dccf-hqmn2 0/1 Error 3 11m 10.1.0.2 localhost.localdomain <none> <none>
kube-system coredns-fb8b8dccf-nfgr5 0/1 Error 2 11m 10.1.0.3 localhost.localdomain <none> <none>
kube-system kube-apiserver-localhost.localdomain 0/1 Pending 0 0s <none> localhost.localdomain <none> <none>
kube-system kube-proxy-xgnlb 1/1 Running 0 11m 192.168.0.249 localhost.localdomain <none> <none>
sysadm@master$ sudo kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system calico-node-ntzn2 2/2 Running 0 11m 192.168.0.249 localhost.localdomain <none> <none>
kube-system coredns-fb8b8dccf-hqmn2 1/1 Running 0 11m 10.1.0.2 localhost.localdomain <none> <none>
kube-system coredns-fb8b8dccf-nfgr5 0/1 CrashLoopBackOff 2 11m 10.1.0.3 localhost.localdomain <none> <none>
kube-system kube-proxy-xgnlb 1/1 Running 0 11m 192.168.0.166 localhost.localdomain <none> <none>
sysadm@master$ sudo kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system calico-node-ntzn2 2/2 Running 0 11m 192.168.0.249 localhost.localdomain <none> <none>
kube-system coredns-fb8b8dccf-hqmn2 0/1 Running 3 11m 10.1.0.2 localhost.localdomain <none> <none>
kube-system coredns-fb8b8dccf-nfgr5 0/1 CrashLoopBackOff 2 11m 10.1.0.3 localhost.localdomain <none> <none>
kube-system kube-proxy-xgnlb 1/1 Running 0 11m 192.168.0.166 localhost.localdomain <none> <none>
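On the question of how to get logs from pods that restart this quickly: kubectl logs accepts --previous, which prints the log of the last terminated instance of a container. The sketch below is only an illustration. The function name previous_log_commands is made up, and it assumes listings like the ones above are piped in or saved to a file. It prints a command for each pod that is not fully ready or not Running, without executing anything.

```shell
#!/bin/sh
# Read `kubectl get pods --all-namespaces -o wide` output on stdin and print
# a `kubectl logs --previous` command for every pod that is not healthy.
previous_log_commands() {
    awk '
        # Skip the header, shell prompts, and anything without an n/m READY column.
        $3 !~ /^[0-9]+\/[0-9]+$/ { next }
        {
            split($3, ready, "/")
            if (ready[1] != ready[2] || $4 != "Running")
                printf "kubectl logs -n %s %s --previous\n", $1, $2
        }
    '
}

# Example usage (assumes kubectl access on the master):
#   sudo kubectl get pods --all-namespaces -o wide | previous_log_commands
```

A pod that is still Pending may not have a previous container to read from. On a kubeadm host that runs the kubelet as a systemd service, journalctl -u kubelet is usually the next place to look for why the control plane pods keep being recreated.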