Kubernetes HA cluster: master nodes NotReady
I have deployed a Kubernetes HA cluster using the following config.yaml:
etcd:
  endpoints:
  - "http://172.16.8.236:2379"
  - "http://172.16.8.237:2379"
  - "http://172.16.8.238:2379"
networking:
  podSubnet: "192.168.0.0/16"
apiServerExtraArgs:
  endpoint-reconciler-type: lease
When I check kubectl get nodes:
NAME      STATUS     ROLES    AGE   VERSION
master1   Ready      master   22m   v1.10.4
master2   NotReady   master   17m   v1.10.4
master3   NotReady   master   16m   v1.10.4
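A minimal sketch for pulling the affected nodes out of this listing, assuming the default `kubectl get nodes` column layout shown above. The function name `not_ready_nodes` is hypothetical, not part of kubectl.

```shell
#!/bin/sh
# Print the name of every node whose STATUS column does not start with "Ready".
# Reads `kubectl get nodes` output on stdin and skips the header row.
not_ready_nodes() {
  awk 'NR > 1 && $2 !~ /^Ready/ { print $1 }'
}

# Usage against a live cluster:
#   kubectl get nodes | not_ready_nodes
```

Matching on the `Ready` prefix keeps cordoned nodes (`Ready,SchedulingDisabled`) out of the result, so only nodes whose kubelet is not reporting are listed.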
If I check the pods, I can see that many of them are failing:
[ikerlan@master1 ~]$ kubectl get pods -n kube-system
NAME                                       READY   STATUS              RESTARTS   AGE
calico-etcd-5jftb                          0/1     NodeLost            0          16m
calico-etcd-kl7hb                          1/1     Running             0          16m
calico-etcd-z7sps                          0/1     NodeLost            0          16m
calico-kube-controllers-79dccdc4cc-vt5t7   1/1     Running             0          16m
calico-node-dbjl2                          2/2     Running             0          16m
calico-node-gkkth                          0/2     NodeLost            0          16m
calico-node-rqzzl                          0/2     NodeLost            0          16m
kube-apiserver-master1                     1/1     Running             0          21m
kube-controller-manager-master1            1/1     Running             0          22m
kube-dns-86f4d74b45-rwchm                  1/3     CrashLoopBackOff    17         22m
kube-proxy-226xd                           1/1     Running             0          22m
kube-proxy-jr2jq                           0/1     ContainerCreating   0          18m
kube-proxy-zmjdm                           0/1     ContainerCreating   0          17m
kube-scheduler-master1                     1/1     Running             0          21m
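To narrow a listing like this down to the pods that need attention, one option is to compare the two halves of the READY column. This is only a sketch: `unready_pods` is a hypothetical helper, and it assumes the default `kubectl get pods` columns (NAME, READY, STATUS first). It would also flag pods that finished normally with 0 ready containers.

```shell
#!/bin/sh
# Print "<pod> <ready> <status>" for every pod that has fewer ready
# containers than declared containers. Reads `kubectl get pods` output on stdin.
unready_pods() {
  awk 'NR > 1 {
    split($2, r, "/")                  # "1/3" -> r[1]=1, r[2]=3
    if (r[1] + 0 < r[2] + 0) print $1, $2, $3
  }'
}

# Usage against a live cluster:
#   kubectl get pods -n kube-system | unready_pods
```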
If I run kubectl describe node master2:
[ikerlan@master1 ~]$ kubectl describe node master2
Name:               master2
Roles:              master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/hostname=master2
                    node-role.kubernetes.io/master=
Annotations:        node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
CreationTimestamp:  Mon, 11 Jun 2018 12:06:03 +0200
Taints:             node-role.kubernetes.io/master:NoSchedule
Unschedulable:      false
Conditions:
  Type            Status   LastHeartbeatTime                LastTransitionTime               Reason                   Message
  ----            ------   -----------------                ------------------               ------                   -------
  OutOfDisk       Unknown  Mon, 11 Jun 2018 12:06:15 +0200  Mon, 11 Jun 2018 12:06:56 +0200  NodeStatusUnknown        Kubelet stopped posting node status.
  MemoryPressure  Unknown  Mon, 11 Jun 2018 12:06:15 +0200  Mon, 11 Jun 2018 12:06:56 +0200  NodeStatusUnknown        Kubelet stopped posting node status.
  DiskPressure    Unknown  Mon, 11 Jun 2018 12:06:15 +0200  Mon, 11 Jun 2018 12:06:56 +0200  NodeStatusUnknown        Kubelet stopped posting node status.
  PIDPressure     False    Mon, 11 Jun 2018 12:06:15 +0200  Mon, 11 Jun 2018 12:06:00 +0200  KubeletHasSufficientPID  kubelet has sufficient PID available
  Ready           Unknown  Mon, 11 Jun 2018 12:06:15 +0200  Mon, 11 Jun 2018 12:06:56 +0200  NodeStatusUnknown        Kubelet stopped posting node status.
Addresses:
  InternalIP:  172.16.8.237
  Hostname:    master2
Capacity:
  cpu:                2
  ephemeral-storage:  37300436Ki
Then, if I inspect one of the pods with kubectl describe pod -n kube-system calico-etcd-5jftb:
[ikerlan@master1 ~]$ kubectl describe pod -n kube-system calico-etcd-5jftb
Name:                      calico-etcd-5jftb
Namespace:                 kube-system
Node:                      master2/
Labels:                    controller-revision-hash=4283683065
                           k8s-app=calico-etcd
                           pod-template-generation=1
Annotations:               scheduler.alpha.kubernetes.io/critical-pod=
Status:                    Terminating (lasts 20h)
Termination Grace Period:  30s
Reason:                    NodeLost
Message:                   Node master2 which was running pod calico-etcd-5jftb is unresponsive
IP:
Controlled By:             DaemonSet/calico-etcd
Containers:
  calico-etcd:
    Image:      quay.io/coreos/etcd:v3.1.10
    Port:       <none>
    Host Port:  <none>
    Command:
      /usr/local/bin/etcd
    Args:
      --name=calico
      --data-dir=/var/etcd/calico-data
      --advertise-client-urls=http://$CALICO_ETCD_IP:6666
      --listen-client-urls=http://0.0.0.0:6666
      --listen-peer-urls=http://0.0.0.0:6667
      --auto-compaction-retention=1
    Environment:
      CALICO_ETCD_IP:   (v1:status.podIP)
    Mounts:
      /var/etcd from var-etcd (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-tj6d7 (ro)
Volumes:
  var-etcd:
    Type:          HostPath (bare host directory volume)
    Path:          /var/etcd
    HostPathType:
  default-token-tj6d7:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-tj6d7
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/master=
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/unreachable:NoExecute
Events:          <none>
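I see multiple calico-etcd pods trying to run. If you used a calico.yaml that deploys etcd for you, that will not work in a multi-master environment.

That manifest is not intended for production deployments and will not work in a multi-master environment, because the etcd it deploys is not configured to attempt to form a cluster.

You can still use that manifest, but you will need to remove the etcd pods it deploys and set etcd_endpoints to the etcd cluster you have already deployed.

I have solved the issue: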
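They are most likely not ready because of a networking/CNI issue. Can you try kubectl describe node master2? Likewise kubectl describe pod -n kube-system for the pods that are pending.

@JanosLenart could you check the question again? It now includes the output you asked for. Thanks.

I have deployed my own etcd cluster, and it looks fine. But when I check the nodes, only one master is Ready and the other two are in NotReady state. Then, when I deploy the Calico network, I see that some services are failing.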
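The advice above, pointing Calico at the etcd cluster that is already deployed instead of the single-member etcd from the manifest, might be sketched as follows. This is an illustration under assumptions, not necessarily the exact fix applied here: it assumes a local copy of the manifest containing a single `etcd_endpoints: "..."` line, reuses the three endpoints from config.yaml, and the helper names `join_endpoints` and `set_etcd_endpoints` are hypothetical.

```shell
#!/bin/sh
# etcd members copied from the config.yaml in the question.
ETCD_MEMBERS="http://172.16.8.236:2379 http://172.16.8.237:2379 http://172.16.8.238:2379"

# Join the arguments with commas, the list form used for etcd_endpoints.
# The function body runs in a subshell so the IFS change does not leak out.
join_endpoints() (
  IFS=,
  printf '%s\n' "$*"
)

# Print the manifest with its etcd_endpoints value replaced.
#   $1: path to a local copy of calico.yaml (assumed to exist)
#   $2: comma-separated endpoint list
set_etcd_endpoints() {
  sed "s|^\([[:space:]]*etcd_endpoints:\).*|\1 \"$2\"|" "$1"
}

# Word splitting of $ETCD_MEMBERS is intentional here.
ENDPOINTS="$(join_endpoints $ETCD_MEMBERS)"

# Possible usage (not run by this sketch):
#   set_etcd_endpoints calico.yaml "$ENDPOINTS" > calico-external-etcd.yaml
#   kubectl delete daemonset calico-etcd -n kube-system
#   kubectl apply -f calico-external-etcd.yaml
```

The DaemonSet name `calico-etcd` comes from the `Controlled By` field in the pod description. The manifest's own calico-etcd sections would also need to be removed from the edited copy before applying it, otherwise the apply step recreates the DaemonSet.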
2018-06-12 09:17:51.305960 W | etcdserver: read-only range request "key:\"/registry/apiregistration.k8s.io/apiservices/v1beta1.authentication.k8s.io\" " took too long (190.475363ms) to execute
2018-06-12 09:18:06.788558 W | etcdserver: read-only range request "key:\"/registry/services/endpoints/kube-system/kube-scheduler\" " took too long (109.543763ms) to execute
2018-06-12 09:18:34.875823 W | etcdserver: read-only range request "key:\"/registry/services/endpoints/kube-system/kube-scheduler\" " took too long (136.649505ms) to execute
2018-06-12 09:18:41.634057 W | etcdserver: read-only range request "key:\"/registry/minions\" range_end:\"/registry/miniont\" count_only:true " took too long (106.00073ms) to execute
2018-06-12 09:18:42.345564 W | etcdserver: request "header:<ID:4449666326481959890 > lease_revoke:<ID:4449666326481959752 > " took too long (142.771179ms) to execute
22m 22m 1 master2.15375fdf087fc69f Node Normal Starting kube-proxy, master2 Starting kube-proxy.
22m 22m 1 master3.15375fe744055758 Node Normal Starting kubelet, master3 Starting kubelet.
22m 22m 5 master3.15375fe74d47afa2 Node Normal NodeHasSufficientDisk kubelet, master3 Node master3 status is now: NodeHasSufficientDisk
22m 22m 5 master3.15375fe74d47f80f Node Normal NodeHasSufficientMemory kubelet, master3 Node master3 status is now: NodeHasSufficientMemory
22m 22m 5 master3.15375fe74d48066e Node Normal NodeHasNoDiskPressure kubelet, master3 Node master3 status is now: NodeHasNoDiskPressure
22m 22m 5 master3.15375fe74d481368 Node Normal NodeHasSufficientPID kubelet, master3 Node master3 status is now: NodeHasSufficientPID