Docker Kubernetes Calico: node 'xxxxxxxxxx' is already using IPv4 address XXXXXXXXX, CrashLoopBackOff


I created a Kubernetes cluster in a VPC with private subnets using the AWS Kubernetes Quickstart. It worked fine for a while. I installed Calico on the cluster. I have two nodes and one master. The Calico pods on the master run fine, but the Calico pods on the nodes are in CrashLoopBackOff:

NAME                                                               READY     STATUS             RESTARTS   AGE
calico-etcd-ztwjj                                                  1/1       Running            1          55d
calico-kube-controllers-685755779f-ftm92                           1/1       Running            2          55d
calico-node-gkjgl                                                  1/2       CrashLoopBackOff   270        22h
calico-node-jxkvx                                                  2/2       Running            4          55d
calico-node-mxhc5                                                  1/2       CrashLoopBackOff   9          25m
Describing one of the crashing pods:

ubuntu@ip-10-0-1-133:~$ kubectl describe pod calico-node-gkjgl -n kube-system
Name:           calico-node-gkjgl
Namespace:      kube-system
Node:           ip-10-0-0-237.us-east-2.compute.internal/10.0.0.237
Start Time:     Mon, 17 Sep 2018 16:56:41 +0000
Labels:         controller-revision-hash=185957727
                k8s-app=calico-node
                pod-template-generation=1
Annotations:    scheduler.alpha.kubernetes.io/critical-pod=
Status:         Running
IP:             10.0.0.237
Controlled By:  DaemonSet/calico-node
Containers:
  calico-node:
    Container ID:   docker://d89979ba963c33470139fd2093a5427b13c6d44f4c6bb546c9acdb1a63cd4f28
    Image:          quay.io/calico/node:v3.1.1
    Image ID:       docker-pullable://quay.io/calico/node@sha256:19fdccdd4a90c4eb0301b280b50389a56e737e2349828d06c7ab397311638d29
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 18 Sep 2018 15:14:44 +0000
      Finished:     Tue, 18 Sep 2018 15:14:44 +0000
    Ready:          False
    Restart Count:  270
    Requests:
      cpu:      250m
    Liveness:   http-get http://:9099/liveness delay=10s timeout=1s period=10s #success=1 #failure=6
    Readiness:  http-get http://:9099/readiness delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ETCD_ENDPOINTS:                     <set to the key 'etcd_endpoints' of config map 'calico-config'>  Optional: false
      CALICO_NETWORKING_BACKEND:          <set to the key 'calico_backend' of config map 'calico-config'>  Optional: false
      CLUSTER_TYPE:                       kubeadm,bgp
      CALICO_DISABLE_FILE_LOGGING:        true
      CALICO_K8S_NODE_REF:                 (v1:spec.nodeName)
      FELIX_DEFAULTENDPOINTTOHOSTACTION:  ACCEPT
      CALICO_IPV4POOL_CIDR:               192.168.0.0/16
      CALICO_IPV4POOL_IPIP:               Always
      FELIX_IPV6SUPPORT:                  false
      FELIX_IPINIPMTU:                    1440
      FELIX_LOGSEVERITYSCREEN:            info
      IP:                                 autodetect
      FELIX_HEALTHENABLED:                true
    Mounts:
      /lib/modules from lib-modules (ro)
      /var/lib/calico from var-lib-calico (rw)
      /var/run/calico from var-run-calico (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-cni-plugin-token-b7sfl (ro)
  install-cni:
    Container ID:  docker://b37e0ec7eba690473a4999a31d9f766f7adfa65f800a7b2dc8e23ead7520252d
    Image:         quay.io/calico/cni:v3.1.1
    Image ID:      docker-pullable://quay.io/calico/cni@sha256:dc345458d136ad9b4d01864705895e26692d2356de5c96197abff0030bf033eb
    Port:          <none>
    Host Port:     <none>
    Command:
      /install-cni.sh
    State:          Running
      Started:      Mon, 17 Sep 2018 17:11:52 +0000
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 17 Sep 2018 16:56:43 +0000
      Finished:     Mon, 17 Sep 2018 17:10:53 +0000
    Ready:          True
    Restart Count:  1
    Environment:
      CNI_CONF_NAME:       10-calico.conflist
      ETCD_ENDPOINTS:      <set to the key 'etcd_endpoints' of config map 'calico-config'>      Optional: false
      CNI_NETWORK_CONFIG:  <set to the key 'cni_network_config' of config map 'calico-config'>  Optional: false
    Mounts:
      /host/etc/cni/net.d from cni-net-dir (rw)
      /host/opt/cni/bin from cni-bin-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-cni-plugin-token-b7sfl (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:
  var-run-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/calico
    HostPathType:
  var-lib-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/calico
    HostPathType:
  cni-bin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/cni/bin
    HostPathType:
  cni-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:
  calico-cni-plugin-token-b7sfl:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  calico-cni-plugin-token-b7sfl
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     :NoSchedule
                 :NoExecute
                 :NoSchedule
                 :NoExecute
                 CriticalAddonsOnly
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/unreachable:NoExecute
Events:
  Type     Reason   Age                  From                                               Message
  ----     ------   ----                 ----                                               -------
  Warning  BackOff  4m (x6072 over 22h)  kubelet, ip-10-0-0-237.us-east-2.compute.internal  Back-off restarting failed container
So it seems there is a conflict when detecting the node's IP address, or Calico thinks the IP is already assigned to another node. A quick search turned up a thread discussing this. I think the problem should be solved by setting IP_AUTODETECTION_METHOD to can-reach=DESTINATION, which I assume would be can-reach=10.0.0.237. This setting is an environment variable on the calico/node container. I have been trying to get a shell inside the container itself, but kubectl tells me the container cannot be found:

ubuntu@ip-10-0-1-133:~$ kubectl exec calico-node-gkjgl --stdin --tty /bin/sh -c calico-node -n kube-system
error: unable to upgrade connection: container not found ("calico-node")
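
Since the calico-node container keeps crashing, exec will keep failing; an alternative that does not need a running container is to read the logs of the previous, crashed attempt, e.g.:

kubectl -n kube-system logs calico-node-gkjgl -c calico-node --previous
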
I suspect this is because Calico was unable to assign an IP. So I logged into the host and tried to shell into the container with Docker:

root@ip-10-0-0-237:~# docker exec -it k8s_POD_calico-node-gkjgl_kube-system_a6998e98-ba9a-11e8-a9fa-0a97f5a48ef4_1 /bin/bash
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "exec: \"/bin/bash\": stat /bin/bash: no such file or directory"
So I guess there is no shell to execute in that container, which is why kubectl cannot exec into it either. I tried running commands from outside to list the environment variables, but did not find any; then again, I may be running these commands incorrectly:

root@ip-10-0-0-237:~# docker inspect -f '{{range $index, $value := .Config.Env}}{{$value}} {{end}}' k8s_POD_calico-node-gkjgl_kube-system_a6998e98-ba9a-11e8-a9fa-0a97f5a48ef4_1
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

root@ip-10-0-0-237:~# docker exec -it k8s_POD_calico-node-gkjgl_kube-system_a6998e98-ba9a-11e8-a9fa-0a97f5a48ef4_1 printenv IP_AUTODETECTION_METHOD
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "exec: \"printenv\": executable file not found in $PATH"

root@ip-10-0-0-237:~# docker exec -it k8s_POD_calico-node-gkjgl_kube-system_a6998e98-ba9a-11e8-a9fa-0a97f5a48ef4_1 /bin/env
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "exec: \"/bin/env\": stat /bin/env: no such file or directory"
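
Note that the commands above target the pod's pause/sandbox container (the k8s_POD_ prefix), which ships neither a shell nor printenv, so it cannot show the calico-node environment. Inspecting the application container itself should work; something along these lines (the container ID has to be looked up first):

docker ps --format '{{.ID}} {{.Names}}' | grep calico-node
docker inspect -f '{{range .Config.Env}}{{println .}}{{end}}' <calico-node-container-id>
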
OK, maybe I am going in the wrong direction. Should I instead change the Calico manifest through Kubernetes and redeploy it? Where would I find those files on my system? I cannot figure out where to set the environment variable.

If you look at the Calico docs, IP_AUTODETECTION_METHOD defaults to first-found.

My guess is that the previous "run" of Calico did not release something or the IP address, or it is simply a bug in Calico v3.1.1.
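
If the default autodetection is the problem, the variable can be overridden on the calico-node DaemonSet instead of inside a running container. A minimal sketch, assuming a reasonably recent kubectl (the can-reach target is just the example from the question; interface= with a regex is the other documented option):

kubectl -n kube-system set env daemonset/calico-node IP_AUTODETECTION_METHOD=can-reach=10.0.0.237
# or pick the first valid address on an interface matching a regex:
kubectl -n kube-system set env daemonset/calico-node 'IP_AUTODETECTION_METHOD=interface=eth.*'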

Try the following:

  • Delete the Calico pods that are stuck in the crash loop:

    kubectl -n kube-system delete pod calico-node-gkjgl calico-node-mxhc5

    The pods will be recreated and will hopefully initialize properly.

  • Upgrade Calico to v3.1.3 or the latest version; a rough outline of the upgrade is sketched after this list. Judging from these, Heptio's Calico installation appears to use the etcd datastore.

  • Try to understand how Heptio's AWS AMIs work and check whether they have any known issues. This may take some time, so you could also reach out to their support.

  • Try a different way of installing Kubernetes with Calico. More about that:
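
A rough outline of the upgrade mentioned in the second bullet (a sketch only; the manifest file names are placeholders for whatever RBAC and Calico manifests match the target version and the etcd datastore):

# remove the old calico-node DaemonSet first
kubectl -n kube-system delete daemonset calico-node

# apply the RBAC and Calico manifests for the new version
kubectl apply -f calico-rbac.yaml
kubectl apply -f calico.yaml

# watch the new pods come up
kubectl -n kube-system get pods -l k8s-app=calico-node -w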


Comments:

That depends on the Linux distribution being used; the image may not have that shell. Try /bin/sh. On a second read, my guess is that you are trying to log into a container that is not running. Check whether the config file is generated, try updating it there, and restart. I also looked at the Calico docs; there is a way to set it to interface= to use the first valid IP address found on an interface whose name matches the first of the provided interface-name regexes (regexes are comma separated, e.g. eth.*,enp0s.*).

@Narain you are right that the container is not running, and the comment you posted was very helpful as a reference. Unfortunately I could not find the container's IP detection method in /var/lib/docker/containers/[container id]/config.json, but by upgrading to v3.2.1 I was able to get Calico running again. I think you are right that this was caused by a bug. I had tried deleting the pods so they would restart, but they crashed again shortly afterwards. In the end I simply deleted the old deployment with kubectl delete deployment -n kube-system and followed the instructions listed in step 2 to upgrade to v3.2.1. I created the new RBAC and installed the new deployment with kubectl apply -f. A few minutes later the Calico pods started successfully, and the other containers in the cluster that had been having problems are running/restarting again. Awesome!