Kubernetes kube-scheduler liveness probe failed: Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connect: connection refused

So I have this unhealthy cluster partially working in the datacenter. This is probably my tenth rebuild following these instructions:

I can apply some pods to this cluster and it appears to work, but eventually it starts slowing down and crashing, as shown below. Here is the scheduler manifest:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-scheduler
    tier: control-plane
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-scheduler
    - --bind-address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
    image: k8s.gcr.io/kube-scheduler:v1.14.2
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10251
        scheme: HTTP
      initialDelaySeconds: 15
      timeoutSeconds: 15
    name: kube-scheduler
    resources:
      requests:
        cpu: 100m
    volumeMounts:
    - mountPath: /etc/kubernetes/scheduler.conf
      name: kubeconfig
      readOnly: true
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
  - hostPath:
      path: /etc/kubernetes/scheduler.conf
      type: FileOrCreate
    name: kubeconfig
status: {}
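
For reference, the liveness probe targets 127.0.0.1:10251 over the host network, so it can be reproduced by hand from the master node; this is the same HTTP request the kubelet makes (a sanity-check sketch, assuming shell access on kube-apiserver-1):

# Run on the master node while the container is up;
# a healthy scheduler answers "ok" on /healthz
$ curl http://127.0.0.1:10251/healthz

# Check whether anything is actually listening on the probe port
$ ss -ltnp | grep 10251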
$ kubectl -n kube-system get pods

NAME                                       READY   STATUS             RESTARTS   AGE
coredns-fb8b8dccf-42psn                    1/1     Running            9          88m
coredns-fb8b8dccf-x9mlt                    1/1     Running            11         88m
docker-registry-dqvzb                      1/1     Running            1          2d6h
kube-apiserver-kube-apiserver-1            1/1     Running            44         2d8h
kube-apiserver-kube-apiserver-2            1/1     Running            34         2d7h
kube-controller-manager-kube-apiserver-1   1/1     Running            198        2d2h
kube-controller-manager-kube-apiserver-2   0/1     CrashLoopBackOff   170        2d7h
kube-flannel-ds-amd64-4mbfk                1/1     Running            1          2d7h
kube-flannel-ds-amd64-55hc7                1/1     Running            1          2d8h
kube-flannel-ds-amd64-fvwmf                1/1     Running            1          2d7h
kube-flannel-ds-amd64-ht5wm                1/1     Running            3          2d7h
kube-flannel-ds-amd64-rjt9l                1/1     Running            4          2d8h
kube-flannel-ds-amd64-wpmkj                1/1     Running            1          2d7h
kube-proxy-2n64d                           1/1     Running            3          2d7h
kube-proxy-2pq2g                           1/1     Running            1          2d7h
kube-proxy-5fbms                           1/1     Running            2          2d8h
kube-proxy-g8gmn                           1/1     Running            1          2d7h
kube-proxy-wrdrj                           1/1     Running            1          2d8h
kube-proxy-wz6gv                           1/1     Running            1          2d7h
kube-scheduler-kube-apiserver-1            0/1     CrashLoopBackOff   198        2d2h
kube-scheduler-kube-apiserver-2            1/1     Running            5          18m
nginx-ingress-controller-dz8fm             1/1     Running            3          2d4h
nginx-ingress-controller-sdsgg             1/1     Running            3          2d4h
nginx-ingress-controller-sfrgb             1/1     Running            1          2d4h
$ kubectl -n kube-system describe pod kube-scheduler-kube-apiserver-1

Containers:
  kube-scheduler:
    Container ID:  docker://c04f3c9061cafef8749b2018cd66e6865d102f67c4d13bdd250d0b4656d5f220
    Image:         k8s.gcr.io/kube-scheduler:v1.14.2
    Image ID:      docker-pullable://k8s.gcr.io/kube-scheduler@sha256:052e0322b8a2b22819ab0385089f202555c4099493d1bd33205a34753494d2c2
    Port:          <none>
    Host Port:     <none>
    Command:
      kube-scheduler
      --bind-address=127.0.0.1
      --kubeconfig=/etc/kubernetes/scheduler.conf
      --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
      --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
      --leader-elect=true
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 28 May 2019 23:16:50 -0400
      Finished:     Tue, 28 May 2019 23:19:56 -0400
    Ready:          False
    Restart Count:  195
    Requests:
      cpu:        100m
    Liveness:     http-get http://127.0.0.1:10251/healthz delay=15s timeout=15s period=10s #success=1 #failure=8
    Environment:  <none>
    Mounts:
      /etc/kubernetes/scheduler.conf from kubeconfig (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kubeconfig:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/scheduler.conf
    HostPathType:  FileOrCreate
QoS Class:         Burstable
Node-Selectors:    <none>
Tolerations:       :NoExecute
Events:
  Type     Reason          Age                    From                       Message
  ----     ------          ----                   ----                       -------
  Normal   Created         4h56m (x104 over 37h)  kubelet, kube-apiserver-1  Created container kube-scheduler
  Normal   Started         4h56m (x104 over 37h)  kubelet, kube-apiserver-1  Started container kube-scheduler
  Warning  Unhealthy       137m (x71 over 34h)    kubelet, kube-apiserver-1  Liveness probe failed: Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connect: connection refused
  Normal   Pulled          132m (x129 over 37h)   kubelet, kube-apiserver-1  Container image "k8s.gcr.io/kube-scheduler:v1.14.2" already present on machine
  Warning  BackOff         128m (x1129 over 34h)  kubelet, kube-apiserver-1  Back-off restarting failed container
  Normal   SandboxChanged  80m                    kubelet, kube-apiserver-1  Pod sandbox changed, it will be killed and re-created.
  Warning  Failed          76m                    kubelet, kube-apiserver-1  Error: context deadline exceeded
  Normal   Pulled          36m (x7 over 78m)      kubelet, kube-apiserver-1  Container image "k8s.gcr.io/kube-scheduler:v1.14.2" already present on machine
  Normal   Started         36m (x6 over 74m)      kubelet, kube-apiserver-1  Started container kube-scheduler
  Normal   Created         32m (x7 over 74m)      kubelet, kube-apiserver-1  Created container kube-scheduler
  Warning  Unhealthy       20m (x9 over 40m)      kubelet, kube-apiserver-1  Liveness probe failed: Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connect: connection refused
  Warning  BackOff         2m56s (x85 over 69m)   kubelet, kube-apiserver-1  Back-off restarting failed container
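
A useful next step for a CrashLoopBackOff like this (a general diagnostic step, not output I have) is to pull the logs of the previously terminated container instance, since the Exit Code 1 above means the process itself is dying rather than merely failing the probe:

# --previous returns output from the last terminated container,
# which is where the scheduler's fatal error is logged
$ kubectl -n kube-system logs kube-scheduler-kube-apiserver-1 --previous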
I feel like I'm overlooking a simple option or configuration, but I can't find it. After several days of working on this and reading the documentation, I'm at a loss.

The load balancer is a TCP load balancer, and it appears to be working as expected, since I can query the cluster from my desktop.

At this point, any advice or troubleshooting tips would be welcome.


Thank you.

The problem with our configuration was that a well-intentioned technician decided to remove one of the rules on the Kubernetes master's firewall, which ended up blocking the master from looping back to the ports it needed to probe. This caused all kinds of strange behavior and misdiagnosed problems, which was definitely the wrong direction. After we allowed all ports on the servers, Kubernetes resumed normal behavior.
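
For anyone who hits the same thing, here is a minimal sketch of how to spot and undo such a rule, assuming an iptables-based firewall (the exact chain layout varies by distro and firewall frontend):

# List INPUT rules with packet counters; look for DROP/REJECT rules
# that match the loopback interface or the probe ports (10251/10252)
$ iptables -L INPUT -n -v --line-numbers

# Traffic on the loopback interface should generally be accepted;
# this inserts an explicit allow rule at the top of the INPUT chain
$ iptables -I INPUT 1 -i lo -j ACCEPT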

What is the output of kubectl logs kube-scheduler-kube-apiserver-1 -n kube-system?

Since I can't post long output here, I created the following gist:

Why are there multiple controller-manager and scheduler pods? Does kube-scheduler-kube-apiserver-2 have the same problem, or is it only apiserver-1?

I have an apiserver cluster with only two members at the moment, but I will add a third once this issue is resolved. We want to run Kubernetes in HA mode. That doesn't really matter, though, because this problem occurs even when I have only one master. See:
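
(For completeness, the exact commands referenced in these comments, using the pod names from the output above:)

$ kubectl logs kube-scheduler-kube-apiserver-1 -n kube-system

# kubeadm runs one static scheduler/controller-manager pod per
# control-plane node, which is why two of each appear in the pod list;
# -o wide shows which node each instance runs on
$ kubectl -n kube-system get pods -l component=kube-scheduler -o wide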