Kubernetes K3s cluster does not start (anymore)


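My local k3s playground suddenly stopped working. My gut feeling is that something is wrong with the HTTPS certificates. I start the cluster with docker compose: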

version: '3.2'

services:
  server:
    image: rancher/k3s:latest
    command: server  --disable-agent --tls-san 192.168.2.110
    environment:
    - K3S_CLUSTER_SECRET=somethingtotallyrandom
    - K3S_KUBECONFIG_OUTPUT=/output/kubeconfig.yaml
    - K3S_KUBECONFIG_MODE=666
    volumes:
    - k3s-server:/var/lib/rancher/k3s
    # get the kubeconfig file
    - .:/output
    - ./registries.yaml:/etc/rancher/k3s/registries.yaml
    ports:
     - 192.168.2.110:6443:6443

  node:
    image: rancher/k3s:latest
    volumes:
    - ./registries.yaml:/etc/rancher/k3s/registries.yaml

    tmpfs:
    - /run
    - /var/run
    privileged: true
    environment:
    - K3S_URL=https://server:6443
    - K3S_CLUSTER_SECRET=somethingtotallyrandom
    ports:
      - 31000-32000:31000-32000

volumes:
  k3s-server: {}
Nothing special. The registries.yaml mount can be commented in or out without making any difference (its contents are omitted here).

However, I now get a bunch of strange failures:

server_1  | E0516 22:58:03.264451       1 available_controller.go:420] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.43.6.218:443/apis/metrics.k8s.io/v1beta1: Get https://10.43.6.218:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
server_1  | E0516 22:58:08.265272       1 available_controller.go:420] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.43.6.218:443/apis/metrics.k8s.io/v1beta1: Get https://10.43.6.218:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
node_1    | I0516 22:58:12.695365       1 topology_manager.go:219] [topologymanager] RemoveContainer - Container ID: bb7ee4b14724692f4497e99716b68c4dc4fe77333b03801909092d42c00ef5a2
node_1    | I0516 22:58:15.006306       1 topology_manager.go:219] [topologymanager] RemoveContainer - Container ID: bb7ee4b14724692f4497e99716b68c4dc4fe77333b03801909092d42c00ef5a2
node_1    | I0516 22:58:15.006537       1 topology_manager.go:219] [topologymanager] RemoveContainer - Container ID: fc2e51300f2ec06949abf5242690cb36077adc409f0d7f131a9d4f911063b63c
node_1    | E0516 22:58:15.006757       1 pod_workers.go:191] Error syncing pod e127dc88-e252-4e2e-bbd5-2e93ce5e32ff ("helm-install-traefik-jfrjk_kube-system(e127dc88-e252-4e2e-bbd5-2e93ce5e32ff)"), skipping: failed to "StartContainer" for "helm" with CrashLoopBackOff: "back-off 1m20s restarting failed container=helm pod=helm-install-traefik-jfrjk_kube-system(e127dc88-e252-4e2e-bbd5-2e93ce5e32ff)"
server_1  | E0516 22:58:22.345501       1 resource_quota_controller.go:408] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
node_1    | I0516 22:58:27.695296       1 topology_manager.go:219] [topologymanager] RemoveContainer - Container ID: fc2e51300f2ec06949abf5242690cb36077adc409f0d7f131a9d4f911063b63c
node_1    | E0516 22:58:27.695989       1 pod_workers.go:191] Error syncing pod e127dc88-e252-4e2e-bbd5-2e93ce5e32ff ("helm-install-traefik-jfrjk_kube-system(e127dc88-e252-4e2e-bbd5-2e93ce5e32ff)"), skipping: failed to "StartContainer" for "helm" with CrashLoopBackOff: "back-off 1m20s restarting failed container=helm pod=helm-install-traefik-jfrjk_kube-system(e127dc88-e252-4e2e-bbd5-2e93ce5e32ff)"
server_1  | I0516 22:58:30.328999       1 request.go:621] Throttling request took 1.047650754s, request: GET:https://127.0.0.1:6444/apis/admissionregistration.k8s.io/v1beta1?timeout=32s
server_1  | W0516 22:58:31.081020       1 garbagecollector.go:644] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
server_1  | E0516 22:58:36.442904       1 available_controller.go:420] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.43.6.218:443/apis/metrics.k8s.io/v1beta1: Get https://10.43.6.218:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
node_1    | I0516 22:58:40.695404       1 topology_manager.go:219] [topologymanager] RemoveContainer - Container ID: fc2e51300f2ec06949abf5242690cb36077adc409f0d7f131a9d4f911063b63c
node_1    | E0516 22:58:40.696176       1 pod_workers.go:191] Error syncing pod e127dc88-e252-4e2e-bbd5-2e93ce5e32ff ("helm-install-traefik-jfrjk_kube-system(e127dc88-e252-4e2e-bbd5-2e93ce5e32ff)"), skipping: failed to "StartContainer" for "helm" with CrashLoopBackOff: "back-off 1m20s restarting failed container=helm pod=helm-install-traefik-jfrjk_kube-system(e127dc88-e252-4e2e-bbd5-2e93ce5e32ff)"
server_1  | E0516 22:58:41.443295       1 available_controller.go:420] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.43.6.218:443/apis/metrics.k8s.io/v1beta1: Get https://10.43.6.218:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
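
Since the same few errors repeat every few seconds, it can help to collapse the log into a per-container, per-source-file count. The following is only a sketch: the function name summarize_klog_errors is made up for illustration, and it assumes the compose log prefix ("server_1  | ") followed by a klog header as in the excerpt above.

```shell
# Count klog error lines (severity "E") per container and source file.
# Reads docker-compose style log lines on stdin, e.g.
#   server_1  | E0516 22:58:03.264451       1 available_controller.go:420] ...
# Hypothetical helper, not part of k3s or docker-compose.
summarize_klog_errors() {
  awk '$2 == "|" && $3 ~ /^E[0-9]+$/ {
         src = $6                       # e.g. "available_controller.go:420]"
         sub(/:[0-9]+\]$/, "", src)     # drop the ":<line>]" suffix
         count[$1 " " src]++
       }
       END { for (k in count) print count[k], k }' | sort -rn
}

# Possible usage (run in the compose project directory):
#   docker-compose logs --no-color | summarize_klog_errors
```

Applied to the excerpt above, this would report four available_controller.go errors and one resource_quota_controller.go error from server_1, plus three pod_workers.go errors from node_1, which suggests the metrics-server APIService and the crash-looping helm-install-traefik pod are the two things to look at first.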


In addition, my node no longer seems to really connect to the server:

user@ipc:~/dev/test_mk3s_docker$ docker exec -it  $(docker ps |grep "k3s server"|awk -F\  '{print $1}') kubectl cluster-info
Kubernetes master is running at https://127.0.0.1:6443
CoreDNS is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Metrics-server is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/https:metrics-server:/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
user@ipc:~/dev/test_mk3s_docker$ docker exec -it  $(docker ps |grep "k3s agent"|awk -F\  '{print $1}') kubectl cluster-info
error: Missing or incomplete configuration info.  Please point to an existing, complete config file:

  1. Via the command-line flag --kubeconfig
  2. Via the KUBECONFIG environment variable
  3. In your home directory as ~/.kube/config

To view or setup config directly use the 'config' command.
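
Note that the second error only says that kubectl found no kubeconfig inside the agent container, so on its own it may not prove that the node is disconnected. A possibly more direct check is to ask the server which nodes it sees. The sketch below assumes the usual `kubectl get nodes` column layout (NAME, STATUS, ...); the function name not_ready_nodes is hypothetical.

```shell
# Print the names of nodes whose STATUS column does not start with "Ready".
# Reads `kubectl get nodes` output on stdin; the header line is skipped.
# Hypothetical helper for illustration only.
not_ready_nodes() {
  awk '$1 != "NAME" && $2 !~ /^Ready/ { print $1 }'
}

# Possible usage, reusing the container lookup from the commands above:
#   docker exec $(docker ps | grep "k3s server" | awk '{print $1}') \
#     kubectl get nodes | not_ready_nodes
```

If the agent has not registered at all, `kubectl get nodes` prints no node rows, so an empty result from this filter has to be read together with the raw node list.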
If I run "kubectl get apiservice", I get the following output:

v1beta1.storage.k8s.io                 Local                        True                           20m
v1beta1.scheduling.k8s.io              Local                        True                           20m
v1.storage.k8s.io                      Local                        True                           20m
v1.k3s.cattle.io                       Local                        True                           20m
v1.helm.cattle.io                      Local                        True                           20m
v1beta1.metrics.k8s.io                 kube-system/metrics-server   False (FailedDiscoveryCheck)   20m
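
To pick the failing entries out of a longer list, the output can be filtered on its AVAILABLE column. This is a minimal sketch that assumes the four-column layout shown above (NAME, SERVICE, AVAILABLE, AGE); unavailable_apiservices is a made-up name.

```shell
# Print name and backing service of every APIService whose AVAILABLE
# column is not "True". Reads `kubectl get apiservice` output on stdin.
# Hypothetical helper for illustration only.
unavailable_apiservices() {
  awk '$1 != "NAME" && $3 != "True" { print $1, $2 }'
}

# Possible usage:
#   kubectl get apiservice | unavailable_apiservices
```

For the listing above this leaves only v1beta1.metrics.k8s.io, backed by kube-system/metrics-server.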
Furthermore, downgrading k3s to k3s:v1.0.1 only changes the error message:

server_1  | E0516 23:46:02.951073       1 reflector.go:123] k8s.io/client-go/informers/factory.go:134: Failed to list *v1beta1.CSINode: no kind "CSINode" is registered for version "storage.k8s.io/v1" in scheme "k8s.io/kubernetes/pkg/api/legacyscheme/scheme.go:30"
server_1  | E0516 23:46:03.444519       1 status.go:71] apiserver received an error that is not an metav1.Status: &runtime.notRegisteredErr{schemeName:"k8s.io/kubernetes/pkg/api/legacyscheme/scheme.go:30", gvk:schema.GroupVersionKind{Group:"storage.k8s.io", Version:"v1", Kind:"CSINode"}, target:runtime.GroupVersioner(nil), t:reflect.Type(nil)}
After executing

 docker exec -it  $(docker ps |grep "k3s server"|awk -F\  '{print $1}') kubectl --namespace kube-system delete apiservice v1beta1.metrics.k8s.io
I only get

node_1    | W0517 07:03:06.346944       1 info.go:51] Couldn't collect info from any of the files in "/etc/machine-id,/var/lib/dbus/machine-id"
node_1    | I0517 07:03:21.504932       1 log.go:172] http: TLS handshake error from 10.42.1.15:53888: remote error: tls: bad certificate
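
To test the certificate suspicion directly, one option is to look at the certificate an endpoint actually presents. The sketch below uses only standard openssl subcommands; the two function names are hypothetical, and the address in the usage comment is the published API port from the compose file above.

```shell
# Summarize a PEM certificate read from stdin: subject, issuer and
# validity window (notBefore / notAfter).
cert_summary() {
  openssl x509 -noout -subject -issuer -dates
}

# Fetch the certificate a TLS endpoint presents and summarize it.
# $1 is host:port. Hypothetical helper for illustration only.
endpoint_cert_summary() {
  openssl s_client -connect "$1" </dev/null 2>/dev/null | cert_summary
}

# Possible usage:
#   endpoint_cert_summary 192.168.2.110:6443
```

An expired notAfter date, or an issuer that no longer matches what the other side trusts, would be consistent with the "bad certificate" handshake error, although this check alone does not prove that is the cause.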

Did you introduce a proxy into your network?
No proxy or anything else. Just these two nodes on the same machine. If I delete the corresponding apiservice, I then get the TLS errors on the client.