Kubernetes Metrics server无法从Windows Worker中提取指标
我有一个Kubernetes群集,有两个Windows工作节点。运行Kubernetes Metrics server无法从Windows Worker中提取指标,kubernetes,rancher,metrics-server,Kubernetes,Rancher,Metrics Server,我有一个Kubernetes群集,有两个Windows工作节点。运行kubectl top nodes时,Windows节点报告为未知。我做了一些调查,发现日志中有错误 来自度量服务器的日志 E0529 12:04:50.809303 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-02:
kubectl top nodes
时,Windows节点报告为未知。我做了一些调查,发现日志中有错误
来自度量服务器的日志
E0529 12:04:50.809303 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-02: unable to fetch metrics from Kubelet qa-k8sw-win-02 (10.4.111.68): request failed - "500 Internal Server Error", response: "Internal Error: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = hcsshim::OpenComputeSystem 7d7489daf4b889d4b55d7889a617017768035b7c1c43e8cef5ac0210e7b2ac65: A virtual machine or container with the specified identifier does not exist.", unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-01: unable to fetch metrics from Kubelet qa-k8sw-win-01 (10.4.111.189): request failed - "500 Internal Server Error", response: "Internal Error: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = hcsshim::OpenComputeSystem cf9d0d43c099624bbbb13cb4a607ca3a818a66aad1631897d47e1c66827782ac: A virtual machine or container with the specified identifier does not exist."]
E0529 12:05:50.838175 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-02: unable to fetch metrics from Kubelet qa-k8sw-win-02 (10.4.111.68): request failed - "500 Internal Server Error", response: "Internal Error: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = hcsshim::OpenComputeSystem 7d7489daf4b889d4b55d7889a617017768035b7c1c43e8cef5ac0210e7b2ac65: A virtual machine or container with the specified identifier does not exist.", unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-01: unable to fetch metrics from Kubelet qa-k8sw-win-01 (10.4.111.189): request failed - "500 Internal Server Error", response: "Internal Error: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = hcsshim::OpenComputeSystem cf9d0d43c099624bbbb13cb4a607ca3a818a66aad1631897d47e1c66827782ac: A virtual machine or container with the specified identifier does not exist."]
E0529 12:06:50.815777 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-02: unable to fetch metrics from Kubelet qa-k8sw-win-02 (10.4.111.68): request failed - "500 Internal Server Error", response: "Internal Error: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = hcsshim::OpenComputeSystem 7d7489daf4b889d4b55d7889a617017768035b7c1c43e8cef5ac0210e7b2ac65: A virtual machine or container with the specified identifier does not exist.", unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-01: unable to fetch metrics from Kubelet qa-k8sw-win-01 (10.4.111.189): request failed - "500 Internal Server Error", response: "Internal Error: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = hcsshim::OpenComputeSystem cf9d0d43c099624bbbb13cb4a607ca3a818a66aad1631897d47e1c66827782ac: A virtual machine or container with the specified identifier does not exist."]
E0529 12:07:50.800927 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-02: unable to fetch metrics from Kubelet qa-k8sw-win-02 (10.4.111.68): request failed - "500 Internal Server Error", response: "Internal Error: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = hcsshim::OpenComputeSystem 7d7489daf4b889d4b55d7889a617017768035b7c1c43e8cef5ac0210e7b2ac65: A virtual machine or container with the specified identifier does not exist.", unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-01: unable to fetch metrics from Kubelet qa-k8sw-win-01 (10.4.111.189): request failed - "500 Internal Server Error", response: "Internal Error: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = hcsshim::OpenComputeSystem cf9d0d43c099624bbbb13cb4a607ca3a818a66aad1631897d47e1c66827782ac: A virtual machine or container with the specified identifier does not exist."]
E0529 12:08:50.821804 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-02: unable to fetch metrics from Kubelet qa-k8sw-win-02 (10.4.111.68): request failed - "500 Internal Server Error", response: "Internal Error: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = hcsshim::OpenComputeSystem 7d7489daf4b889d4b55d7889a617017768035b7c1c43e8cef5ac0210e7b2ac65: A virtual machine or container with the specified identifier does not exist.", unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-01: unable to fetch metrics from Kubelet qa-k8sw-win-01 (10.4.111.189): request failed - "500 Internal Server Error", response: "Internal Error: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = hcsshim::OpenComputeSystem cf9d0d43c099624bbbb13cb4a607ca3a818a66aad1631897d47e1c66827782ac: A virtual machine or container with the specified identifier does not exist."]
E0529 12:09:12.819567 1 reststorage.go:135] unable to fetch node metrics for node "qa-k8sw-win-02": no metrics known for node
E0529 12:09:12.819592 1 reststorage.go:135] unable to fetch node metrics for node "qa-k8sw-win-01": no metrics known for node
E0529 12:09:50.809012 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-01: unable to fetch metrics from Kubelet qa-k8sw-win-01 (10.4.111.189): request failed - "500 Internal Server Error", response: "Internal Error: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = hcsshim::OpenComputeSystem cf9d0d43c099624bbbb13cb4a607ca3a818a66aad1631897d47e1c66827782ac: A virtual machine or container with the specified identifier does not exist.", unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-02: unable to fetch metrics from Kubelet qa-k8sw-win-02 (10.4.111.68): request failed - "500 Internal Server Error", response: "Internal Error: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = hcsshim::OpenComputeSystem 7d7489daf4b889d4b55d7889a617017768035b7c1c43e8cef5ac0210e7b2ac65: A virtual machine or container with the specified identifier does not exist."]
E0529 12:10:53.085842 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-01: unable to fetch metrics from Kubelet qa-k8sw-win-01 (10.4.111.189): request failed - "500 Internal Server Error", response: "Internal Error: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = hcsshim::OpenComputeSystem cf9d0d43c099624bbbb13cb4a607ca3a818a66aad1631897d47e1c66827782ac: A virtual machine or container with the specified identifier does not exist.", unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-02: unable to fetch metrics from Kubelet qa-k8sw-win-02 (10.4.111.68): request failed - "500 Internal Server Error", response: "Internal Error: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = container 29f403ebb265389ac1bcbe39f8a555045e1e461a2abf065f11d2f8b267f83b12 encountered an error during Properties: failure in a Windows system call: A system shutdown is in progress. (0x45b)"]
E0529 12:12:00.147458 1 reststorage.go:135] unable to fetch node metrics for node "qa-k8sw-win-02": no metrics known for node
E0529 12:12:00.147485 1 reststorage.go:135] unable to fetch node metrics for node "qa-k8sw-win-01": no metrics known for node
E0529 12:12:44.741135 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-01: unable to fetch metrics from Kubelet qa-k8sw-win-01 (10.4.111.189): request failed - "500 Internal Server Error", response: "Internal Error: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = hcsshim::OpenComputeSystem cf9d0d43c099624bbbb13cb4a607ca3a818a66aad1631897d47e1c66827782ac: A virtual machine or container with the specified identifier does not exist.", unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-02: unable to fetch metrics from Kubelet qa-k8sw-win-02 (10.4.111.68): Get https://10.4.111.68:10250/stats/summary?only_cpu_and_memory=true: context deadline exceeded]
E0529 12:13:44.740851 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-01: unable to fetch metrics from Kubelet qa-k8sw-win-01 (10.4.111.189): request failed - "500 Internal Server Error", response: "Internal Error: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = hcsshim::OpenComputeSystem cf9d0d43c099624bbbb13cb4a607ca3a818a66aad1631897d47e1c66827782ac: A virtual machine or container with the specified identifier does not exist.", unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-02: unable to fetch metrics from Kubelet qa-k8sw-win-02 (10.4.111.68): Get https://10.4.111.68:10250/stats/summary?only_cpu_and_memory=true: context deadline exceeded]
E0529 12:14:44.740965 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-01: unable to fetch metrics from Kubelet qa-k8sw-win-01 (10.4.111.189): request failed - "500 Internal Server Error", response: "Internal Error: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = hcsshim::OpenComputeSystem cf9d0d43c099624bbbb13cb4a607ca3a818a66aad1631897d47e1c66827782ac: A virtual machine or container with the specified identifier does not exist.", unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-02: unable to fetch metrics from Kubelet qa-k8sw-win-02 (10.4.111.68): Get https://10.4.111.68:10250/stats/summary?only_cpu_and_memory=true: context deadline exceeded]
E0529 12:15:44.740936 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-01: unable to fetch metrics from Kubelet qa-k8sw-win-01 (10.4.111.189): request failed - "500 Internal Server Error", response: "Internal Error: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = hcsshim::OpenComputeSystem cf9d0d43c099624bbbb13cb4a607ca3a818a66aad1631897d47e1c66827782ac: A virtual machine or container with the specified identifier does not exist.", unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-02: unable to fetch metrics from Kubelet qa-k8sw-win-02 (10.4.111.68): Get https://10.4.111.68:10250/stats/summary?only_cpu_and_memory=true: context deadline exceeded]
我在尝试从运行metrics server pod(qa-k8sm-02
I运行curl-v-k)的节点获取度量值时看到一个错误https://10.4.111.68:10250/stats/summary?only_cpu_and_memory=true
curl -v -k https://10.4.111.68:10250/stats/summary?only_cpu_and_memory=true
* About to connect() to 10.4.111.68 port 10250 (#0)
* Trying 10.4.111.68...
* Connected to 10.4.111.68 (10.4.111.68) port 10250 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* NSS: client certificate not found (nickname not specified)
* SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
* subject: CN=qa-k8sw-win-02@1589888891
* start date: May 19 10:48:10 2020 GMT
* expire date: May 19 10:48:10 2021 GMT
* common name: qa-k8sw-win-02@1589888891
* issuer: CN=qa-k8sw-win-02-ca@1589888890
> GET /stats/summary?only_cpu_and_memory=true HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 10.4.111.68:10250
> Accept: */*
>
< HTTP/1.1 401 Unauthorized
< Date: Fri, 29 May 2020 12:30:40 GMT
< Content-Length: 12
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host 10.4.111.68 left intact
所以我没有看到任何错误,所以我登录到Windows服务器并开始查看容器日志,当我查看kubelet的日志时,我看到了一堆错误
E0529 18:01:50.824959 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-02: unable to fetch metrics from Kubelet qa-k8sw-win-02 (10.4.111.68): request failed - "500 Internal Server Error", response: "Internal Error: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = hcsshim::OpenComputeSystem 29f403ebb265389ac1bcbe39f8a555045e1e461a2abf065f11d2f8b267f83b12: A virtual machine or container with the specified identifier does not exist.", unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-01: unable to fetch metrics from Kubelet qa-k8sw-win-01 (10.4.111.189): request failed - "500 Internal Server Error", response: "Internal Error: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = hcsshim::OpenComputeSystem cf9d0d43c099624bbbb13cb4a607ca3a818a66aad1631897d47e1c66827782ac: A virtual machine or container with the specified identifier does not exist."]
E0529 18:02:50.797328 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-02: unable to fetch metrics from Kubelet qa-k8sw-win-02 (10.4.111.68): request failed - "500 Internal Server Error", response: "Internal Error: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = hcsshim::OpenComputeSystem 29f403ebb265389ac1bcbe39f8a555045e1e461a2abf065f11d2f8b267f83b12: A virtual machine or container with the specified identifier does not exist.", unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-01: unable to fetch metrics from Kubelet qa-k8sw-win-01 (10.4.111.189): request failed - "500 Internal Server Error", response: "Internal Error: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = hcsshim::OpenComputeSystem cf9d0d43c099624bbbb13cb4a607ca3a818a66aad1631897d47e1c66827782ac: A virtual machine or container with the specified identifier does not exist."]
这与windows无关。请阅读github问题。其中有很多问题解释了如何解决此特定场景。@suren您有任何链接吗?有几个链接。例如,这一个。可能看起来不一样,但您需要配置tls连接。@suren谢谢,但仅报告了linux节点的指标很好,只有Windows工作人员没有报告或显示错误。真的吗?好的。很抱歉让您感到困惑。
E0529 18:01:50.824959 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-02: unable to fetch metrics from Kubelet qa-k8sw-win-02 (10.4.111.68): request failed - "500 Internal Server Error", response: "Internal Error: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = hcsshim::OpenComputeSystem 29f403ebb265389ac1bcbe39f8a555045e1e461a2abf065f11d2f8b267f83b12: A virtual machine or container with the specified identifier does not exist.", unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-01: unable to fetch metrics from Kubelet qa-k8sw-win-01 (10.4.111.189): request failed - "500 Internal Server Error", response: "Internal Error: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = hcsshim::OpenComputeSystem cf9d0d43c099624bbbb13cb4a607ca3a818a66aad1631897d47e1c66827782ac: A virtual machine or container with the specified identifier does not exist."]
E0529 18:02:50.797328 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-02: unable to fetch metrics from Kubelet qa-k8sw-win-02 (10.4.111.68): request failed - "500 Internal Server Error", response: "Internal Error: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = hcsshim::OpenComputeSystem 29f403ebb265389ac1bcbe39f8a555045e1e461a2abf065f11d2f8b267f83b12: A virtual machine or container with the specified identifier does not exist.", unable to fully scrape metrics from source kubelet_summary:qa-k8sw-win-01: unable to fetch metrics from Kubelet qa-k8sw-win-01 (10.4.111.189): request failed - "500 Internal Server Error", response: "Internal Error: failed to list pod stats: failed to list all container stats: rpc error: code = Unknown desc = hcsshim::OpenComputeSystem cf9d0d43c099624bbbb13cb4a607ca3a818a66aad1631897d47e1c66827782ac: A virtual machine or container with the specified identifier does not exist."]