OpenShift 4.4 - unable to 'oc logs'/'oc exec' pods running on worker nodes

OpenShift 4.4.17 cluster (3 master nodes and 3 worker nodes).

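Trying to view the logs of (or open an exec terminal into) pods running on the worker nodes fails with an error. The same happens in the OpenShift web console. Doing the same for pods running on the master nodes works without problems.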

Example 1: pod running on a worker node

$ oc whoami
kube:admin
$ oc get pod -n lamp
NAME                         READY   STATUS    RESTARTS   AGE
lamp-lamp-6c7d9f467d-jsn4t   3/3     Running   0          108d

$ oc logs lamp-lamp-6c7d9f467d-jsn4t httpd -n lamp
error: You must be logged in to the server (the server has asked for the client to provide credentials ( pods/log lamp-lamp-6c7d9f467d-jsn4t))

Example 2: pod running on a master node

$ oc get pod -n openshift-apiserver
NAME                       READY   STATUS    RESTARTS   AGE
apiserver-6d64545f-5lmb8   1/1     Running   0          2d19h
apiserver-6d64545f-hktqd   1/1     Running   0          2d19h
apiserver-6d64545f-kb4qt   1/1     Running   0          2d19h

$ oc logs apiserver-6d64545f-5lmb8 -n openshift-apiserver
Copying system trust bundle
I0225 20:41:39.989689       1 requestheader_controller.go:243] (..output omitted..)

Investigating the kubelet on the worker nodes:

On every worker node the kubelet service is running, but

journalctl -u kubelet

shows the following two lines:

Unable to authenticate the request due to an error: x509: certificate signed by unknown authority
logging error output: "Unauthorized"

About the kubeconfig on the worker nodes:

Looking at the contents of the /etc/kubernetes/kubeconfig file:

- kubelet connects to api-server                --> https://api-int.ocs-cls1.mycompany.lab
- the server passes valid certificate signed by --> kube-apiserver-lb-signer
- certificate-authority-data carries            --> kube-apiserver-lb-signer rootCA
The kubeconfig looks correct.
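The kubeconfig check described above can be scripted. This is a minimal Bash sketch, assuming a kubeconfig with a single inline certificate-authority-data field and that openssl is installed on the node; extract_kubeconfig_ca and verify_server_against_ca are hypothetical helper names, not OpenShift tools. Note that it only covers the kubelet-to-API-server direction that this kubeconfig governs.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of OpenShift) for checking that the API
# server's certificate chains to the CA embedded in a kubeconfig.

# Decode the first certificate-authority-data value in kubeconfig $1
# and write the resulting PEM to file $2.
extract_kubeconfig_ca() {
  local kubeconfig="$1" out="$2"
  awk '/certificate-authority-data:/ {print $2; exit}' "$kubeconfig" | base64 -d > "$out"
}

# Connect to host:port $1, verify the served chain against CA file $2,
# and print openssl's verdict ("0 (ok)" means the chain verified).
verify_server_against_ca() {
  local hostport="$1" cafile="$2"
  openssl s_client -connect "$hostport" -CAfile "$cafile" </dev/null 2>/dev/null \
    | grep 'Verify return code'
}
```

For example, on a worker node (assuming the internal API endpoint listens on the usual port 6443):

extract_kubeconfig_ca /etc/kubernetes/kubeconfig /tmp/kubeconfig-ca.crt
verify_server_against_ca api-int.ocs-cls1.mycompany.lab:6443 /tmp/kubeconfig-ca.crt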

Update:

Also noticed the following log line reporting a bad certificate:

$ oc -n openshift-apiserver logs apiserver-6d64545f-5lmb8
log.go:172] http: TLS handshake error from 10.128.0.12:47078: remote error: tls: bad certificate
...

Update 2:

Also checked the apiserver loopback client certificate:

$ curl --resolve apiserver-loopback-client:6443:{IP_MASTER} -v -k https://apiserver-loopback-client:6443/healthz
*        server certificate verification SKIPPED
*        server certificate status verification SKIPPED
*        common name: apiserver-loopback-client@1614330374 (matched)
*        server certificate expiration date OK
*        server certificate activation date OK
*        certificate public key: RSA
*        certificate version: #3
*        subject: CN=apiserver-loopback-client@1614330374
*        start date: Fri, 26 Feb 2021 08:06:13 GMT
*        expire date: Sat, 26 Feb 2022 08:06:13 GMT
*        issuer: CN=apiserver-loopback-client-ca@1614330374
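Instead of reading the verbose curl output, the served certificate can be decoded directly with openssl. A minimal sketch, assuming openssl is available where you run it; fetch_server_cert and cert_summary are hypothetical helper names, and {IP_MASTER} is the same placeholder used in the curl command above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of OpenShift) for inspecting the
# certificate a server presents for a given SNI name.

# Fetch the leaf certificate served at host:port $1 for SNI name $2
# and print it as PEM.
fetch_server_cert() {
  local hostport="$1" sni="$2"
  openssl s_client -connect "$hostport" -servername "$sni" </dev/null 2>/dev/null \
    | openssl x509
}

# Read a PEM certificate on stdin and print its subject, issuer
# and validity dates.
cert_summary() {
  openssl x509 -noout -subject -issuer -startdate -enddate
}
```

Usage, mirroring the curl check:

fetch_server_cert "{IP_MASTER}:6443" apiserver-loopback-client | cert_summary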

Try this:

while :; do
  sleep 2
  # approve every certificate signing request currently listed
  oc get csr -o name | xargs -r oc adm certificate approve
done
Then, in another terminal, SSH into any master node and run:

crictl ps -a | awk '/Running/ && /-cert-syncer/ {print $1}' | xargs -r crictl stop
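The approval loop in this answer runs `oc adm certificate approve` on every CSR on each pass, including ones that were already approved. A narrower variant, sketched here after the go-template pattern used in the OpenShift documentation, approves only CSRs whose status is still empty (i.e. pending); approve_pending_csrs is a hypothetical function name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: approve only CSRs that have no .status yet (pending).
approve_pending_csrs() {
  oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' \
    | xargs -r oc adm certificate approve
}
```

It can replace the body of the loop in the first terminal:

while :; do approve_pending_csrs; sleep 2; done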

Sounds like a CSR issue. Have you approved all the certificates for the nodes? What does

oc get csr

say?

$ oc get csr --> No resources found in default namespace.