Kubernetes: TLS errors when switching from self-signed to commercial certificates
When I installed my cluster, I used self-signed certificates from our internal CA. Everything was fine until I started getting certificate errors from applications I deployed to the OKD cluster. Rather than fix the errors one at a time, we decided to buy a commercial certificate and install that. So we purchased a SAN cert with wildcards from GlobalSign (the same wildcards we originally got from our internal CA). I am trying to install it, but I am running into huge problems.
Keep in mind that I have tried dozens of iterations here. I am only documenting my last attempt, to try to figure out what the problem actually is. This is on my test cluster, which is a VM server, and I revert to a snapshot after each test. The snapshot is the working cluster using the internal CA certs.
My first step was to build the CAfile to pass in. I downloaded the root and intermediate certs from GlobalSign and put them into the ca-globalsign.crt file (PEM format).
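For reference, a CA bundle like this is plain concatenation of the PEM files. The following is a minimal sketch, not the asker's actual procedure: the function names build_ca_bundle and count_certs are hypothetical helpers, and the input file names in the usage comment are placeholders for whatever was downloaded from the CA.

```shell
#!/bin/sh
# Sketch: concatenate PEM certificates into a single CA bundle.
# build_ca_bundle and count_certs are hypothetical helper names.

# build_ca_bundle OUTPUT CERT... : write every CERT file into OUTPUT, in order.
build_ca_bundle() {
    out=$1
    shift
    : > "$out"
    for cert in "$@"; do
        cat "$cert" >> "$out"
        # If a file lacks a trailing newline, add one so that the next
        # BEGIN line does not end up glued to the previous END line.
        if [ -n "$(tail -c 1 "$cert")" ]; then
            printf '\n' >> "$out"
        fi
    done
}

# count_certs FILE : print the number of certificates in a PEM file.
count_certs() {
    grep -c -- '-----BEGIN CERTIFICATE-----' "$1"
}

# Example (placeholder file names):
#   build_ca_bundle ca-globalsign.crt intermediate.pem root.pem
#   count_certs ca-globalsign.crt
```

Counting the BEGIN lines afterwards is a cheap way to confirm that both the intermediate and the root actually made it into the bundle.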
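When I run
openssl verify -CAfile ../ca-globalsign.crt labtest.mycompany.com.pem
I get:
labtest.mycompany.com.pem: OK
And
openssl x509 -in labtest.mycompany.com.pem -text -noout
prints the certificate details on my local machine (redacted here). Everything I know about SSL says the cert is good. These new files are placed in the project I use to hold the configs etc. for my OKD install.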
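Note that openssl verify only confirms that the chain validates against the CA file; it does not check whether the names in the certificate cover the hostnames the cluster uses. One rule worth keeping in mind is that a TLS wildcard matches exactly one DNS label. A minimal sketch of that rule follows; the function name san_covers is hypothetical, and it assumes both arguments are already lowercase.

```shell
#!/bin/sh
# Sketch: does a single SAN entry cover a given hostname?
# san_covers is a hypothetical helper; names are assumed to be lowercase.

# san_covers PATTERN HOST : succeed if PATTERN (exact name or "*.domain")
# covers HOST. A wildcard stands for exactly one label.
san_covers() {
    pattern=$1
    host=$2
    case $pattern in
        '*.'*)
            suffix=${pattern#\*}        # e.g. ".labtest.mycompany.com"
            label=${host%"$suffix"}     # what the wildcard has to match
            # HOST must end with the suffix ...
            [ "$label" != "$host" ] || return 1
            # ... and the remaining label must be non-empty ...
            [ -n "$label" ] || return 1
            # ... and must not contain a dot (one label only).
            case $label in
                *.*) return 1 ;;
            esac
            return 0
            ;;
        *)
            [ "$pattern" = "$host" ]
            ;;
    esac
}

# Example:
#   san_covers '*.labtest.mycompany.com' 'console.labtest.mycompany.com'        # succeeds
#   san_covers '*.labtest.mycompany.com' 'registry.apps.labtest.mycompany.com'  # fails
```

Under this rule, a name such as registry.apps.labtest.mycompany.com needs the *.apps.labtest.mycompany.com entry; *.labtest.mycompany.com alone does not cover it.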
Then I updated the cert files in my Ansible inventory project and ran the command
ansible-playbook -i ../okd_install/inventory/okd_labtest_inventory.yml playbooks/redeploy-certificates.yml
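Everything I read in the documentation tells me it should simply run through its process and come out with the new certs. That does not happen. When I use openshift_master_overwrite_named_certificates: false in my inventory file, the install completes, but it only replaces the cert on the *.apps.labtest domain, while console.labtest keeps the original cert. The cluster does come online, except that monitoring in the cluster console shows Bad Gateway.
Now, if I run the command again with openshift_master_overwrite_named_certificates: true, my /var/log/containers/master-api*.log is full of errors like this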
{"log":"I0507 15:53:28.451851 1 logs.go:49] http: TLS handshake error from 10.128.0.56:46796: EOF\n","stream":"stderr","time":"2019-05-07T19:53:28.451894391Z"}
{"log":"I0507 15:53:28.455218 1 logs.go:49] http: TLS handshake error from 10.128.0.56:46798: EOF\n","stream":"stderr","time":"2019-05-07T19:53:28.455272658Z"}
{"log":"I0507 15:53:28.458742 1 logs.go:49] http: TLS handshake error from 10.128.0.56:46800: EOF\n","stream":"stderr","time":"2019-05-07T19:53:28.461070768Z"}
{"log":"I0507 15:53:28.462093 1 logs.go:49] http: TLS handshake error from 10.128.0.56:46802: EOF\n","stream":"stderr","time":"2019-05-07T19:53:28.463719816Z"}
and these
{"log":"I0507 15:53:29.355463 1 logs.go:49] http: TLS handshake error from 10.70.25.131:44424: remote error: tls: bad certificate\n","stream":"stderr","time":"2019-05-07T19:53:29.357218793Z"}
{"log":"I0507 15:53:29.357961 1 logs.go:49] http: TLS handshake error from 10.70.25.132:43128: remote error: tls: bad certificate\n","stream":"stderr","time":"2019-05-07T19:53:29.358779155Z"}
{"log":"I0507 15:53:29.357993 1 logs.go:49] http: TLS handshake error from 10.70.25.132:43126: remote error: tls: bad certificate\n","stream":"stderr","time":"2019-05-07T19:53:29.358790397Z"}
{"log":"I0507 15:53:29.405532 1 logs.go:49] http: TLS handshake error from 10.70.25.131:44428: remote error: tls: bad certificate\n","stream":"stderr","time":"2019-05-07T19:53:29.406873158Z"}
{"log":"I0507 15:53:29.527221 1 logs.go:49] http: TLS handshake error from 10.70.25.132:43130: remote error: tls: bad certificate\n","stream":"stderr","time":"2019-05-07T19:53
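The install then hangs at the Ansible task TASK [Remove web console pods]. It will sit there for hours. When I go onto the master's console and run oc get pods on openshift-web-console, the pod is in Terminating state. When I describe the new pod that is trying to start and is stuck in Pending, it reports that the hard disk is full. I assume that is because all the TLS errors above prevent it from communicating with the storage system. It just sits there. If I force-delete the terminating pod, reboot the master, delete the new pod that is trying to start, and reboot again, I can get the cluster to come back up. The web console then comes online, but all my log files are flooded with those TLS errors. More worrying is that the install hangs at that spot, so I assume there are further steps after bringing the web console online that are also contributing to my problems.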
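To see at a glance which clients are failing and how, the handshake errors quoted above can be grouped by source address and reason. This is a rough sketch that assumes the JSON line format shown in the log excerpts; the function name summarize_tls_errors is hypothetical.

```shell
#!/bin/sh
# Sketch: count TLS handshake errors per client IP and failure reason.
# summarize_tls_errors is a hypothetical helper; it assumes log lines like
#   {"log":"... http: TLS handshake error from IP:PORT: REASON\n","stream":...}

# summarize_tls_errors FILE : print "COUNT IP REASON", most frequent first.
summarize_tls_errors() {
    # Capture the IP (dropping the port) and the reason up to the
    # literal backslash-n that ends the "log" field.
    sed -n 's/.*TLS handshake error from \([0-9.]*\):[0-9]*: \(.*\)\\n","stream".*/\1 \2/p' "$1" |
        sort | uniq -c | sort -rn
}

# Example (the glob is the path mentioned in the question):
#   for f in /var/log/containers/master-api*.log; do summarize_tls_errors "$f"; done
```

In the excerpts above, the EOF errors all come from one address and the bad certificate errors from two others, so a summary like this shows whether more clients are affected than the excerpts suggest.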
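So I also tried redeploying the server CA. That caused problems because my new cert is not a CA cert. Then, when I ran just the redeploy CA playbook and let the cluster recreate the server CA, it finished fine, but when I then ran redeploy-certificates.yml I got the same results.
Here is my inventory file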
all:
  children:
    etcd:
      hosts:
        okdmastertest.labtest.mycompany.com:
    masters:
      hosts:
        okdmastertest.labtest.mycompany.com:
    nodes:
      hosts:
        okdmastertest.labtest.mycompany.com:
          openshift_node_group_name: node-config-master-infra
        okdnodetest1.labtest.mycompany.com:
          openshift_node_group_name: node-config-compute
          openshift_schedulable: True
    OSEv3:
      children:
        etcd:
        masters:
        nodes:
        # https://docs.okd.io/latest/install_config/persistent_storage/persistent_storage_glusterfs.html#overview-containerized-glusterfs
        # https://github.com/openshift/openshift-ansible/tree/master/playbooks/openshift-glusterfs
        # glusterfs:
      vars:
        openshift_deployment_type: origin
        ansible_user: root
        openshift_master_cluster_method: native
        openshift_master_default_subdomain: apps.labtest.mycompany.com
        openshift_install_examples: true
        openshift_master_cluster_hostname: console.labtest.mycompany.com
        openshift_master_cluster_public_hostname: console.labtest.mycompany.com
        openshift_hosted_registry_routehost: registry.apps.labtest.mycompany.com
        openshift_certificate_expiry_warning_days: 30
        openshift_certificate_expiry_fail_on_warn: false
        openshift_master_overwrite_named_certificates: true
        openshift_hosted_registry_routetermination: reencrypt
        openshift_master_named_certificates:
          - certfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.pem"
            keyfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.key"
            cafile: "/Users/me/code/devops/okd_install/certs/ca-globalsign.crt"
            names:
              - "console.labtest.mycompany.com"
              # - "labtest.mycompany.com"
              # - "*.labtest.mycompany.com"
              # - "*.apps.labtest.mycompany.com"
        openshift_hosted_router_certificate:
          certfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.pem"
          keyfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.key"
          cafile: "/Users/me/code/devops/okd_install/certs/ca-globalsign.crt"
        openshift_hosted_registry_routecertificates:
          certfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.pem"
          keyfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.key"
          cafile: "/Users/me/code/devops/okd_install/certs/ca-globalsign.crt"
        # LDAP auth
        openshift_master_identity_providers:
          - name: 'mycompany_ldap_provider'
            challenge: true
            login: true
            kind: LDAPPasswordIdentityProvider
            attributes:
              id:
                - dn
              email:
                - mail
              name:
                - cn
              preferredUsername:
                - sAMAccountName
            bindDN: 'ldapbind@int.mycompany.com'
            bindPassword: (redacted)
            insecure: true
            url: 'ldap://dc-pa1.int.mycompany.com/ou=mycompany,dc=int,dc=mycompany,dc=com'
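What am I missing? I thought the redeploy-certificates.yml playbook was designed to update the certs. Why can't I get this to switch over to my new commercial cert? It is almost as if it replaces the certs on the router (kind of) but breaks the internal server cert in the process. I am really at my wit's end here; I don't know what else to try.

You should configure openshift_master_cluster_hostname and openshift_master_cluster_public_hostname as two different hostnames. Both hostnames should also be resolvable through DNS. Your commercial certificate is used for the external access point.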
The openshift_master_cluster_public_hostname and openshift_master_cluster_hostname parameters in the Ansible inventory file, by default /etc/ansible/hosts, must be different.
If they are the same, the named certificates will fail and you will need to re-install them.
# Native HA with External LB VIPs
openshift_master_cluster_hostname=internal.paas.example.com
openshift_master_cluster_public_hostname=external.paas.example.com
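A quick pre-flight check along these lines can catch identical hostnames before a playbook is run. This is a sketch, not part of openshift-ansible: the function names inventory_value and hostnames_differ are hypothetical, and the parsing only understands simple key=value or key: value lines without quoting.

```shell
#!/bin/sh
# Sketch: confirm that the internal and public master hostnames differ.
# inventory_value and hostnames_differ are hypothetical helpers; only plain
# "key=value" (INI) and "key: value" (YAML) lines are understood.

# inventory_value FILE KEY : print the first value assigned to KEY.
inventory_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*[=:][[:space:]]*//p" "$1" | head -n 1
}

# hostnames_differ FILE : succeed only if both hostnames are set and differ.
hostnames_differ() {
    internal=$(inventory_value "$1" openshift_master_cluster_hostname)
    public=$(inventory_value "$1" openshift_master_cluster_public_hostname)
    [ -n "$internal" ] && [ -n "$public" ] && [ "$internal" != "$public" ]
}

# Example (inventory path taken from the question):
#   hostnames_differ ../okd_install/inventory/okd_labtest_inventory.yml ||
#       echo "cluster_hostname and cluster_public_hostname must differ" >&2
```

Run against the inventory in the question, where both parameters are set to console.labtest.mycompany.com, this check would fail.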
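You had better configure the certificates one component at a time, testing as you go.
For example:
First, configure the certificate for one component and verify it.
Then configure the next component and verify it.
And so on. Once you can complete all the redeploy-certificates tasks successfully, you can finally run with the full set of parameters to maintain your commercial certificates.
For more details, see the OKD documentation.
I hope it helps you.

Awesome, I will try this. I should be able to simply run deploy_cluster to update to the correct names?

deploy_cluster.yml cannot update the certificates for new master hostnames after the initial installation. If you want to change the hostnames, uninstall and then reinstall. If your commercial certificate is used through the router for the applications running in pods, you only need to redeploy the router certificates, without reinstalling.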