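When I installed my cluster, I used a self-signed certificate issued by our internal CA. Everything was fine until I started getting certificate errors from applications deployed to the OKD cluster. Instead of fixing the errors one at a time, we decided to simply buy a commercial certificate and install that. So we bought a SAN certificate with wildcards from GlobalSign (the same wildcards as the certificate we originally got from our internal CA), and I am having huge problems trying to install it.
Keep in mind that I have tried dozens of iterations here. I am only documenting the last attempt, to try to figure out what the problem actually is. This is on my test cluster, which runs on VM servers that I revert to a snapshot after each attempt. The snapshot is the working cluster using the internal CA certificates.
My first step was to build the CAfile to pass in. I downloaded the GlobalSign root and intermediate certificates and put them in the ca-globalsign.crt file (PEM format).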
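The bundling step can be sketched as a small shell helper. This is only an illustration: `build_ca_bundle` is a name made up here, and `gs-intermediate.pem` and `gs-root.pem` are hypothetical file names, since the question does not say what the downloaded files were called.

```shell
#!/bin/sh
# build_ca_bundle OUT CERT...
# Hypothetical helper: concatenates PEM certificates into one CA bundle.
# Every input is checked for a PEM certificate header first, so nothing is
# written if one of the files is in another format (for example DER).
build_ca_bundle() {
    out=$1
    shift
    for cert in "$@"; do
        grep -q -e '-----BEGIN CERTIFICATE-----' "$cert" || {
            echo "not a PEM certificate: $cert" >&2
            return 1
        }
    done
    # "awk 1" copies every line and terminates the last one with a newline,
    # so two certificates can never end up fused on a single line.
    awk 1 "$@" > "$out"
}

# Example call (file names are assumptions):
#   build_ca_bundle ca-globalsign.crt gs-intermediate.pem gs-root.pem
```

The resulting file is what gets passed to `openssl verify -CAfile` and to the `cafile` entries in the inventory.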
When I run
openssl verify -CAfile ../ca-globalsign.crt labtest.mycompany.com.pem
I get:
labtest.mycompany.com.pem: OK
and openssl x509 -in labtest.mycompany.com.pem -text -noout gives me (redacted):
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            (redacted)
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=BE, O=GlobalSign nv-sa, CN=GlobalSign Organization Validation CA - SHA256 - G2
        Validity
            Not Before: Apr 29 16:11:07 2019 GMT
            Not After : Apr 29 16:11:07 2020 GMT
        Subject: C=US, ST=(redacted), L=(redacted), OU=Information Technology, O=(redacted), CN=labtest.mycompany.com
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    (redacted)
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            Authority Information Access:
                CA Issuers - URI:http://secure.globalsign.com/cacert/gsorganizationvalsha2g2r1.crt
                OCSP - URI:http://ocsp2.globalsign.com/gsorganizationvalsha2g2
            X509v3 Certificate Policies:
                Policy: 1.3.6.1.4.1.4146.1.20
                  CPS: https://www.globalsign.com/repository/
                Policy: 2.23.140.1.2.2
            X509v3 Basic Constraints:
                CA:FALSE
            X509v3 Subject Alternative Name:
                DNS:labtest.mycompany.com, DNS:*.labtest.mycompany.com, DNS:*.apps.labtest.mycompany.com
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Subject Key Identifier:
                (redacted)
            X509v3 Authority Key Identifier:
                (redacted)
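(redacted)
That is on my local machine. Everything I know about SSL says the certificate is fine. These new files are placed in the project I use to hold the trust files and the like for my OKD installation.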
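One more check worth making against the SAN list above is whether each hostname the cluster serves is actually covered, since a wildcard entry only stands for a single DNS label. The helper below is a minimal sketch of that rule in plain shell; `san_covers` is a name made up here, and it assumes hostnames and patterns are already lowercase.

```shell
#!/bin/sh
# san_covers HOST PATTERN...
# Hypothetical helper: returns 0 if HOST matches one of the SAN DNS
# patterns. A leading "*." matches exactly one non-empty DNS label.
san_covers() {
    host=$1
    shift
    for pattern in "$@"; do
        case $pattern in
            '*.'*)
                suffix=${pattern#\*}      # e.g. ".labtest.mycompany.com"
                case $host in
                    *"$suffix")
                        label=${host%"$suffix"}
                        # Reject an empty label or one that spans several labels.
                        case $label in
                            ''|*.*) ;;
                            *) return 0 ;;
                        esac
                        ;;
                esac
                ;;
            *)
                [ "$host" = "$pattern" ] && return 0
                ;;
        esac
    done
    return 1
}

# SAN entries taken from the certificate dump in the question.
san_covers console.labtest.mycompany.com \
    'labtest.mycompany.com' '*.labtest.mycompany.com' '*.apps.labtest.mycompany.com' \
    && echo "console.labtest.mycompany.com is covered"
```

By this rule both console.labtest.mycompany.com and registry.apps.labtest.mycompany.com are covered, which is consistent with the certificate itself not being the problem.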
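Then I updated the cert files in my Ansible inventory project and ran the command
ansible-playbook -i ../okd_install/inventory/okd_labtest_inventory.yml playbooks/redeploy-certificates.yml
From everything I have read in the documentation, it should roll through its process and come up with the new certificates. That does not happen. When I use openshift_master_overwrite_named_certificates: false in my inventory file, the install completes, but it only replaces the certificate on the *.apps.labtest domain; console.labtest keeps the original certificate. The cluster does come online, except that monitoring shows bad gateway in the cluster console.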
Now, if I run the command again with openshift_master_overwrite_named_certificates: true, my /var/log/containers/master-api*.log is flooded with errors like these
{"log":"I0507 15:53:28.451851 1 logs.go:49] http: TLS handshake error from 10.128.0.56:46796: EOF\n","stream":"stderr","time":"2019-05-07T19:53:28.451894391Z"}
{"log":"I0507 15:53:28.455218 1 logs.go:49] http: TLS handshake error from 10.128.0.56:46798: EOF\n","stream":"stderr","time":"2019-05-07T19:53:28.455272658Z"}
{"log":"I0507 15:53:28.458742 1 logs.go:49] http: TLS handshake error from 10.128.0.56:46800: EOF\n","stream":"stderr","time":"2019-05-07T19:53:28.461070768Z"}
{"log":"I0507 15:53:28.462093 1 logs.go:49] http: TLS handshake error from 10.128.0.56:46802: EOF\n","stream":"stderr","time":"2019-05-07T19:53:28.463719816Z"}
and these
{"log":"I0507 15:53:29.355463 1 logs.go:49] http: TLS handshake error from 10.70.25.131:44424: remote error: tls: bad certificate\n","stream":"stderr","time":"2019-05-07T19:53:29.357218793Z"}
{"log":"I0507 15:53:29.357961 1 logs.go:49] http: TLS handshake error from 10.70.25.132:43128: remote error: tls: bad certificate\n","stream":"stderr","time":"2019-05-07T19:53:29.358779155Z"}
{"log":"I0507 15:53:29.357993 1 logs.go:49] http: TLS handshake error from 10.70.25.132:43126: remote error: tls: bad certificate\n","stream":"stderr","time":"2019-05-07T19:53:29.358790397Z"}
{"log":"I0507 15:53:29.405532 1 logs.go:49] http: TLS handshake error from 10.70.25.131:44428: remote error: tls: bad certificate\n","stream":"stderr","time":"2019-05-07T19:53:29.406873158Z"}
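{"log":"I0507 15:53:29.527221 1 logs.go:49] http: TLS handshake error from 10.70.25.132:43130: remote error: tls: bad certificate\n","stream":"stderr","time":"2019-05-07T19:53
The installation hangs on the Ansible task TASK [Remove web console pods]. It will sit there for hours. When I go to the master console and run oc get pods against openshift-web-console, the pod is in a terminating state. When I describe the pod that is trying to start from pending, it comes back saying the hard disk is full. I am assuming that is because it cannot communicate with the storage system, given all the TLS errors above. It just sits there.
I can get the cluster back up if I force delete the terminating pod, reboot the master, delete the new pod that is trying to start, and then reboot a second time. The web console then comes online, but all my log files are flooded with those TLS errors. What concerns me more is that the installation hangs at that spot, so I assume there are further steps after bringing the web console online that will also cause me trouble.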
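So I also tried redeploying the server CA. That creates problems because my new certificate is not a CA certificate. When I instead run the redeploy CA playbook just to have the cluster recreate the server CA, it finishes fine, but when I then run redeploy-certificates.yml I get the same results.
Here is my inventory file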
all:
  children:
    etcd:
      hosts:
        okdmastertest.labtest.mycompany.com:
    masters:
      hosts:
        okdmastertest.labtest.mycompany.com:
    nodes:
      hosts:
        okdmastertest.labtest.mycompany.com:
          openshift_node_group_name: node-config-master-infra
        okdnodetest1.labtest.mycompany.com:
          openshift_node_group_name: node-config-compute
          openshift_schedulable: True
    OSEv3:
      children:
        etcd:
        masters:
        nodes:
        # https://docs.okd.io/latest/install_config/persistent_storage/persistent_storage_glusterfs.html#overview-containerized-glusterfs
        # https://github.com/openshift/openshift-ansible/tree/master/playbooks/openshift-glusterfs
        # glusterfs:
      vars:
        openshift_deployment_type: origin
        ansible_user: root
        openshift_master_cluster_method: native
        openshift_master_default_subdomain: apps.labtest.mycompany.com
        openshift_install_examples: true
        openshift_master_cluster_hostname: console.labtest.mycompany.com
        openshift_master_cluster_public_hostname: console.labtest.mycompany.com
        openshift_hosted_registry_routehost: registry.apps.labtest.mycompany.com
        openshift_certificate_expiry_warning_days: 30
        openshift_certificate_expiry_fail_on_warn: false
        openshift_master_overwrite_named_certificates: true
        openshift_hosted_registry_routetermination: reencrypt
        openshift_master_named_certificates:
          - certfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.pem"
            keyfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.key"
            cafile: "/Users/me/code/devops/okd_install/certs/ca-globalsign.crt"
            names:
              - "console.labtest.mycompany.com"
              # - "labtest.mycompany.com"
              # - "*.labtest.mycompany.com"
              # - "*.apps.labtest.mycompany.com"
        openshift_hosted_router_certificate:
          certfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.pem"
          keyfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.key"
          cafile: "/Users/me/code/devops/okd_install/certs/ca-globalsign.crt"
        openshift_hosted_registry_routecertificates:
          certfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.pem"
          keyfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.key"
          cafile: "/Users/me/code/devops/okd_install/certs/ca-globalsign.crt"
        # LDAP auth
        openshift_master_identity_providers:
          - name: 'mycompany_ldap_provider'
            challenge: true
            login: true
            kind: LDAPPasswordIdentityProvider
            attributes:
              id:
                - dn
              email:
                - mail
              name:
                - cn
              preferredUsername:
                - sAMAccountName
            bindDN: 'ldapbind@int.mycompany.com'
            bindPassword: (redacted)
            insecure: true
            url: 'ldap://dc-pa1.int.mycompany.com/ou=mycompany,dc=int,dc=mycompany,dc=com'
What am I missing here? I thought the redeploy-certificates.yml playbook was designed to update the certificates. Why can't I get it to pick up my new commercial certificate? It is almost as if it replaces the certificate on the router (sort of) but breaks the internal server certificates in the process. I am really at my wits' end here and do not know what else to try.
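Answered on 2019-05-08 14:02:02
You should configure openshift_master_cluster_hostname and openshift_master_cluster_public_hostname as different hostnames. Both hostnames should also be resolvable by DNS. Your commercial certificate is used for the external access point.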
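A quick way to catch this before running the playbook is to compare the two values in the inventory. The function below is a rough sketch, not part of openshift-ansible: `hostnames_differ` is a made-up name, and it assumes a YAML inventory like the one in the question, with each variable on its own `key: value` line and no quoting or trailing comments.

```shell
#!/bin/sh
# hostnames_differ INVENTORY
# Hypothetical pre-flight check: returns 0 only if both hostname variables
# are set in the YAML inventory and their values are different.
hostnames_differ() {
    inventory=$1
    internal=$(sed -n 's/^[[:space:]]*openshift_master_cluster_hostname:[[:space:]]*//p' "$inventory" | head -n 1)
    public=$(sed -n 's/^[[:space:]]*openshift_master_cluster_public_hostname:[[:space:]]*//p' "$inventory" | head -n 1)
    [ -n "$internal" ] && [ -n "$public" ] && [ "$internal" != "$public" ]
}

# Example call with the inventory path from the question:
#   hostnames_differ ../okd_install/inventory/okd_labtest_inventory.yml \
#       || echo "cluster_hostname and cluster_public_hostname must differ" >&2
```

Run against the inventory in the question, this check would fail, because both variables are set to console.labtest.mycompany.com.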
The openshift_master_cluster_public_hostname and openshift_master_cluster_hostname parameters in the Ansible inventory file, by default /etc/ansible/hosts, must be different.
If they are the same, the named certificates will fail and you will need to re-install them.
# Native HA with External LB VIPs
openshift_master_cluster_hostname=internal.paas.example.com
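openshift_master_cluster_public_hostname=external.paas.example.com
You are better off configuring the certificates for each component one step at a time, so that you can test as you go. For example, first configure the custom master host certificate and verify it. Then configure the custom wildcard certificate for the default router and verify that. And so on. Once every certificate redeployment task succeeds on its own, you can finally run with the full set of parameters for maintaining your commercial certificates.
For more details, see Configuring Custom Certificates. I hope this helps.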
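As a possible first step along those lines, the vars section of the question's YAML inventory could be reduced to the hostname split plus the master named certificate. This is a sketch under assumptions: master-internal.labtest.mycompany.com is a hypothetical internal name that would have to exist in DNS, and the certificate paths are copied from the question.

```yaml
# Goes under OSEv3 -> vars in the inventory from the question.
# Hypothetical internal name; replace with one your DNS actually resolves.
openshift_master_cluster_hostname: master-internal.labtest.mycompany.com
# The public name keeps the commercial certificate.
openshift_master_cluster_public_hostname: console.labtest.mycompany.com
openshift_master_overwrite_named_certificates: true
openshift_master_named_certificates:
  - certfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.pem"
    keyfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.key"
    cafile: "/Users/me/code/devops/okd_install/certs/ca-globalsign.crt"
    names:
      - "console.labtest.mycompany.com"
# Leave openshift_hosted_router_certificate and
# openshift_hosted_registry_routecertificates out of this first run, and add
# them back one at a time once the master certificate has been verified.
```

The internal hostname stays on the cluster's own CA, so the commercial certificate is only presented on the public name.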
https://stackoverflow.com/questions/56029991