"EtcdException: Could not get the list of servers" when running the calico rkt container on CoreOS

Server Fault user
Asked 2016-09-14 19:50:17
2 answers · 963 views · 0 followers · 1 vote

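I have two CoreOS stable v1122.2.0 machines, each configured with etcd2.

I created the certificates using https://github.com/coreos/etcd/tree/master/hack/tls-setup.

Now I am trying to configure calico node to run under rkt on my CoreOS master node.

My cloud-config contains the following: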

write_files:
 - path: "/etc/kubernetes/cni/net.d/10-calico.conf"
   content: |
     {
     "name": "calico",
     "type": "flannel",
     "delegate": {
         "type": "calico",
         "etcd_endpoints": "https://10.79.218.2:2379,https://10.79.218.3:2379",
         "log_level": "none",
         "log_level_stderr": "info",
         "hostname": "10.79.218.2",
         "policy": {
             "type": "k8s",
             "k8s_api_root": "http://127.0.0.1:8080/api/v1/"
             }
         }
     }
 - path: "/etc/kubernetes/manifests/policy-controller.yaml"
   content: |
     apiVersion: v1
     kind: Pod
     metadata:
       name: calico-policy-controller
       namespace: calico-system
     spec:
       hostNetwork: true
       containers:
         # The Calico policy controller.
         - name: k8s-policy-controller
           image: calico/kube-policy-controller:v0.2.0
           env:
             - name: ETCD_ENDPOINTS
               value: "https://10.79.218.2:2379,https://10.79.218.3:2379"
             - name: K8S_API
               value: "http://127.0.0.1:8080"
             - name: LEADER_ELECTION
               value: "true"
         # Leader election container used by the policy controller.
         - name: leader-elector
           image: quay.io/calico/leader-elector:v0.1.0
           imagePullPolicy: IfNotPresent
           args:
             - "--election=calico-policy-election"
             - "--election-namespace=calico-system"
             - "--http=127.0.0.1:4040"
...
units:
 - name: calico-node.service
   enable: true
   command: start
   content: |
    [Unit]
    Description=Calico per-host agent
    Requires=network-online.target
    After=network-online.target

    [Service]
    Slice=machine.slice
    Environment=CALICO_DISABLE_FILE_LOGGING=true
    Environment=HOSTNAME=10.79.218.2
    Environment=IP=10.79.218.2
    Environment=FELIX_FELIXHOSTNAME=10.79.218.2
    Environment=CALICO_NETWORKING=false
    Environment=NO_DEFAULT_POOLS=true
    Environment=ETCD_ENDPOINTS=https://10.79.218.2:2379,https://10.79.218.3:2379
    ExecStart=/usr/bin/rkt run --inherit-env --stage1-from-dir=stage1-fly.aci \
      --volume=modules,kind=host,source=/lib/modules,readOnly=false \
      --mount=volume=modules,target=/lib/modules \
      --trust-keys-from-https quay.io/calico/node:v0.19.0

    KillMode=mixed
    Restart=always
    TimeoutStartSec=0

    [Install]
    WantedBy=multi-user.target

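Please ignore the whitespace indentation. I don't think I copied and pasted it correctly :)

When I try to start the calico-node service, I get the following errors: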

Sep 14 05:45:17 localhost systemd[1]: Started Calico per-host agent.
Sep 14 05:45:17 localhost rkt[1644]: image: using image from file /usr/lib64/rkt/stage1-images/stage1-fly.aci
Sep 14 05:45:18 localhost rkt[1644]: image: using image from local store for image name quay.io/calico/node:v0.19.0
Sep 14 05:45:25 localhost rkt[1644]: Traceback (most recent call last):
Sep 14 05:45:25 localhost rkt[1644]:   File "startup.py", line 292, in <module>
Sep 14 05:45:25 localhost rkt[1644]:     client = IPAMClient()
Sep 14 05:45:25 localhost rkt[1644]:   File "/usr/lib/python2.7/site-packages/pycalico/datastore.py", line 228, in __init__
Sep 14 05:45:25 localhost rkt[1644]:     "%s" % (ETCD_CA_CERT_FILE_ENV, etcd_ca))
Sep 14 05:45:25 localhost rkt[1644]: pycalico.datastore_errors.DataStoreError: Invalid ETCD_CA_CERT_FILE. Certificate Authority cert is required and m
Sep 14 05:45:25 localhost rkt[1644]: Calico node failed to start
Sep 14 05:45:25 localhost systemd[1]: calico-node.service: Main process exited, code=exited, status=1/FAILURE
Sep 14 05:45:25 localhost systemd[1]: calico-node.service: Unit entered failed state.
Sep 14 05:45:25 localhost systemd[1]: calico-node.service: Failed with result 'exit-code'.
Sep 14 05:45:25 localhost systemd[1]: calico-node.service: Service hold-off time over, scheduling restart.
Sep 14 05:45:25 localhost systemd[1]: Stopped Calico per-host agent.
Sep 14 05:45:25 localhost systemd[1]: Started Calico per-host agent.
Sep 14 05:45:25 localhost rkt[1714]: image: using image from file /usr/lib64/rkt/stage1-images/stage1-fly.aci
Sep 14 05:45:26 localhost rkt[1714]: image: using image from local store for image name quay.io/calico/node:v0.19.0
Sep 14 05:45:28 localhost rkt[1714]: Traceback (most recent call last):
Sep 14 05:45:28 localhost rkt[1714]:   File "startup.py", line 292, in <module>
Sep 14 05:45:28 localhost rkt[1714]:     client = IPAMClient()
Sep 14 05:45:28 localhost rkt[1714]:   File "/usr/lib/python2.7/site-packages/pycalico/datastore.py", line 228, in __init__
Sep 14 05:45:28 localhost rkt[1714]:     "%s" % (ETCD_CA_CERT_FILE_ENV, etcd_ca))
Sep 14 05:45:28 localhost rkt[1714]: pycalico.datastore_errors.DataStoreError: Invalid ETCD_CA_CERT_FILE. Certificate Authority cert is required and m

lines 2-25

So I get Invalid ETCD_CA_CERT_FILE. I never actually specified which keys to use, so I think I am missing some configuration.

I have the following etcd-related keys in /etc/ssl/etcd:

8 -rw-------. 1 etcd etcd 1050 Sep 14 05:45 ca.pem
8 -rw-------. 1 etcd etcd  289 Sep 14 05:45 etcd1-key.pem
8 -rw-------. 1 etcd etcd 1058 Sep 14 05:45 etcd1.pem
8 -rw-------. 1 etcd etcd  227 Sep 12 03:49 server1-key.pem
8 -rw-------. 1 etcd etcd  822 Sep 12 03:49 server1.pem

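I tried adding Environment=ETCD_CA_CERT_FILE=/etc/ssl/etcd/ca.pem to the calico-node unit file, but got exactly the same result.

Any ideas?

Update

So I tried running calico manually instead of through systemd. I also set all the environment variables that calico needs: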

export CALICO_DISABLE_FILE_LOGGING=true
export HOSTNAME=10.79.218.2
export IP=10.79.218.2
export FELIX_FELIXHOSTNAME=10.79.218.2
export CALICO_NETWORKING=false
export NO_DEFAULT_POOLS=true
export ETCD_ENDPOINTS=https://10.79.218.2:2379,https://10.79.218.3:2379
export ETCD_AUTHORITY=10.79.218.2:2379
export ETCD_SCHEME=https
export ETCD_CA_CERT_FILE=/etc/ssl/etcd/ca.pem
export ETCD_CERT_FILE=/etc/ssl/etcd/etcd1.pem
export ETCD_KEY_FILE=/etc/ssl/etcd/etcd1-key.pem

When I try to run the calico container with:

/usr/bin/rkt run --inherit-env --stage1-from-dir=stage1-fly.aci \
 --volume=modules,kind=host,source=/lib/modules,readOnly=false \
 --mount=volume=modules,target=/lib/modules \
 --trust-keys-from-https quay.io/calico/node:v0.19.0

I get:

image: using image from file /usr/lib64/rkt/stage1-images/stage1-fly.aci
image: using image from local store for image name quay.io/calico/node:v0.19.0
Traceback (most recent call last):
  File "startup.py", line 292, in <module>
    client = IPAMClient()
  File "/usr/lib/python2.7/site-packages/pycalico/datastore.py", line 221, in __init__
    ETCD_CERT_FILE_ENV, etcd_cert))
pycalico.datastore_errors.DataStoreError: Cannot read ETCD_KEY_FILE and/or ETCD_CERT_FILE. Both must be readable file paths. Values provided: ETCD_KEY_FILE=/etc/ssl/etcd/etcd1-key.pem, ETCD_CERT_FILE=/etc/ssl/etcd/etcd1.pem

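I changed the permissions on the certificate files to 666, but that did not solve the problem. I know the certificates are valid because etcd TLS works fine. So what am I missing?

Update 2

It seems I had not mounted the certificate directory into the calico container.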
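
The earlier "Cannot read ETCD_KEY_FILE and/or ETCD_CERT_FILE" error is consistent with the paths existing on the host but not inside the container's filesystem, which would explain why changing permissions to 666 made no difference. A minimal sketch of that kind of check follows. The variable names come from the question; the function name unreadable_tls_files is hypothetical, and this is not pycalico's actual code. It only tells you something useful when run from the same filesystem view that calico starts in.

```python
import os

# Environment variable names as used in the question.
TLS_ENV_VARS = ("ETCD_CA_CERT_FILE", "ETCD_CERT_FILE", "ETCD_KEY_FILE")


def unreadable_tls_files(env=None):
    """Return {variable: path} for TLS files that are missing or unreadable here."""
    env = os.environ if env is None else env
    problems = {}
    for name in TLS_ENV_VARS:
        path = env.get(name)
        if not path:
            continue  # variable not set, nothing to check
        # A path can be fine on the host and still be absent in the container.
        if not (os.path.isfile(path) and os.access(path, os.R_OK)):
            problems[name] = path
    return problems


if __name__ == "__main__":
    for name, path in unreadable_tls_files().items():
        print("%s=%s is not a readable file in this filesystem" % (name, path))
```

If this reports nothing on the host but calico still complains, the likely cause is a missing volume mount rather than file permissions.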

So now I use:

/usr/bin/rkt run --volume etcd-ssl,kind=host,source=/etc/ssl/etcd/,readOnly=true --inherit-env --stage1-from-dir=stage1-fly.aci  --volume=modules,kind=host,source=/lib/modules,readOnly=false  --mount=volume=modules,target=/lib/modules  --trust-keys-from-https quay.io/calico/node:v0.19.0 --mount volume=etcd-ssl,target=/etc/ssl/etcd

I get the following output:

image: using image from file /usr/lib64/rkt/stage1-images/stage1-fly.aci
image: using image from local store for image name quay.io/calico/node:v0.19.0
Traceback (most recent call last):
  File "startup.py", line 292, in <module>
    client = IPAMClient()
  File "/usr/lib/python2.7/site-packages/pycalico/datastore.py", line 246, in __init__
    allow_reconnect=True)
  File "/usr/lib/python2.7/site-packages/etcd/client.py", line 204, in __init__
    set(self.machines))
  File "/usr/lib/python2.7/site-packages/etcd/client.py", line 299, in machines
    return self.machines
  File "/usr/lib/python2.7/site-packages/etcd/client.py", line 301, in machines
    raise etcd.EtcdException("Could not get the list of servers, "
etcd.EtcdException: Could not get the list of servers, maybe you provided the wrong host(s) to connect to?
Calico node failed to start

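I am getting closer... but still no solution.

Update 3

I tried pointing ETCD_ENDPOINTS at only the etcd server on this CoreOS machine by running export ETCD_ENDPOINTS=https://10.79.218.2:2379. Now, when I try to run the calico image, I get the following: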

image: using image from file /usr/lib64/rkt/stage1-images/stage1-fly.aci
image: using image from local store for image name quay.io/calico/node:v0.19.0
Traceback (most recent call last):
  File "startup.py", line 295, in <module>
    main()
  File "startup.py", line 251, in main
    warn_if_hostname_conflict(ip)
  File "startup.py", line 192, in warn_if_hostname_conflict
    current_ipv4, _ = client.get_host_bgp_ips(hostname)
  File "/usr/lib/python2.7/site-packages/pycalico/datastore.py", line 132, in wrapped
    "running?" % (fn.__name__, e.message))
pycalico.datastore_errors.DataStoreError: get_host_bgp_ips: Error accessing etcd (Connection to etcd failed due to SSLError(CertificateError("hostname '10.79.218.2' doesn't match u'etcd'",),)).  Is etcd running?
Calico node failed to start

2 Answers

Server Fault user

Accepted answer

Answered 2016-09-18 05:55:29

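I ran into this problem too, and eventually tracked down the root cause by reading the code for the etcd connection logic and the libraries it uses, with some pointers from the Calico team in their Slack channel.

The problem is that current versions of Calico (at least 0.22.0) use a Python client that does not support IP SANs (Subject Alternative Names) in TLS certificates. This means the certificates you are using cannot be matched correctly against the etcd servers they were issued for.

This is described in this GitHub issue.

To get a proper fix, you would have to wait for a new release of the urllib library, then for the etcd client to pick it up and publish a new release, and then for Calico to be updated to use the new etcd client. Alternatively, you can regenerate the certificates with FQDNs instead of IP addresses in the SAN field. That means you need to make sure your servers are reachable by those names, either through DNS or by setting up /etc/hosts correctly. The OpenSSL config used to generate the certificates should contain something like this: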

[alt_names]
DNS.1 = $ENV::FQDN
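
Switching to FQDNs in the certificates only helps if the client also connects by name, so the endpoint list needs the same change. As a rough aid, the sketch below flags entries in an ETCD_ENDPOINTS-style string whose host is an IP literal. The function name ip_literal_endpoints and the hostname etcd1.example.com are hypothetical and used only for illustration.

```python
import ipaddress
from urllib.parse import urlsplit


def ip_literal_endpoints(endpoints):
    """Return the endpoints in a comma-separated list whose host is an IP address."""
    flagged = []
    for endpoint in endpoints.split(","):
        endpoint = endpoint.strip()
        if not endpoint:
            continue
        host = urlsplit(endpoint).hostname
        if host is None:
            continue  # no recognizable host part
        try:
            ipaddress.ip_address(host)
        except ValueError:
            continue  # a DNS name, which is what the FQDN workaround needs
        flagged.append(endpoint)
    return flagged


if __name__ == "__main__":
    # The first value is from the question; the second name is a placeholder.
    value = "https://10.79.218.2:2379,https://etcd1.example.com:2379"
    for endpoint in ip_literal_endpoints(value):
        print("addressed by IP:", endpoint)
```

Any endpoint it reports would need to be rewritten to a name that appears in the certificate and resolves on every node.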

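The guide you linked for generating certificates uses CFSSL, so I would recommend reading its documentation on how to use hostnames instead of IP addresses. I think it may be as simple as modifying the JSON config, like this: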

"hosts": [
    "example.com",
    "www.example.com"
],
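
As a sketch of what such a config might look like end to end, the snippet below builds a CFSSL-style CSR request and rejects IP literals in the hosts list so that only DNS names end up in the SAN field. The function name csr_request, the hostname etcd1.example.com, and the key parameters are assumptions for illustration; keep whatever your existing tls-setup config uses for everything other than hosts.

```python
import ipaddress
import json


def csr_request(common_name, hostnames):
    """Build a CFSSL-style CSR request dict whose hosts are DNS names only."""
    hosts = list(hostnames)
    for host in hosts:
        try:
            ipaddress.ip_address(host)
        except ValueError:
            continue  # not an IP literal, so it is treated as a DNS name
        raise ValueError("%r is an IP address; use a hostname instead" % host)
    return {
        "CN": common_name,
        "hosts": hosts,
        # Placeholder key settings; copy the values from your current config.
        "key": {"algo": "ecdsa", "size": 384},
    }


if __name__ == "__main__":
    # etcd1.example.com is a hypothetical FQDN for the first etcd machine.
    request = csr_request("etcd1.example.com", ["etcd1.example.com"])
    print(json.dumps(request, indent=2))
```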
Votes: 2

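Server Fault user

Answered 2016-11-16 14:58:12

I found that with this quirky library I can succeed if: the client opens the connection to an IP address; the server's certificate asserts that IP address in its Subject; and the server's certificate has no DNS-type entries in its Subject Alternative Name list. Below is selected output from openssl x509 -text ... for a sample server certificate that works when the client opens the connection using the IP address 10.10.10.1 to identify the server: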

...
        Subject: CN=10.10.10.1
...
        X509v3 extensions:
            X509v3 Basic Constraints: 
                CA:FALSE
            X509v3 Key Usage: 
                Digital Signature, Non Repudiation, Key Encipherment
            X509v3 Subject Alternative Name: 
                IP Address:100.127.0.2, IP Address:100.127.0.2, IP Address:10.10.10.1
...
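
These three conditions are what you would expect from a hostname matcher that looks only at DNS-type SAN entries, ignores IP-type entries, and falls back to the Subject CN when no DNS entries exist. The sketch below is a simplified illustration of that behavior, not the library's actual code: it does exact string comparison only (no wildcards), the function name legacy_match is hypothetical, and the certificate dictionaries are hand-written in the shape returned by Python's SSLSocket.getpeercert(). The SAN content of the second certificate is an assumption; only its CN of "etcd" is taken from the error message in the question.

```python
def legacy_match(cert, host):
    """Match host against a cert the way an IP-SAN-unaware matcher might."""
    san = cert.get("subjectAltName", ())
    # Only DNS-type entries are considered; "IP Address" entries are ignored.
    dns_names = [value for kind, value in san if kind == "DNS"]
    if dns_names:
        return host in dns_names
    # With no DNS entries at all, fall back to the Subject commonName.
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName" and value == host:
                return True
    return False


# Shaped like the sample certificate above: CN is the IP, SANs are IP-only.
cn_is_ip = {
    "subject": ((("commonName", "10.10.10.1"),),),
    "subjectAltName": (("IP Address", "100.127.0.2"), ("IP Address", "10.10.10.1")),
}

# Assumed shape of the question's certificate: CN "etcd" plus an IP SAN.
cn_is_name = {
    "subject": ((("commonName", "etcd"),),),
    "subjectAltName": (("IP Address", "10.79.218.2"),),
}

if __name__ == "__main__":
    print(legacy_match(cn_is_ip, "10.10.10.1"))
    print(legacy_match(cn_is_name, "10.79.218.2"))
```

Under this model, adding even one DNS entry to the first certificate would stop the CN fallback from being used, which lines up with the third condition.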

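Also, there are newer versions of the Calico images. I have heard of only two problems with calico/node:v0.23.0. One was reported by someone else: https://calicousers.slack.com/archives/kubernetes/p1478206011002345. I did some testing of my own and ran into only one issue, https://github.com/projectcalico/calico-containers/issues/1107. There are now v1.0.0 betas and an rc1, and I have not heard anything bad about them.

Votes: 0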
Original page content provided by Server Fault.
Original link:

https://serverfault.com/questions/803101
