kubeadm v1.18.2 with crio version 1.18.2 failing to launch the master node from a private repo on CentOS 7 / RHEL 7

Stack Overflow user · Asked on 2020-07-01 10:57:12
1 answer · 2.2K views · 0 followers · Score: 1

Description

I am relatively new to Kubernetes. I can run my cluster when I use the default socket (/var/run/dockershim.sock), but when I tried the crio socket in order to pull the images from my private repo, I noticed that the speed was not even comparable.

I am trying to configure all my nodes to use the crio.socket, but I am failing to launch the master node with this socket.

I followed the Kubernetes documentation (Configuring each kubelet in your cluster using kubeadm) as well as the CRI-O git docs.

Unfortunately I cannot make it work, as it appears to ignore the private repo flag.

Steps to reproduce the issue:

  1. Launch the master node (control plane) with the following init command (using the private repo):
kubeadm init \
        --upload-certs \
        --cri-socket=/var/run/crio/crio.sock \
        --node-name=my_node_name \
        --image-repository=my.private.repo \
        --pod-network-cidr=10.96.0.0/16 \
        --kubernetes-version=v1.18.2 \
        --control-plane-endpoint=ip:6443 \
        --apiserver-cert-extra-sans=ip \
        --apiserver-advertise-address=ip
  2. Run as root or with sudo: journalctl -xeu crio -f
  3. Observe the sample of the logs below in debug or info mode
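
Before running the init, a quick sanity check (my addition, not part of the original report) is to ask kubeadm which images it will pull given the private repo flag, and to pre-pull them through the same CRI socket:

# Both subcommands honor --image-repository, so the listed image names
# should start with my.private.repo/ rather than k8s.gcr.io/
kubeadm config images list --kubernetes-version=v1.18.2 --image-repository=my.private.repo
kubeadm config images pull --kubernetes-version=v1.18.2 --image-repository=my.private.repo --cri-socket=/var/run/crio/crio.sock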

Describe the results you received:

Sample of the logs from crio in debug mode:

Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.043499089+02:00" level=debug msg="Trying to access \"k8s.gcr.io/pause:3.2\"" file="docker/docker_image_src.go:68"
Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.043547722+02:00" level=debug msg="Credentials not found" file="config/config.go:123"
Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.043576124+02:00" level=debug msg="Using registries.d directory /etc/containers/registries.d for sigstore configuration" file="docker/lookaside.go:51"
Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.043706369+02:00" level=debug msg=" Using \"default-docker\" configuration" file="docker/lookaside.go:169"
Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.043736378+02:00" level=debug msg=" No signature storage configuration found for k8s.gcr.io/pause:3.2" file="docker/lookaside.go:174"
Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.043769424+02:00" level=debug msg="Looking for TLS certificates and private keys in /etc/docker/certs.d/k8s.gcr.io" file="tlsclientconfig/tlsclientconfig.go:21"
Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.043858410+02:00" level=debug msg="GET https://k8s.gcr.io/v2/" file="docker/docker_client.go:516"
Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.046154250+02:00" level=debug msg="Ping https://k8s.gcr.io/v2/ err Get \"https://k8s.gcr.io/v2/\": dial tcp 10.254.3.15:443: connect: connection refused (&url.Error{Op:\"Get\", URL:\"https://k8s.gcr.io/v2/\", Err:(*net.OpError)(0xc00084d5e0)})" file="docker/docker_client.go:708"
Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.046239456+02:00" level=debug msg="GET https://k8s.gcr.io/v1/_ping" file="docker/docker_client.go:516"
Jun 30 20:03:45 hostname crio[6693]: time="2020-06-30 20:03:45.048653448+02:00" level=debug msg="Ping https://k8s.gcr.io/v1/_ping err Get \"https://k8s.gcr.io/v1/_ping\": dial tcp 10.254.3.15:443: connect: connection refused (&url.Error{Op:\"Get\", URL:\"https://k8s.gcr.io/v1/_ping\", Err:(*net.OpError)(0xc0006b0690)})" file="docker/docker_client.go:735"
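
What stands out in these logs is that CRI-O itself is trying to pull k8s.gcr.io/pause:3.2: with a remote runtime, the sandbox (pause) image comes from CRI-O's own configuration rather than from kubeadm's --image-repository flag. A likely remedy (a sketch, assuming the stock /etc/crio/crio.conf layout) is to point pause_image at the private repo and restart crio:

# /etc/crio/crio.conf (excerpt)
[crio.image]
# kubeadm's --image-repository never reaches CRI-O, so the sandbox
# image has to be redirected to the mirror explicitly.
pause_image = "my.private.repo/pause:3.2"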

Describe the results you expected:

The node launches using the crio socket.

Additional information you deem important (e.g. the issue happens only occasionally):

If I launch the node with the default socket, e.g.:

# kubeadm init \
        --upload-certs \
        --cri-socket=/var/run/dockershim.sock \
        --node-name=my_node_name \
        --image-repository=my.private.repo \
        --pod-network-cidr=10.96.0.0/16 \
        --kubernetes-version=v1.18.2 \
        --control-plane-endpoint=ip:6443 \
        --apiserver-cert-extra-sans=ip \
        --apiserver-advertise-address=ip
W0630 20:24:33.223266   29033 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.18.2
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Using existing ca certificate authority
[certs] Using existing apiserver certificate and key on disk
[certs] Using existing apiserver-kubelet-client certificate and key on disk
[certs] Using existing front-proxy-ca certificate authority
[certs] Using existing front-proxy-client certificate and key on disk
[certs] Using existing etcd/ca certificate authority
[certs] Using existing etcd/server certificate and key on disk
[certs] Using existing etcd/peer certificate and key on disk
[certs] Using existing etcd/healthcheck-client certificate and key on disk
[certs] Using existing apiserver-etcd-client certificate and key on disk
[certs] Using the existing "sa" key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/admin.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/scheduler.conf"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
W0630 20:24:35.839949   29033 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[control-plane] Creating static Pod manifest for "kube-scheduler"
W0630 20:24:35.841420   29033 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 11.003647 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.18" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
key
[mark-control-plane] Marking the node hostname as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node hostname as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: token
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of the control-plane node running the following command on each as root:

  kubeadm join ip:6443 --token token \
    --discovery-token-ca-cert-hash sha256:hash \
    --control-plane --certificate-key key

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join ip:6443 --token token \
    --discovery-token-ca-cert-hash sha256:hash

If I launch the node with the crio socket:

# kubeadm init \
        --upload-certs \
        --cri-socket=/var/run/crio/crio.sock \
        --node-name=my_node_name \
        --image-repository=my.private.repo \
        --pod-network-cidr=10.96.0.0/16 \
        --kubernetes-version=v1.18.2 \
        --control-plane-endpoint=ip:6443 \
        --apiserver-cert-extra-sans=ip \
        --apiserver-advertise-address=ip
W0630 20:32:33.827957    2916 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.18.2
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [hostname kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.96.134.57 10.96.134.57 10.96.134.57]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [hostname localhost] and IPs [10.96.134.57 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [hostname localhost] and IPs [10.96.134.57 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
W0630 20:32:37.829806    2916 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[control-plane] Creating static Pod manifest for "kube-scheduler"
W0630 20:32:37.830826    2916 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
Unfortunately, an error has occurred:
                timed out waiting for the condition

        This error is likely caused by:
                - The kubelet is not running
                - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

        If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
                - 'systemctl status kubelet'
                - 'journalctl -xeu kubelet'

        Additionally, a control plane component may have crashed or exited when started by the container runtime.
        To troubleshoot, list all containers using your preferred container runtimes CLI.

        Here is one example how you may list all Kubernetes containers running in cri-o/containerd using crictl:
                - 'crictl --runtime-endpoint /var/run/crio/crio.sock ps -a | grep kube | grep -v pause'
                Once you have found the failing container, you can inspect its logs with:
                - 'crictl --runtime-endpoint /var/run/crio/crio.sock logs CONTAINERID'

error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher

I can see that localhost is listening on port 10248:

# curl -sSL http://localhost:10248/healthz
ok

Sample check of the crio socket (as described in the docs):

# curl -v --unix-socket /var/run/crio/crio.sock http://localhost/info | jq
* About to connect() to localhost port 80 (#0)
*   Trying /var/run/crio/crio.sock...
* Failed to set TCP_KEEPIDLE on fd 3
* Failed to set TCP_KEEPINTVL on fd 3
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Connected to localhost (/var/run/crio/crio.sock) port 80 (#0)
> GET /info HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< Date: Tue, 30 Jun 2020 18:36:35 GMT
< Content-Length: 240
<
{ [data not shown]
100   240  100   240    0     0   144k      0 --:--:-- --:--:-- --:--:--  234k
* Connection #0 to host localhost left intact
{
  "storage_driver": "overlay2",
  "storage_root": "/var/lib/containers/storage",
  "cgroup_driver": "systemd",
  "default_id_mappings": {
    "uids": [
      {
        "container_id": 0,
        "host_id": 0,
        "size": 4294967295
      }
    ],
    "gids": [
      {
        "container_id": 0,
        "host_id": 0,
        "size": 4294967295
      }
    ]
  }
}
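
Note the "cgroup_driver": "systemd" field in this output. It is worth cross-checking (my addition, not from the original post) that the kubelet uses the same driver, since a kubelet/runtime cgroup driver mismatch also keeps pods from starting:

# The kubeadm-generated kubelet config should report the same driver
# as CRI-O (systemd here); 'cgroupfs' would indicate a mismatch.
grep cgroupDriver /var/lib/kubelet/config.yaml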

Output of systemctl status kubelet:

# systemctl status kubelet -l
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Tue 2020-06-30 20:39:49 CEST; 6s ago
     Docs: https://kubernetes.io/docs/
 Main PID: 8502 (kubelet)
    Tasks: 15
   Memory: 20.1M
   CGroup: /system.slice/kubelet.service
           └─8502 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --hostname-override=hostname

Jun 30 20:39:55 hostname kubelet[8502]: I0630 20:39:55.369441    8502 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Jun 30 20:39:55 hostname kubelet[8502]: I0630 20:39:55.399015    8502 kubelet_node_status.go:70] Attempting to register node hostname
Jun 30 20:39:55 hostname kubelet[8502]: E0630 20:39:55.403707    8502 kubelet.go:2267] node "hostname" not found
Jun 30 20:39:55 hostname kubelet[8502]: E0630 20:39:55.503871    8502 kubelet.go:2267] node "hostname" not found
Jun 30 20:39:55 hostname kubelet[8502]: E0630 20:39:55.604115    8502 kubelet.go:2267] node "hostname" not found
Jun 30 20:39:55 hostname kubelet[8502]: E0630 20:39:55.704324    8502 kubelet.go:2267] node "hostname" not found
Jun 30 20:39:55 hostname kubelet[8502]: E0630 20:39:55.769448    8502 kubelet_node_status.go:92] Unable to register node "hostname" with API server: Post https://ip:6443/api/v1/nodes: dial tcp ip:6443: connect: connection refused
Jun 30 20:39:55 hostname kubelet[8502]: E0630 20:39:55.805779    8502 kubelet.go:2267] node "hostname" not found
Jun 30 20:39:55 hostname kubelet[8502]: E0630 20:39:55.906014    8502 kubelet.go:2267] node "hostname" not found
Jun 30 20:39:56 hostname kubelet[8502]: E0630 20:39:56.007272    8502 kubelet.go:2267] node "hostname" not found

From what little I can tell, the network errors are not relevant, since I have not deployed the network containers yet, so errors are expected at this point.

Output of crio --version:

# crio --version
crio version 1.18.2
Version:       1.18.2
GitCommit:     7f261aeebffed079b4475dde8b9d602b01973d33
GitTreeState:  clean
BuildDate:     2020-06-18T21:05:27Z
GoVersion:     go1.14
Compiler:      gc
Platform:      linux/amd64
Linkmode:      static

Output of kubelet --version:

# kubelet --version
Kubernetes v1.18.2

Output of the Linux OS version:

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.8 (Maipo)

Additional environment details (AWS, VirtualBox, physical, etc.):

Installed on bare-metal nodes.

Sample of the kubelet file:

# cat /etc/default/kubelet
KUBELET_EXTRA_ARGS=--feature-gates="AllAlpha=false,RunAsGroup=true" --container-runtime=remote --cgroup-driver=systemd --container-runtime-endpoint='unix:///var/run/crio/crio.sock' --runtime-request-timeout=5m
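
With that file in place, one quick way to confirm the runtime endpoint actually responds and can reach the private repo (a sketch; assumes crictl is installed) is to query it directly:

# Query the CRI-O endpoint the kubelet is configured against, then try
# pulling an image from the private repo through the same endpoint.
crictl --runtime-endpoint unix:///var/run/crio/crio.sock info
crictl --runtime-endpoint unix:///var/run/crio/crio.sock pull my.private.repo/pause:3.2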

Update: I have opened a ticket on GitHub: Kubernetes v1.18.2 with crio version 1.18.2 failing to sync with kubelet on RH7 #3915. It looks like there is a bug, since CRI-O cannot handle the remote repository and keeps trying to pull from the default repo k8s.gcr.io. I will update the ticket as soon as I have more information.


1 Answer

Stack Overflow user · Accepted answer
Posted on 2020-07-20 15:05:13

So the problem was not a bug in CRI-O, as we (and the CRI-O dev team) initially thought; rather, it seems that quite a bit of configuration needs to be applied if a user wants to use CRI-O as the CRI for Kubernetes and also pull from a private repo.

I will not reproduce the full CRI-O configuration here, since it is already documented in the ticket I raised with the team: Kubernetes v1.18.2 with crio version 1.18.2 failing to sync with kubelet on RH7 #3915.

The first piece of configuration to apply is to configure the container registries so that the images can be pulled via the private repo as a mirror:

$ cat /etc/containers/registries.conf
[[registry]]
prefix = "k8s.gcr.io"
insecure = false
blocked = false
location = "k8s.gcr.io"

[[registry.mirror]]
location = "my.private.repo"

CRI-O recommends passing this configuration to the kubelet as flags (haircommander/cri-o-kubeadm), but for me it did not work with this configuration alone.

I went back to the Kubernetes manual, which recommends not passing flags to the kubelet at runtime but putting them in the file /var/lib/kubelet/config.yaml instead. That was not possible for me, as the node needs to be launched with the CRI socket and not any other socket (ref: Configure the kubelet cgroup driver used by the control-plane node).

So I managed to get it up and running by passing these settings in the sample configuration file below:

$ cat /tmp/config.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 1.2.3.4
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/crio/crio.sock
  name: node.name
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
controlPlaneEndpoint: 1.2.3.4:6443
imageRepository: my.private.repo
kind: ClusterConfiguration
kubernetesVersion: v1.18.2
networking:
  dnsDomain: cluster.local
  podSubnet: 10.85.0.0/16
  serviceSubnet: 10.96.0.0/12
scheduler: {}
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd

Then one simply launches the master/worker node with the flag --config <file.yml>, and the node comes up successfully.
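
For example (a sketch; --upload-certs is one of the few flags kubeadm accepts alongside --config, most others cannot be mixed with it):

# Initialize the control plane from the config file above.
kubeadm init --config /tmp/config.yaml --upload-certs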

I hope all the information here helps someone else.

Score: 1
Original question: https://stackoverflow.com/questions/62675268
