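I have a 32-node (virtual machine) k8s cluster, and I am seeing performance issues when creating a simple redis-cache Deployment with 3 or more replicas. The first pod is always created very quickly and reaches the RUNNING state within a few seconds. However, the remaining replicas take more than 1 minute to create. I looked at the pod event logs, and the delay appears after the Scheduled and SuccessfulMountVolume stages complete but before "Pulling". So there is a large time gap between "SuccessfulMountVolume" and "Pulling". Does anyone know what happens before the image is pulled from the repo? What is Kubernetes doing during this time, and how do I debug this kind of issue?
I am using version 1.9.2, and here is my yaml file: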
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: redis-cache
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: store
    spec:
      containers:
      - name: redis-server
        image: redis:3.2-alpine
Thanks,
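Posted on 2018-04-20 11:08:03
Let's take a look at the kubelet code that manages the Pod lifecycle.
Here is the function that calls the code to mount/attach all volumes; one of those calls generates the SuccessfulMountVolume event: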
if err := kl.volumeManager.WaitForAttachAndMount(pod); err != nil {
    kl.recorder.Eventf(pod, v1.EventTypeWarning, events.FailedMountVolume, "Unable to mount volumes for pod %q: %v", format.Pod(pod), err)
    glog.Errorf("Unable to mount volumes for pod %q: %v; skipping pod", format.Pod(pod), err)
    return err
}
After that, the kubelet fetches the pull secrets and actually starts the Pod:
// Fetch the pull secrets for the pod
pullSecrets := kl.getPullSecretsForPod(pod)
// Call the container runtime's SyncPod callback
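result := kl.containerRuntime.SyncPod(pod, apiPodStatus, podStatus, pullSecrets, kl.backOff)
Now let's look at what the kubelet does after you get the SuccessfulMountVolume event and before Pulling.
The kubelet creates and updates Pods in the same cycle, so if a Pod already exists, it is killed and a new version is created. There are many steps here, including some internal ones that run before the volume mount and the image pull.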
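You can get more events about internal operations from the kubelet daemon by setting the log level to 4 (or higher) with the kubelet's -v option.
Hope this helps you understand which step takes more time than you expected.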
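To see where the time goes at the event level before digging into kubelet logs, it can help to compute the delay between consecutive events of one pod. The sketch below is only an illustration: it assumes the events are available as JSON in the shape returned by `kubectl get events -o json` (a list under `items`, each with `reason`, `firstTimestamp` and `involvedObject.name`). The function name `event_gaps`, the pod name and the sample timestamps are made up for the example and are not taken from the question.

```python
import json
from datetime import datetime

# Hypothetical sample in the shape of `kubectl get events -o json`.
# The pod name and timestamps are invented for illustration only.
SAMPLE_EVENTS = json.dumps({
    "items": [
        {"reason": "Scheduled", "firstTimestamp": "2018-04-24T18:32:19Z",
         "involvedObject": {"name": "redis-cache-example"}},
        {"reason": "SuccessfulMountVolume", "firstTimestamp": "2018-04-24T18:32:19Z",
         "involvedObject": {"name": "redis-cache-example"}},
        {"reason": "Pulling", "firstTimestamp": "2018-04-24T18:34:03Z",
         "involvedObject": {"name": "redis-cache-example"}},
        {"reason": "Scheduled", "firstTimestamp": "2018-04-24T18:32:19Z",
         "involvedObject": {"name": "some-other-pod"}},
    ]
})


def event_gaps(events_json, pod_name):
    """Return (previous_reason, next_reason, seconds) for consecutive events of one pod."""
    items = json.loads(events_json)["items"]
    # Keep only events for the requested pod that carry a timestamp.
    pod_events = [
        (datetime.strptime(e["firstTimestamp"], "%Y-%m-%dT%H:%M:%SZ"), e["reason"])
        for e in items
        if e["involvedObject"]["name"] == pod_name and e.get("firstTimestamp")
    ]
    # Event timestamps have one-second resolution; the sort is stable,
    # so events within the same second keep their input order.
    pod_events.sort(key=lambda pair: pair[0])
    return [
        (prev_reason, next_reason, (next_time - prev_time).total_seconds())
        for (prev_time, prev_reason), (next_time, next_reason)
        in zip(pod_events, pod_events[1:])
    ]


if __name__ == "__main__":
    for prev_reason, next_reason, seconds in event_gaps(SAMPLE_EVENTS, "redis-cache-example"):
        print(f"{prev_reason} -> {next_reason}: {seconds:.0f}s")
```

On a live cluster, the JSON could come from `kubectl get events -o json` in the pod's namespace. A large value between SuccessfulMountVolume and Pulling points at the sandbox and network setup that the kubelet performs in between.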
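Posted on 2018-05-23 22:32:14
Following Anton's answer, I found that the network plugin (CNI) was causing the issue for me. I figured this out by looking at the kubelet logs while creating pods with replicas. The logs clearly show the pod creation flow: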
I0424 18:32:19.059144 20319 config.go:405] Receiving a new pod "redis-cache-7f9b8ddc49-jzh5l_default(d1f95f27-47ed-11e8-9fd0-005056a1ac67)"
..........
I0424 18:32:19.363560 20319 volume_manager.go:371] All volumes are attached and mounted for pod "redis-cache-7f9b8ddc49-jzh5l_default(d1f95f27-47ed-11e8-9fd0-005056a1ac67)"
I0424 18:32:19.363581 20319 kuberuntime_manager.go:385] No sandbox for pod "redis-cache-7f9b8ddc49-jzh5l_default(d1f95f27-47ed-11e8-9fd0-005056a1ac67)" can be found. Need to start a new one
I0424 18:32:19.363597 20319 kuberuntime_manager.go:571] computePodActions got {KillPod:true CreateSandbox:true SandboxID: Attempt:0 NextInitContainerToStart:nil ContainersToStart:[0] ContainersToKill:map[]} for pod "redis-cache-7f9b8ddc49-jzh5l_default(d1f95f27-47ed-11e8-9fd0-005056a1ac67)"
I0424 18:32:19.363628 20319 kuberuntime_manager.go:580] SyncPod received new pod "redis-cache-7f9b8ddc49-jzh5l_default(d1f95f27-47ed-11e8-9fd0-005056a1ac67)", will create a sandbox for it
I0424 18:32:19.363641 20319 kuberuntime_manager.go:589] Stopping PodSandbox for "redis-cache-7f9b8ddc49-jzh5l_default(d1f95f27-47ed-11e8-9fd0-005056a1ac67)", will start new one
I0424 18:32:19.363655 20319 kuberuntime_manager.go:641] Creating sandbox for pod "redis-cache-7f9b8ddc49-jzh5l_default(d1f95f27-47ed-11e8-9fd0-005056a1ac67)"
I0424 18:32:19.365101 20319 docker_service.go:441] Setting cgroup parent to: "/kubepods/besteffort/podd1f95f27-47ed-11e8-9fd0-005056a1ac67"
I0424 18:32:19.533516 20319 factory.go:112] Using factory "docker" for container "/kubepods/besteffort/podd1f95f27-47ed-11e8-9fd0-005056a1ac67/69cae77dd85e0c66b9c4374fffc04570556404a89146351252b5c4279084d925"
I0424 18:32:19.533709 20319 docker_sandbox.go:658] Will attempt to re-write config file /var/vcap/store/docker/docker/containers/69cae77dd85e0c66b9c4374fffc04570556404a89146351252b5c4279084d925/resolv.conf with:
[nameserver 10.100.200.10 search default.svc.cluster.local svc.cluster.local cluster.local options ndots:5]
I0424 18:32:19.533800 20319 plugins.go:412] Calling network plugin cni to set up pod "redis-cache-7f9b8ddc49-jzh5l_default"
After this, I saw some messages from the CNI plugin, as follows:
1 2018-04-24T18:32:19.562Z __main__ Initialized CNI configuration
__main__ ***_cni plugin invoked with arguments: ADD
Things got stuck at this point, and 100+ seconds later I saw the network configuration required for sandbox creation being returned:
2018-04-24T18:34:02.884Z 656626b0-7baf-4d13-9d99-f8fb6635d44f [cni@6876 comp="***" subcomp="***_cni" level="INFO"] __main__ Pod networking configured on container 69cae77dd85e0c66b9c4374fffc04570556404a89146351252b5c4279084d925 (MAC address: 02:50:56:00:00:39, IP address: 30.0.62.4/24)
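I0424 18:34:02.888755 20319 kuberuntime_manager.go:655] Created PodSandbox "69cae77dd85e0c66b9c4374fffc04570556404a89146351252b5c4279084d925" for pod "redis-cache-7f9b8ddc49-jzh5l_default(d1f95f27-47ed-11e8-9fd0-005056a1ac67)"
I looked further into the CNI logs and worked out what was happening at the lower levels of my network stack. The delay was actually caused by my router and switch configuration. I'm posting this in case anyone runs into a similar issue.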
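Spotting the stall by eye works for a short log, but with -v=4 the kubelet output gets long. A rough sketch of automating the same check follows: it parses the glog-style header of each kubelet line and reports the largest gap between consecutive entries. The sample lines are copied from the log above. The function names and the `year` parameter are assumptions of this example (glog headers carry no year), and lines that do not match the header pattern, such as the CNI plugin output, are skipped.

```python
import re
from datetime import datetime

# glog header: severity letter, mmdd, hh:mm:ss.uuuuuu, thread id, file:line]
GLOG_LINE = re.compile(r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ (\S+)\] (.*)$")

# Lines copied from the kubelet log in this answer, plus one CNI line.
SAMPLE_LOG = [
    'I0424 18:32:19.365101 20319 docker_service.go:441] Setting cgroup parent to: "/kubepods/besteffort/podd1f95f27-47ed-11e8-9fd0-005056a1ac67"',
    'I0424 18:32:19.533516 20319 factory.go:112] Using factory "docker" for container "/kubepods/besteffort/podd1f95f27-47ed-11e8-9fd0-005056a1ac67/69cae77dd85e0c66b9c4374fffc04570556404a89146351252b5c4279084d925"',
    'I0424 18:32:19.533800 20319 plugins.go:412] Calling network plugin cni to set up pod "redis-cache-7f9b8ddc49-jzh5l_default"',
    '__main__ ***_cni plugin invoked with arguments: ADD',
    'I0424 18:34:02.888755 20319 kuberuntime_manager.go:655] Created PodSandbox "69cae77dd85e0c66b9c4374fffc04570556404a89146351252b5c4279084d925" for pod "redis-cache-7f9b8ddc49-jzh5l_default(d1f95f27-47ed-11e8-9fd0-005056a1ac67)"',
]


def parse_glog(line, year=2018):
    """Return (timestamp, source, message), or None if the line has no glog header."""
    match = GLOG_LINE.match(line)
    if match is None:
        return None
    mmdd, clock, source, message = match.groups()
    # The year is not part of the header, so the caller has to supply it.
    timestamp = datetime.strptime(f"{year}{mmdd} {clock}", "%Y%m%d %H:%M:%S.%f")
    return timestamp, source, message


def largest_gap(lines, year=2018):
    """Return (seconds, entry_before, entry_after) for the biggest pause, or None."""
    entries = [e for e in (parse_glog(line, year) for line in lines) if e is not None]
    best = None
    for before, after in zip(entries, entries[1:]):
        seconds = (after[0] - before[0]).total_seconds()
        if best is None or seconds > best[0]:
            best = (seconds, before, after)
    return best


if __name__ == "__main__":
    seconds, before, after = largest_gap(SAMPLE_LOG)
    print(f"{seconds:.3f}s between {before[1]} and {after[1]}")
```

For the sample lines, the largest gap is about 103 seconds and sits between the "Calling network plugin cni" entry and the "Created PodSandbox" entry, which matches the stall described above.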
Thanks,
https://stackoverflow.com/questions/49927373