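I am trying to deploy Pachyderm (a Docker-based big-data platform) on Kubernetes. Because of a Pachyderm limitation, I have to install Kubernetes v1.2.2, which is an old version. I followed the guide at http://kubernetes.io/docs/getting-started-guides/docker/ to deploy Kubernetes on a local server. The guide works with Kubernetes >= 1.3.0, but when I used it to deploy Kubernetes 1.2.2 I ran into problems.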
docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ec38ae951f09 gcr.io/google_containers/hyperkube-amd64:v1.2.2 "/hyperkube apiserver" 8 seconds ago Exited (255) 7 seconds ago k8s_apiserver.78ec1de_k8s-master-127.0.0.1_default_4c6ab43ac4ee970e1f563d76ab3d3ec9_d26fc24e
55c1b13bb610 gcr.io/google_containers/hyperkube-amd64:v1.2.2 "/setup-files.sh IP:1" 8 seconds ago Up 8 seconds k8s_setup.e5aa3216_k8s-master-127.0.0.1_default_4c6ab43ac4ee970e1f563d76ab3d3ec9_1cb4c220
b9f0e5b3a7a9 gcr.io/google_containers/hyperkube-amd64:v1.2.2 "/hyperkube scheduler" 9 seconds ago Up 8 seconds k8s_scheduler.fc12fcbe_k8s-master-127.0.0.1_default_4c6ab43ac4ee970e1f563d76ab3d3ec9_e5065506
9cd613d272bc gcr.io/google_containers/hyperkube-amd64:v1.2.2 "/hyperkube apiserver" 9 seconds ago Exited (255) 8 seconds ago k8s_apiserver.78ec1de_k8s-master-127.0.0.1_default_4c6ab43ac4ee970e1f563d76ab3d3ec9_c04426af
49fe2c409386 gcr.io/google_containers/etcd:2.2.1 "/usr/local/bin/etcd " 10 seconds ago Up 9 seconds k8s_etcd.7e452b0b_k8s-etcd-127.0.0.1_default_1df6a8b4d6e129d5ed8840e370203c11_a6f11fdb
5b208be18c71 gcr.io/google_containers/hyperkube-amd64:v1.2.2 "/hyperkube controlle" 10 seconds ago Up 9 seconds k8s_controller-manager.70414b65_k8s-master-127.0.0.1_default_4c6ab43ac4ee970e1f563d76ab3d3ec9_c377c5e9
df194f3cf663 gcr.io/google_containers/hyperkube-amd64:v1.2.2 "/hyperkube proxy --m" 10 seconds ago Up 9 seconds k8s_kube-proxy.9a9f4853_k8s-proxy-127.0.0.1_default_5e5303a9d49035e9fad52bfc4c88edc8_63ec0b04
58b53ec28fbe gcr.io/google_containers/pause:2.0 "/pause" 10 seconds ago Up 9 seconds k8s_POD.6059dfa2_k8s-etcd-127.0.0.1_default_1df6a8b4d6e129d5ed8840e370203c11_21034b2e
df48fe4cdf0a gcr.io/google_containers/pause:2.0 "/pause" 10 seconds ago Up 9 seconds k8s_POD.6059dfa2_k8s-master-127.0.0.1_default_4c6ab43ac4ee970e1f563d76ab3d3ec9_4867dbbc
fe6b74c2a881 gcr.io/google_containers/pause:2.0 "/pause" 10 seconds ago Up 9 seconds k8s_POD.6059dfa2_k8s-proxy-127.0.0.1_default_5e5303a9d49035e9fad52bfc4c88edc8_fad2c558
4c00ad498916 gcr.io/google_containers/hyperkube-amd64:v1.2.2 "/hyperkube kubelet -" 25 seconds ago Up 24 seconds kubelet

The container table above shows that the apiserver keeps exiting when I deploy Kubernetes 1.2.2. The apiserver is restarted at intervals that follow an exponential backoff, but it never comes up.
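To illustrate the restart pattern described above, here is a minimal sketch of a capped exponential backoff. The base delay of 10 seconds and the cap of 300 seconds are assumptions chosen for illustration; they are not values read from the kubelet in v1.2.2.

```shell
# Print the first COUNT restart delays of a capped exponential backoff.
# base, cap and count are illustrative parameters (assumptions), not
# settings taken from Kubernetes.
backoff_delays() {
  delay=$1
  cap=$2
  count=$3
  out=""
  i=0
  while [ "$i" -lt "$count" ]; do
    # Append the current delay, separated by a single space.
    out="${out:+$out }$delay"
    # Double the delay, but never exceed the cap.
    delay=$((delay * 2))
    if [ "$delay" -gt "$cap" ]; then
      delay=$cap
    fi
    i=$((i + 1))
  done
  echo "$out"
}

backoff_delays 10 300 7   # prints: 10 20 40 80 160 300 300
```

This matches what the table shows: the gaps between restart attempts grow, so a container that fails immediately spends most of its time in the exited state.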
Then,
sv: batch/v1
mv: extensions/__internal
I0727 06:06:27.593708 1 genericapiserver.go:82] Adding storage destination for group batch
W0727 06:06:27.593745 1 server.go:383] No RSA key provided, service account token authentication disabled
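F0727 06:06:27.593767 1 server.go:410] Invalid Authentication Config: open /srv/kubernetes/basic_auth.csv: no such file or directory

Above are the docker logs of the Kubernetes apiserver. Note the authentication errors: Kubernetes does not seem to have the keys it requires. See also the controller manager logs below. The controller manager waits for the apiserver, but the apiserver never comes up, so the controller manager crashes as well.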
E0727 06:07:10.604801 1 controllermanager.go:259] Failed to get api versions from server: Get http://127.0.0.1:8080/api: dial tcp 127.0.0.1:8080: connection refused
E0727 06:07:11.604832 1 controllermanager.go:259] Failed to get api versions from server: Get http://127.0.0.1:8080/api: dial tcp 127.0.0.1:8080: connection refused
E0727 06:07:12.604752 1 controllermanager.go:259] Failed to get api versions from server: Get http://127.0.0.1:8080/api: dial tcp 127.0.0.1:8080: connection refused
E0727 06:07:13.604803 1 controllermanager.go:259] Failed to get api versions from server: Get http://127.0.0.1:8080/api: dial tcp 127.0.0.1:8080: connection refused
E0727 06:07:14.604332 1 nodecontroller.go:229] Error monitoring node status: Get http://127.0.0.1:8080/api/v1/nodes: dial tcp 127.0.0.1:8080: connection refused
E0727 06:07:14.604619 1 controllermanager.go:259] Failed to get api versions from server: Get http://127.0.0.1:8080/api: dial tcp 127.0.0.1:8080: connection refused
E0727 06:07:14.604861 1 controllermanager.go:259] Failed to get api versions from server: Get http://127.0.0.1:8080/api: dial tcp 127.0.0.1:8080: connection refused
F0727 06:07:14.604957 1 controllermanager.go:263] Failed to get api versions from server: timed out waiting for the condition

So my question is: how can I solve this? This problem has troubled me for a long time.
====================================================================
Update:
With help from Goblin and Lukie, I found that the key issue is that the setup pod is not being triggered. Looking at the Kubernetes manifest,
{
"name": "controller-manager",
"command": [
"/hyperkube",
"controller-manager",
"--master=127.0.0.1:8080",
"--service-account-private-key-file=/srv/kubernetes/server.key",
"--root-ca-file=/srv/kubernetes/ca.crt",
"--min-resync-period=3m",
"--v=2"
],
"volumeMounts": [
{
"name": "data",
"mountPath": "/srv/kubernetes"
}
]
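}

The option --service-account-private-key-file=/srv/kubernetes/server.key is set in the manifest, but it does not work. In other words, the controller manager cannot find that file in the filesystem. The following command supports this hypothesis.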
docker exec a82d7f6e4d7d ls -l /srv/kubernetes
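ls: cannot access /srv/kubernetes: No such file or directory

Next, we checked whether the setup pod puts the files into the docker volume. Unfortunately, we found that the setup pod is not triggered and does not do its work, so no certificate files are written to the filesystem.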
docker ps -a | grep setup
54afdd81349e gcr.io/google_containers/hyperkube-amd64:v1.2.2 "/setup-files.sh IP:1" About a minute ago Up About a minute k8s_setup.e5aa3216_k8s-master-127.0.0.1_default_4c6ab43ac4ee970e1f563d76ab3d3ec9_a2edddca
6f714e034098 gcr.io/google_containers/hyperkube-amd64:v1.2.2 "/setup-files.sh IP:1" 4 minutes ago Exited (7) 2 minutes ago k8s_setup.e5aa3216_k8s-master-127.0.0.1_default_4c6ab43ac4ee970e1f563d76ab3d3ec9_0d7dab5b
8358f6644d94 gcr.io/google_containers/hyperkube-amd64:v1.2.2 "/setup-files.sh IP:1" 6 minutes ago Exited (7) 4 minutes ago k8s_setup.e5aa3216_k8s-master-127.0.0.1_default_4c6ab43ac4ee970e1f563d76ab3d3ec9_41e4c686

Is there any way to debug this further? Or is it a bug in Kubernetes 1.2?
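One possible next step is to read the logs of the setup containers that have already exited, since the listing above shows them ending with status "Exited (7)". The sketch below is only a helper for picking out those container IDs from `docker ps -a` text; the function name `exited_setup_ids` is hypothetical, and the filter relies on the `k8s_setup` name prefix seen in the listing.

```shell
# Read `docker ps -a` output on stdin and print the IDs of k8s_setup
# containers whose status is "Exited (...)".
# exited_setup_ids is a hypothetical helper name, not part of docker.
exited_setup_ids() {
  awk '/k8s_setup/ && /Exited \(/ { print $1 }'
}

# Usage (requires a running docker daemon, so it is left as a comment):
#   docker ps -a | exited_setup_ids | while read -r id; do
#     echo "== $id =="
#     docker logs "$id"
#   done
```

The logs of an exited setup container should show which step of /setup-files.sh failed before the certificates were written.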
Posted on 2016-07-27 16:02:51
F0727 06:06:27.593767 1 server.go:410] Invalid Authentication Config: open /srv/kubernetes/basic_auth.csv: no such file or directory

You are missing the basic auth file /srv/kubernetes/basic_auth.csv. Either create the basic auth file or remove the config flag.
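As a rough sketch of the first option, the file can be created by hand. Kubernetes expects a CSV with at least three columns: password, user name, user id. The directory `./srv-kubernetes` and the credential values below are placeholders (assumptions); the file has to end up in whatever host path or volume is mounted at /srv/kubernetes inside the apiserver container.

```shell
# Hypothetical host directory; replace it with the path or volume that is
# mounted at /srv/kubernetes in the apiserver container.
AUTH_DIR="${AUTH_DIR:-./srv-kubernetes}"
mkdir -p "$AUTH_DIR"

# Columns: password,user name,user id. These values are placeholders.
printf '%s,%s,%s\n' 'changeme' 'admin' 'admin' > "$AUTH_DIR/basic_auth.csv"

# Keep the credentials readable by the owner only.
chmod 600 "$AUTH_DIR/basic_auth.csv"
```

If basic auth is not needed at all, removing the flag that points at the file avoids having to manage this file.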
Kubernetes authentication
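Posted on 2016-07-27 16:26:20
Actually, in my view, W0727 06:06:27.593745 1 server.go:383] No RSA key provided, service account token authentication disabled is the more important message.
It looks like the file given by --service-account-private-key-file is missing for the controller manager, so service account tokens cannot be generated properly.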
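If the setup container never writes the key, one workaround to try is generating an RSA key manually with openssl. This is a sketch under assumptions: `./srv-kubernetes` is a placeholder directory that must correspond to what is mounted at /srv/kubernetes, and the file name `server.key` is taken from the manifest in the question.

```shell
# Hypothetical host directory; it must be the one mounted at
# /srv/kubernetes in the master pod's containers.
KEY_DIR="${KEY_DIR:-./srv-kubernetes}"
mkdir -p "$KEY_DIR"

# Generate a 2048-bit RSA private key at the path the manifest refers to.
openssl genrsa -out "$KEY_DIR/server.key" 2048

# Private keys should not be world-readable.
chmod 600 "$KEY_DIR/server.key"
```

Note that the manifest also references /srv/kubernetes/ca.crt via --root-ca-file, which this sketch does not create.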
https://stackoverflow.com/questions/38606503