I have a process for installing a Kubernetes cluster via kubeadm that has worked many times.
For some reason, the cluster I have installed now has nodes that are having trouble communicating with each other.
The problem shows up in two ways: sometimes the cluster cannot resolve external DNS records (e.g. mirrorlist.centos.org), and sometimes a pod on one node has no connectivity to a pod on a different node.
My Kubernetes version is 1.9.2, my hosts run CentOS 7.4, I use Flannel v0.9.1 as the CNI plugin, and the cluster runs on AWS.
My debugging so far:

1. Ran `kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}'` to check the pod subnets: 10.244.0.0/24 10.244.1.0/24.
2. Installed busybox and ran nslookup on kubernetes.default inside the cluster; it only works when busybox lands on the same node as the DNS pod (I followed [https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/](https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/)).
3. Even created an AMI from another working environment and deployed it as a node into this cluster; it still failed.
4. Checked whether some port was blocked, and even opened all ports between the nodes.
5. Disabled iptables and firewalld on all nodes to make sure that was not the cause.

Nothing helped. Any tip would be appreciated.
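The podCIDR values above reflect how Flannel carves the cluster's 10.244.0.0/16 pod network (the `Network` value in net-conf.json below) into one /24 per node. A short Python sketch of that allocation, purely illustrative:

```python
import ipaddress

# The pod network declared in Flannel's net-conf.json.
pod_network = ipaddress.ip_network("10.244.0.0/16")

# Flannel leases each node one /24 out of it -- these are the
# podCIDR values that `kubectl get nodes` reports.
node_cidrs = list(pod_network.subnets(new_prefix=24))

print(node_cidrs[0])  # 10.244.0.0/24 -> first node's podCIDR
print(node_cidrs[1])  # 10.244.1.0/24 -> second node's podCIDR
```

So pods on the first node get 10.244.0.x addresses and pods on the second get 10.244.1.x; cross-node pod traffic must therefore traverse Flannel's overlay between the hosts.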
Edit: adding my Flannel configuration:
```yaml
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
rules:
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - get
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - nodes/status
    verbs:
      - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
  - kind: ServiceAccount
    name: flannel
    namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "type": "flannel",
      "delegate": {
        "isDefaultGateway": true
      }
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      hostNetwork: true
      nodeSelector:
        beta.kubernetes.io/arch: amd64
      tolerations:
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
        - name: install-cni
          image: quay.io/coreos/flannel:v0.9.1-amd64
          command:
            - cp
          args:
            - -f
            - /etc/kube-flannel/cni-conf.json
            - /etc/cni/net.d/10-flannel.conf
          volumeMounts:
            - name: cni
              mountPath: /etc/cni/net.d
            - name: flannel-cfg
              mountPath: /etc/kube-flannel/
      containers:
        - name: kube-flannel
          image: quay.io/coreos/flannel:v0.9.1-amd64
          command: [ "/opt/bin/flanneld", "--ip-masq", "--kube-subnet-mgr" ]
          securityContext:
            privileged: true
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          volumeMounts:
            - name: run
              mountPath: /run
            - name: flannel-cfg
              mountPath: /etc/kube-flannel/
      volumes:
        - name: run
          hostPath:
            path: /run
        - name: cni
          hostPath:
            path: /etc/cni/net.d
        - name: flannel-cfg
          configMap:
            name: kube-flannel-cfg
```

Posted on 2018-10-21 13:04:16
The problem was that the AWS machines were not provisioned by me, and the team that supplied them had assured me that all internal communication was open.
After a lot of debugging with nmap, I found that the UDP ports were not open, and since Flannel needs UDP traffic between the nodes, communication could not work properly.
Once UDP was opened, the problem was solved.
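For anyone hitting the same thing: probing UDP is inherently ambiguous, which is part of why this was slow to find. A minimal Python sketch of what an nmap-style UDP probe has to contend with (the helper name is mine, not from any tool):

```python
import socket

def udp_port_replies(host, port, timeout=1.0):
    """Probe a UDP port and classify the result, nmap-style.

    UDP is connectionless: silence can mean the port is open (the
    service ignored our probe) OR that a firewall dropped the packet,
    which is why nmap reports such ports as "open|filtered". An ICMP
    port-unreachable (raised as ConnectionRefusedError on Linux for a
    connected UDP socket) means the port is genuinely closed.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))   # connect() so ICMP errors reach us
        sock.send(b"ping")
        sock.recv(1024)
        return "open"                # something answered
    except socket.timeout:
        return "open|filtered"       # dropped by a firewall, or ignored
    except ConnectionRefusedError:
        return "closed"              # ICMP port unreachable came back
    finally:
        sock.close()

# Flannel's vxlan backend encapsulates pod-to-pod traffic in UDP
# (port 8472 on Linux), so that is the traffic the security groups
# were silently dropping here.
```

A security group that silently drops packets produces exactly the "open|filtered" case, so from the outside it looks identical to a healthy-but-quiet port, and TCP-only connectivity checks pass while the vxlan overlay stays broken.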
https://stackoverflow.com/questions/52799982