I am trying to set up a SQL Server Big Data Cluster on AKS, but the process does not seem to get past a certain point. The AKS cluster is a 3-node cluster built on a Standard_E8_v3 VM scale set.
Here is the list of pods:

C:\Users\rgn>kubectl get pods -n mssql-cluster
NAME READY STATUS RESTARTS AGE
control-qm754 3/3 Running 0 35m
controldb-0 2/2 Running 0 35m
controlwd-wxrlg 1/1 Running 0 32m
logsdb-0 1/1 Running 0 32m
logsui-mqfcv 1/1 Running 0 32m
metricsdb-0 1/1 Running 0 32m
metricsdc-9frbb 1/1 Running 0 32m
metricsdc-jr5hk 1/1 Running 0 32m
metricsdc-ls7mf 1/1 Running 0 32m
metricsui-pn9qf 1/1 Running 0 32m
mgmtproxy-x4ctb 2/2 Running 0 32m

When I run describe against the mgmtproxy-x4ctb pod, I see the events below. Even though the status says Running, the pod is not actually ready (the readiness probe is failing). I believe this is why the deployment is not progressing.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 11m default-scheduler Successfully assigned mssql-cluster/mgmtproxy-x4ctb to aks-agentpool-34156060-vmss000002
Normal Pulling 11m kubelet, aks-agentpool-34156060-vmss000002 Pulling image "mcr.microsoft.com/mssql/bdc/mssql-service-proxy:2019-CU4-ubuntu-16.04"
Normal Pulled 11m kubelet, aks-agentpool-34156060-vmss000002 Successfully pulled image "mcr.microsoft.com/mssql/bdc/mssql-service-proxy:2019-CU4-ubuntu-16.04"
Normal Created 11m kubelet, aks-agentpool-34156060-vmss000002 Created container service-proxy
Normal Started 11m kubelet, aks-agentpool-34156060-vmss000002 Started container service-proxy
Normal Pulling 11m kubelet, aks-agentpool-34156060-vmss000002 Pulling image "mcr.microsoft.com/mssql/bdc/mssql-monitor-fluentbit:2019-CU4-ubuntu-16.04"
Normal Pulled 11m kubelet, aks-agentpool-34156060-vmss000002 Successfully pulled image "mcr.microsoft.com/mssql/bdc/mssql-monitor-fluentbit:2019-CU4-ubuntu-16.04"
Normal Created 11m kubelet, aks-agentpool-34156060-vmss000002 Created container fluentbit
Normal Started 11m kubelet, aks-agentpool-34156060-vmss000002 Started container fluentbit
Warning Unhealthy 10m (x6 over 11m) kubelet, aks-agentpool-34156060-vmss000002 Readiness probe failed: cat: /var/run/container.ready: No such file or directory

I have tried twice, and both times the deployment could not get past this point. Judging from the link, this issue has existed since last month. Can anyone point me in the right direction?
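For context on what the failing probe is doing: per the agent log below, the readiness probe just reads a marker file (/var/run/container.ready) that the agent writes once all supervised services report ready, retrying up to 250 attempts. A minimal sketch of that kind of poll loop, with hypothetical paths and timings (the real agent's interval and file handling are not published):

```python
import os
import tempfile
import threading

def wait_for_ready(marker_path, max_attempts=250, interval=0.01):
    """Poll for a readiness marker file, mirroring the agent's
    '[READINESS] ... Attempts: n, Max attempts: 250' loop."""
    import time
    for attempt in range(1, max_attempts + 1):
        if os.path.exists(marker_path):
            return attempt      # became ready on this attempt
        time.sleep(interval)
    return None                 # never ready -> the probe keeps failing

# Demo: a temp file stands in for /var/run/container.ready and is
# created by a background timer, as the agent would do when services start.
tmpdir = tempfile.mkdtemp()
marker = os.path.join(tmpdir, "container.ready")
threading.Timer(0.05, lambda: open(marker, "w").close()).start()
attempt = wait_for_ready(marker)
print("ready after attempt", attempt)
```

The point is that the probe failure is a symptom: the marker file never appears because some dependency of the agent (here, network access to the controller) never comes up.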
Log listing from the proxy pod:
2020/06/13 16:25:35 Setting the directories for 'agent:agent' owner with '-rwxrwxr-x' mode: [/var/opt /var/log /var/run/secrets /var/run/secrets/keytabs /var/run/secrets/certificates /var/run/secrets/credentials /var/opt/agent /var/log/agent /var/run/agent]
2020/06/13 16:25:35 Setting the directories for 'agent:agent' owner with '-rwxrwx---' mode: [/var/opt/agent /var/log/agent /var/run/agent]
2020/06/13 16:25:35 Searching agent configuration file at /opt/agent/conf/mgmtproxy.json
2020/06/13 16:25:35 Searching agent configuration file at /opt/agent/conf/agent.json
2020/06/13 16:25:35.777955 Changed the container umask from '-----w--w-' to '--------w-'
2020/06/13 16:25:35.778031 Setting the directories for 'supervisor:supervisor' owner with '-rwxrwx---' mode: [/var/log/supervisor/log /var/opt/supervisor /var/log/supervisor /var/run/supervisor]
2020/06/13 16:25:35.778170 Setting the directories for 'fluentbit:fluentbit' owner with '-rwxrwx---' mode: [/var/opt/fluentbit /var/log/fluentbit /var/run/fluentbit]
2020/06/13 16:25:35.778411 Agent configuration: {"PodType":"mgmtproxy","ContainerName":"fluentbit","GrpcPort":8311,"HttpsPort":8411,"ScaledSetKind":"ReplicaSet","securityPolicy":"certificate","dnsServicesToWaitFor":null,"cronJobs":null,"serviceJobs":null,"healthModules":null,"logRotation":{"agentLogMaxSize":500,"agentLogRotateCount":3,"serviceLogRotateCount":10},"fileMap":{"fluentbit-certificate.pem":"/var/run/secrets/certificates/fluentbit/fluentbit-certificate.pem","fluentbit-privatekey.pem":"/var/run/secrets/certificates/fluentbit/fluentbit-privatekey.pem","krb5.conf":"/etc/krb5.conf","nsswitch.conf":"/etc/nsswitch.conf","resolv.conf":"/etc/resolv.conf","smb.conf":"/etc/samba/smb.conf"},"userPermissions":{"agent":{"user":"agent","group":"agent","mode":"0770","modeSetgid":false,"directories":[]},"fluentbit":{"user":"fluentbit","group":"","mode":"","modeSetgid":false,"directories":[]},"fundamental":{"user":"agent","group":"agent","mode":"0775","modeSetgid":false,"directories":["/var/opt","/var/log","/var/run/secrets","/var/run/secrets/keytabs","/var/run/secrets/certificates","/var/run/secrets/credentials"]},"supervisor":{"user":"supervisor","group":"supervisor","mode":"0770","modeSetgid":false,"directories":["/var/log/supervisor/log"]}},"fileIgnoreList":["agent-certificate.pem","agent-privatekey.pem"],"InstanceId":"t4KLx1m5vDsHCHc038KgKHH5HOcQVR0Z","ContainerId":"","StartServicesImmediately":false,"DisableFileDownloads":false,"DisableHealthChecks":false,"serviceFencingEnabled":false,"isPrivileged":true,"IsConfigurationManagerEnabled":false,"LWriter":{"filename":"/var/log/agent/agent.log","maxsize":500,"maxage":0,"maxbackups":10,"localtime":true,"compress":false}}
2020/06/13 16:25:36.316209 Attempting to join cluster...
2020/06/13 16:25:36.316301 Source directory /var/opt/secrets/certificates/ca does not exist
2020/06/13 16:25:36.316520 [Reaper] Starting the signal loop for reaper
2020/06/13 16:25:40.642164 [Reaper] Received SIGCHLD signal. Starting process reaper.
2020/06/13 16:25:40.652703 Starting secure gRPC listener on 0.0.0.0:8311
2020/06/13 16:25:40.943805 Cluster join successful.
2020/06/13 16:25:40.943846 Stopping gRPC listener on 0.0.0.0:8311
2020/06/13 16:25:40.944704 Getting manifest from controller...
2020/06/13 16:25:40.964774 Downloading '/config/scaledsets/mgmtproxy/containers/fluentbit/files/fluentbit-certificate.pem' from controller...
2020/06/13 16:25:40.964816 Downloading '/config/scaledsets/mgmtproxy/containers/fluentbit/files/fluentbit-privatekey.pem' from controller...
2020/06/13 16:25:40.987309 Stored 1206 bytes to /var/run/secrets/certificates/fluentbit/fluentbit-certificate.pem
2020/06/13 16:25:40.992108 Stored 1694 bytes to /var/run/secrets/certificates/fluentbit/fluentbit-privatekey.pem
2020/06/13 16:25:40.992235 Agent is ready.
2020/06/13 16:25:40.992348 Starting supervisord with command: '[supervisord --nodaemon -c /etc/supervisord.conf]'
2020/06/13 16:25:40.992719 Started supervisord with pid=1437
2020/06/13 16:25:40.993030 Starting secure gRPC listener on 0.0.0.0:8311
2020/06/13 16:25:40.996580 Starting HTTPS listener on 0.0.0.0:8411
2020/06/13 16:25:41.998667 [READINESS] Not all supervisord processes are ready. Attempts: 1, Max attempts: 250
2020/06/13 16:25:41.999567 Loading go plugin plugins/bdc.so
2020/06/13 16:25:41.999588 Loading go plugin plugins/platform.so
2020/06/13 16:25:41.999600 Starting the health monitoring, number of modules: 2, services: ["fluentbit","agent"]
2020/06/13 16:25:41.999605 Starting the health service
2020/06/13 16:25:41.999609 Starting the health durable store
2020/06/13 16:25:41.999614 Loading existing health properties from /var/opt/agent/health/health-properties-main.gob
2020/06/13 16:25:41.999642 No existing file path for file: /var/opt/agent/health/health-properties-main.gob
2020/06/13 16:25:42.640719 Adding a new plugin plugins/bdc.so
2020/06/13 16:25:43.302872 Adding a new plugin plugins/platform.so
2020/06/13 16:25:43.302932 Created a health module watcher for service 'fluentbit'
2020/06/13 16:25:43.302948 Starting a new watcher for health module: fluentbit
2020/06/13 16:25:43.302983 Starting a new watcher for health module: agent
2020/06/13 16:25:43.302992 Health monitoring started
2020/06/13 16:25:53.000908 [READINESS] All services marked as ready.
2020/06/13 16:25:53.000966 [READINESS] Container is now ready.
2020/06/13 16:26:01.995093 [MONITOR] Service states: map[fluentbit:RUNNING]

Posted on 2020-06-21 06:08:36
All,

It finally got resolved.

We had a few problems with our Azure policies and network policies:
(1) New IP addresses could not be assigned to the load balancer.
(2) The gateway proxy was not getting new IP addresses, since we had run out of our quota of 10 allowed IPs.
(3) The desktop I deployed from could not reach the controller service's IP address and port.

We resolved the issues above one by one, and the deployment has now reached the final stage.
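For issue (3), a plain TCP connect is a more reliable reachability test than ping, since Azure load balancers commonly drop ICMP. A small sketch for checking the controller endpoint from the deployment machine (the host and port shown in the comment are placeholders; substitute your controller service's external IP and port):

```python
import socket

def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (placeholder endpoint):
# print(tcp_reachable("20.0.0.4", 30080))
```

If this returns False while the service shows an external IP in `kubectl get svc`, the problem is in the network path (NSG rules, firewall, corporate proxy) rather than in the cluster itself.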
Since the IP address is static but only generated at runtime, it cannot be supplied in advance. How have others handled this with their network/Azure infrastructure teams?
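One common approach on AKS is to reserve the public IP in Azure ahead of time (az network public-ip create --sku Standard ...) and pin the LoadBalancer service to it, so the address is known before deployment and the network team can whitelist it up front. A hedged sketch; the service name, resource group, IP, and port below are all placeholders:

```yaml
# Hypothetical Service pinned to a pre-provisioned static public IP.
# The IP must already exist in the resource group named in the annotation.
apiVersion: v1
kind: Service
metadata:
  name: controller-svc-external
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-resource-group: my-ip-rg
spec:
  type: LoadBalancer
  loadBalancerIP: 20.0.0.100   # the pre-reserved static IP
  ports:
    - port: 30080
      targetPort: 30080
```

Without the annotation, AKS looks for the IP in the node resource group; the annotation lets the IP live in a resource group the infrastructure team controls.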
Thanks, rgn
https://stackoverflow.com/questions/62363358