Q: Readiness probe failed on the management proxy (mgmtproxy) pod

Stack Overflow user

Asked 2020-06-14 01:26:50

Answers: 1 · Views: 190 · Followers: 0 · Votes: 0

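I am trying to set up a SQL Server Big Data Cluster (BDC) on AKS, but the deployment does not seem to get past a certain point. The AKS cluster is a 3-node cluster built on a Standard_E8_v3 VM scale set.

Here is the list of pods:

C:\Users\rgn>kubectl get pods -n mssql-cluster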

NAME              READY   STATUS    RESTARTS   AGE
control-qm754     3/3     Running   0          35m
controldb-0       2/2     Running   0          35m
controlwd-wxrlg   1/1     Running   0          32m
logsdb-0          1/1     Running   0          32m
logsui-mqfcv      1/1     Running   0          32m
metricsdb-0       1/1     Running   0          32m
metricsdc-9frbb   1/1     Running   0          32m
metricsdc-jr5hk   1/1     Running   0          32m
metricsdc-ls7mf   1/1     Running   0          32m
metricsui-pn9qf   1/1     Running   0          32m
mgmtproxy-x4ctb   2/2     Running   0          32m
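
For readers triaging a listing like this one, the READY column is the number of containers currently passing their readiness checks over the total number of containers in the pod. The following is a minimal sketch, not part of the original post, that flags pods whose READY count is below the total. The function name `not_ready_pods` and the shortened sample are illustrative assumptions.

```python
from typing import List


def not_ready_pods(table: str) -> List[str]:
    """Return names of pods whose READY column shows fewer ready containers than total."""
    pending = []
    for line in table.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue  # not a pod row
        name, ready = fields[0], fields[1]
        ready_count, total = (int(part) for part in ready.split("/"))
        if ready_count < total:
            pending.append(name)
    return pending


# Shortened copy of the listing above (two rows kept for brevity).
SAMPLE = """\
NAME              READY   STATUS    RESTARTS   AGE
control-qm754     3/3     Running   0          35m
mgmtproxy-x4ctb   2/2     Running   0          32m
"""

print(not_ready_pods(SAMPLE))
```

On the rows shown, nothing is flagged: mgmtproxy-x4ctb reports 2/2, which suggests both of its containers were passing readiness at the time the listing was taken.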

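When I run describe on the mgmtproxy-x4ctb pod, I see the following. Even though the status says Running, the pod is not actually ready (the readiness probe failed). I believe this is why the deployment is not progressing.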

Events:
  Type     Reason     Age                From                                        Message
  ----     ------     ----               ----                                        -------
  Normal   Scheduled  11m                default-scheduler                           Successfully assigned mssql-cluster/mgmtproxy-x4ctb to aks-agentpool-34156060-vmss000002
  Normal   Pulling    11m                kubelet, aks-agentpool-34156060-vmss000002  Pulling image "mcr.microsoft.com/mssql/bdc/mssql-service-proxy:2019-CU4-ubuntu-16.04"
  Normal   Pulled     11m                kubelet, aks-agentpool-34156060-vmss000002  Successfully pulled image "mcr.microsoft.com/mssql/bdc/mssql-service-proxy:2019-CU4-ubuntu-16.04"
  Normal   Created    11m                kubelet, aks-agentpool-34156060-vmss000002  Created container service-proxy
  Normal   Started    11m                kubelet, aks-agentpool-34156060-vmss000002  Started container service-proxy
  Normal   Pulling    11m                kubelet, aks-agentpool-34156060-vmss000002  Pulling image "mcr.microsoft.com/mssql/bdc/mssql-monitor-fluentbit:2019-CU4-ubuntu-16.04"
  Normal   Pulled     11m                kubelet, aks-agentpool-34156060-vmss000002  Successfully pulled image "mcr.microsoft.com/mssql/bdc/mssql-monitor-fluentbit:2019-CU4-ubuntu-16.04"
  Normal   Created    11m                kubelet, aks-agentpool-34156060-vmss000002  Created container fluentbit
  Normal   Started    11m                kubelet, aks-agentpool-34156060-vmss000002  Started container fluentbit
  Warning  Unhealthy  10m (x6 over 11m)  kubelet, aks-agentpool-34156060-vmss000002  Readiness probe failed: cat: /var/run/container.ready: No such file or directory
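
The probe message in the last event shows that readiness is checked by running `cat /var/run/container.ready` inside the container. Kubernetes treats an exec probe as successful only when the command exits with status 0, so the probe fails until that file exists. Below is a minimal sketch of that behaviour in Python, using a temporary directory in place of /var/run. The helper name `exec_probe_exit_code` is hypothetical, and the idea that the agent creates the marker file once it considers the container ready is an assumption, not something the events confirm.

```python
import os
import tempfile


def exec_probe_exit_code(marker_path: str) -> int:
    """Mimic `cat <marker_path>`: 0 if the file can be read, 1 otherwise."""
    try:
        with open(marker_path, "rb") as handle:
            handle.read()
        return 0
    except OSError:
        return 1


with tempfile.TemporaryDirectory() as run_dir:
    marker = os.path.join(run_dir, "container.ready")  # stand-in for /var/run/container.ready

    before = exec_probe_exit_code(marker)  # file missing, so the probe fails

    # Assumption: the process inside the container writes the marker when it is ready.
    with open(marker, "w") as handle:
        handle.write("ready\n")

    after = exec_probe_exit_code(marker)  # file present, so the probe succeeds

print(before, after)  # prints "1 0"
```

Under this reading, a handful of early failures is expected while the container is still starting. The age column `10m (x6 over 11m)` means the event occurred six times, first 11 minutes and last 10 minutes before the describe was run.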

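I tried twice, and both times the deployment could not get past this point. Judging from the link, this issue has existed since last month. Can anyone point me in the right direction?

Logs from the proxy pod: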

2020/06/13 16:25:35 Setting the directories for 'agent:agent' owner with '-rwxrwxr-x' mode: [/var/opt /var/log /var/run/secrets /var/run/secrets/keytabs /var/run/secrets/certificates /var/run/secrets/credentials /var/opt/agent /var/log/agent /var/run/agent]
2020/06/13 16:25:35 Setting the directories for 'agent:agent' owner with '-rwxrwx---' mode: [/var/opt/agent /var/log/agent /var/run/agent]
2020/06/13 16:25:35 Searching agent configuration file at /opt/agent/conf/mgmtproxy.json
2020/06/13 16:25:35 Searching agent configuration file at /opt/agent/conf/agent.json
2020/06/13 16:25:35.777955 Changed the container umask from '-----w--w-' to '--------w-'
2020/06/13 16:25:35.778031 Setting the directories for 'supervisor:supervisor' owner with '-rwxrwx---' mode: [/var/log/supervisor/log /var/opt/supervisor /var/log/supervisor /var/run/supervisor]
2020/06/13 16:25:35.778170 Setting the directories for 'fluentbit:fluentbit' owner with '-rwxrwx---' mode: [/var/opt/fluentbit /var/log/fluentbit /var/run/fluentbit]
2020/06/13 16:25:35.778411 Agent configuration: {"PodType":"mgmtproxy","ContainerName":"fluentbit","GrpcPort":8311,"HttpsPort":8411,"ScaledSetKind":"ReplicaSet","securityPolicy":"certificate","dnsServicesToWaitFor":null,"cronJobs":null,"serviceJobs":null,"healthModules":null,"logRotation":{"agentLogMaxSize":500,"agentLogRotateCount":3,"serviceLogRotateCount":10},"fileMap":{"fluentbit-certificate.pem":"/var/run/secrets/certificates/fluentbit/fluentbit-certificate.pem","fluentbit-privatekey.pem":"/var/run/secrets/certificates/fluentbit/fluentbit-privatekey.pem","krb5.conf":"/etc/krb5.conf","nsswitch.conf":"/etc/nsswitch.conf","resolv.conf":"/etc/resolv.conf","smb.conf":"/etc/samba/smb.conf"},"userPermissions":{"agent":{"user":"agent","group":"agent","mode":"0770","modeSetgid":false,"directories":[]},"fluentbit":{"user":"fluentbit","group":"","mode":"","modeSetgid":false,"directories":[]},"fundamental":{"user":"agent","group":"agent","mode":"0775","modeSetgid":false,"directories":["/var/opt","/var/log","/var/run/secrets","/var/run/secrets/keytabs","/var/run/secrets/certificates","/var/run/secrets/credentials"]},"supervisor":{"user":"supervisor","group":"supervisor","mode":"0770","modeSetgid":false,"directories":["/var/log/supervisor/log"]}},"fileIgnoreList":["agent-certificate.pem","agent-privatekey.pem"],"InstanceId":"t4KLx1m5vDsHCHc038KgKHH5HOcQVR0Z","ContainerId":"","StartServicesImmediately":false,"DisableFileDownloads":false,"DisableHealthChecks":false,"serviceFencingEnabled":false,"isPrivileged":true,"IsConfigurationManagerEnabled":false,"LWriter":{"filename":"/var/log/agent/agent.log","maxsize":500,"maxage":0,"maxbackups":10,"localtime":true,"compress":false}}
2020/06/13 16:25:36.316209 Attempting to join cluster...
2020/06/13 16:25:36.316301 Source directory /var/opt/secrets/certificates/ca does not exist
2020/06/13 16:25:36.316520 [Reaper] Starting the signal loop for reaper
2020/06/13 16:25:40.642164 [Reaper] Received SIGCHLD signal. Starting process reaper.
2020/06/13 16:25:40.652703 Starting secure gRPC listener on 0.0.0.0:8311
2020/06/13 16:25:40.943805 Cluster join successful.
2020/06/13 16:25:40.943846 Stopping gRPC listener on 0.0.0.0:8311
2020/06/13 16:25:40.944704 Getting manifest from controller...
2020/06/13 16:25:40.964774 Downloading '/config/scaledsets/mgmtproxy/containers/fluentbit/files/fluentbit-certificate.pem' from controller...
2020/06/13 16:25:40.964816 Downloading '/config/scaledsets/mgmtproxy/containers/fluentbit/files/fluentbit-privatekey.pem' from controller...
2020/06/13 16:25:40.987309 Stored 1206 bytes to /var/run/secrets/certificates/fluentbit/fluentbit-certificate.pem
2020/06/13 16:25:40.992108 Stored 1694 bytes to /var/run/secrets/certificates/fluentbit/fluentbit-privatekey.pem
2020/06/13 16:25:40.992235 Agent is ready.
2020/06/13 16:25:40.992348 Starting supervisord with command: '[supervisord --nodaemon -c /etc/supervisord.conf]'
2020/06/13 16:25:40.992719 Started supervisord with pid=1437
2020/06/13 16:25:40.993030 Starting secure gRPC listener on 0.0.0.0:8311
2020/06/13 16:25:40.996580 Starting HTTPS listener on 0.0.0.0:8411
2020/06/13 16:25:41.998667 [READINESS] Not all supervisord processes are ready. Attempts: 1, Max attempts: 250
2020/06/13 16:25:41.999567 Loading go plugin plugins/bdc.so
2020/06/13 16:25:41.999588 Loading go plugin plugins/platform.so
2020/06/13 16:25:41.999600 Starting the health monitoring, number of modules: 2, services: ["fluentbit","agent"]
2020/06/13 16:25:41.999605 Starting the health service
2020/06/13 16:25:41.999609 Starting the health durable store
2020/06/13 16:25:41.999614 Loading existing health properties from /var/opt/agent/health/health-properties-main.gob
2020/06/13 16:25:41.999642 No existing file path for file: /var/opt/agent/health/health-properties-main.gob
2020/06/13 16:25:42.640719 Adding a new plugin plugins/bdc.so 
2020/06/13 16:25:43.302872 Adding a new plugin plugins/platform.so 
2020/06/13 16:25:43.302932 Created a health module watcher for service 'fluentbit'
2020/06/13 16:25:43.302948 Starting a new watcher for health module: fluentbit 
2020/06/13 16:25:43.302983 Starting a new watcher for health module: agent 
2020/06/13 16:25:43.302992 Health monitoring started
2020/06/13 16:25:53.000908 [READINESS] All services marked as ready.
2020/06/13 16:25:53.000966 [READINESS] Container is now ready.
2020/06/13 16:26:01.995093 [MONITOR] Service states: map[fluentbit:RUNNING]
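
One way to relate these logs to the probe failures is to measure how long the agent took to report readiness. The sketch below, which is not from the original post, parses the timestamp format used in the log above (with or without fractional seconds) and returns the seconds between the first timestamped line and the line containing the readiness marker. The function names are illustrative assumptions.

```python
import re
from datetime import datetime
from typing import Optional

# Matches "2020/06/13 16:25:35" with an optional ".777955" fraction.
STAMP = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})(?:\.(\d{1,6}))?\s")


def parse_stamp(line: str) -> Optional[datetime]:
    """Return the leading timestamp of a log line, or None if there is none."""
    match = STAMP.match(line)
    if match is None:
        return None
    base = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
    micros = int((match.group(2) or "0").ljust(6, "0"))
    return base.replace(microsecond=micros)


def seconds_until_ready(log_text: str,
                        marker: str = "[READINESS] Container is now ready.") -> Optional[float]:
    """Seconds from the first timestamped line to the first line containing marker."""
    start = None
    for line in log_text.splitlines():
        stamp = parse_stamp(line)
        if stamp is None:
            continue
        if start is None:
            start = stamp
        if marker in line:
            return (stamp - start).total_seconds()
    return None


# Three lines copied from the log above.
SAMPLE_LOG = """\
2020/06/13 16:25:35 Searching agent configuration file at /opt/agent/conf/agent.json
2020/06/13 16:25:41.998667 [READINESS] Not all supervisord processes are ready. Attempts: 1, Max attempts: 250
2020/06/13 16:25:53.000966 [READINESS] Container is now ready.
"""

elapsed = seconds_until_ready(SAMPLE_LOG)
print(round(elapsed))  # prints 18
```

On the lines shown, the container reported ready roughly 18 seconds after the first log entry.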

sql-server-2019

deployment

azure-aks

1 Answer

Stack Overflow user

Accepted answer

Posted 2020-06-21 06:08:36

All,

It finally got resolved.

There were several issues with our Azure policies and network policies.

(1) The policy did not allow new IP addresses to be assigned to the load balancer.
(2) The gateway proxy was not getting new IP addresses because we had exhausted our quota of 10 allowed IPs.
(3) The desktop from which I started the deployment could not reach the controller service IP address and port.
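
For item (3), note that ping only tests ICMP reachability and says nothing about a specific port. A TCP connection attempt from the deploying machine is a more direct check. The following is a minimal sketch using only the Python standard library. The helper name `can_connect` is hypothetical, and the demo uses a local listener rather than a real controller endpoint, whose address and port are not given in the post.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Demo against a throwaway local listener (a stand-in for the controller service).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]

print(can_connect("127.0.0.1", demo_port))

listener.close()
```

In practice the host and port would be the controller service's external IP and port as reported by the cluster, substituted for the local demo values.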

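We resolved the issues above one by one, and the deployment has now reached its final stage.

Because the IP addresses are static but generated at run time, they cannot be provided in advance. How have others handled this with their network/Azure infrastructure teams?

Thanks, rgn

Votes: 0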

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/62363358
