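I am running an EKS cluster and installing argo-workflows via the Bitnami Helm chart on a 3-node cluster (the nodes run in different AZs).
With all the default values from values.yml, everything works fine. However, when I increase server.replicaCount in values.yml from 1 to 3 to add Argo server replicas, the two additional replicas crash with CrashLoopBackOff status.
Running kubectl logs on the crashing pods shows that they fail to authenticate with the postgres container. Here is the log output: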
time="2022-01-05T03:50:16.921Z" level=info msg="not enabling pprof debug endpoints"
time="2022-01-05T03:50:16.924Z" level=info authModes="[client]" baseHRef=/ managedNamespace= namespace=argo secure=false
time="2022-01-05T03:50:16.924Z" level=warning msg="You are running in insecure mode. Learn how to enable transport layer security: https://argoproj.github.io/argo-workflows/tls/"
time="2022-01-05T03:50:16.924Z" level=info msg="config map" name=argo-workflows-controller
time="2022-01-05T03:50:16.924Z" level=info msg="SSO disabled"
time="2022-01-05T03:50:16.956Z" level=info msg="Starting Argo Server" instanceID= version=v3.2.6
time="2022-01-05T03:50:16.956Z" level=info msg="Creating DB session"
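time="2022-01-05T03:50:16.980Z" level=fatal msg="pq: password authentication failed for user \"postgres\""
Any idea why the replicas cannot authenticate with the postgres container? The non-replica container does not have this problem and connects fine. I also tried setting the password manually by overriding postgresql.postgresqlPassword in values.yml, but the result is the same. I'm very new to k8s, so I'm not sure how best to troubleshoot this.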
As I was typing this, I also realized that the controller replicas show the same behavior. Here are the logs from a failing controller:
time="2022-01-05T21:59:00Z" level=info msg="index config" indexWorkflowSemaphoreKeys=true
time="2022-01-05T21:59:00Z" level=info msg="cron config" cronSyncPeriod=10s
time="2022-01-05T21:59:00.994Z" level=info msg="not enabling pprof debug endpoints"
time="2022-01-05T21:59:00.996Z" level=info msg="config map" name=argo-workflows-controller
time="2022-01-05T21:59:01.027Z" level=info msg="Get configmaps 200"
time="2022-01-05T21:59:01.038Z" level=info msg="Configuration:\nartifactRepository: {}\ncontainerRuntimeExecutor: k8sapi\nexecutor:\n name: \"\"\n resources: {}\ninitialDelay: 0s\nmetricsConfig: {}\nnodeEvents: {}\npersistence:\n connectionPool:\n maxIdleConns: 100\n postgresql:\n database: bn_argo_workflows\n host: argo-workflows-postgresql\n passwordSecret:\n key: postgresql-password\n name: argo-workflows-postgresql\n port: 5432\n tableName: argo_workflows\n userNameSecret:\n key: username\n name: argo-workflows-controller-database\npodSpecLogStrategy: {}\ntelemetryConfig: {}\n"
time="2022-01-05T21:59:01.038Z" level=info msg="Persistence configuration enabled"
time="2022-01-05T21:59:01.038Z" level=info msg="Creating DB session"
time="2022-01-05T21:59:01.044Z" level=info msg="Get secrets 200"
time="2022-01-05T21:59:01.049Z" level=info msg="Get secrets 200"
time="2022-01-05T21:59:01.054Z" level=fatal msg="Failed to update config: pq: password authentication failed for user \"postgres\""
Posted on 2022-01-28 09:31:19
I tried to reproduce the issue on GKE, and I do not see any errors for user "postgres". Steps taken:
$ helm upgrade my-release bitnami/argo-workflows -f values.yaml --set server.replicaCount=3
I can see that all three replicas are running without any errors:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
my-release-argo-workflows-controller-f5d748797-pkv99 1/1 Running 0 23m
my-release-argo-workflows-server-669c54dddb-l5c9b 1/1 Running 0 23m
my-release-argo-workflows-server-669c54dddb-nqvn2 1/1 Running 0 2m16s
my-release-argo-workflows-server-669c54dddb-w6ftl 1/1 Running 0 2m16s
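my-release-postgresql-0 1/1 Running 0 31m
If you are still facing this issue, please open a new issue on GitHub under the Bitnami charts, or update your question here with more details so it can be reproduced.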
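One detail that may be worth including when you update the question is the password the failing pods actually read. The following is only a minimal sketch, not a confirmed diagnosis: it assumes the secret name (argo-workflows-postgresql), key (postgresql-password), and namespace (argo) that appear in your logs, and the helper name decode_secret_value is made up for illustration. Kubernetes stores Secret values base64-encoded, so the helper just decodes one.

```shell
#!/bin/sh
# Hypothetical helper: decode one base64-encoded Secret value.
decode_secret_value() {
  printf '%s' "$1" | base64 -d
}

# Assumed usage against the cluster (names taken from the logs in the question;
# adjust them to your release). Left commented out because it needs cluster access:
#
#   encoded=$(kubectl get secret argo-workflows-postgresql -n argo \
#     -o jsonpath='{.data.postgresql-password}')
#   decode_secret_value "$encoded"
```

If the decoded value is not the password that PostgreSQL was originally initialized with, that mismatch would be consistent with the "password authentication failed" message, but this is a guess that should be checked on your cluster.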
https://stackoverflow.com/questions/70600112