
Error: H2O cluster should be of size 3 but is 2

Stack Overflow user
Asked 2020-11-26 03:28:02
1 answer, 45 views, 0 followers, 2 votes

I am trying to run H2O Sparkling Water (SW) on Kubernetes by following the steps in the documentation.

I launched the test SW application:

$ bin/spark-submit \
--master k8s://$KUBERNETES_ENDPOINT \
--deploy-mode cluster \
--class ai.h2o.sparkling.InitTest \
--conf spark.scheduler.minRegisteredResourcesRatio=1 \
--conf spark.kubernetes.container.image=h2oai/sparkling-water-scala:3.32.0.2-1-2.4 \
--conf spark.executor.instances=3 \
local:///opt/sparkling-water/tests/initTest.jar

The H2O Flow UI appears to be working, since I can reach it after running the following:

$ kubectl port-forward ai-h2o-sparkling-inittest-1606331533023-driver 54322:54322

When I look at the logs of the Sparkling Water driver pod that was created, I see the following:

$ kubectl logs ai-h2o-sparkling-inittest-1606331533023-driver

20/11/25 19:14:14 INFO SignalUtils: Registered signal handler for INT
20/11/25 19:14:22 INFO Server: jetty-9.4.z-SNAPSHOT; built: 2018-06-05T18:24:03.829Z; git: d5fc0523cfa96bfebfbda19606cad384d772f04c; jvm 1.8.0_275-b01
20/11/25 19:14:23 INFO ContextHandler: Started a.h.o.e.j.s.ServletContextHandler@5af7a7{/,null,AVAILABLE}
20/11/25 19:14:23 INFO AbstractConnector: Started ServerConnector@63f4e498{HTTP/1.1,[http/1.1]}{0.0.0.0:54321}
20/11/25 19:14:23 INFO Server: Started @90939ms
20/11/25 19:14:23 INFO RestApiUtils: H2O node http://10.244.1.4:54321/3/Cloud successfully responded for the GET.
20/11/25 19:14:23 INFO H2OContext: Sparkling Water 3.32.0.2-1-2.4 started, status of context: 
Sparkling Water Context:
 * Sparkling Water Version: 3.32.0.2-1-2.4
 * H2O name: root
 * cluster size: 2
 * list of used nodes:
  (executorId, host, port)
  ------------------------
  (0,10.244.1.4,54321)
  (1,10.244.0.10,54321)
  ------------------------

  Open H2O Flow in browser: http://ai-h2o-sparkling-inittest-1606331533023-driver-svc.default.svc:54321 (CMD + click in Mac OSX)

     
Exception in thread "main" java.lang.RuntimeException: H2O cluster should be of size 3 but is 2
    at ai.h2o.sparkling.InitTest$.main(InitTest.scala:34)
    at ai.h2o.sparkling.InitTest.main(InitTest.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Looking at the pods created by SW, I see that one executor pod is stuck in Pending (it never reaches Running):

$ kubectl get pods                                                               
NAME                                             READY   STATUS    RESTARTS   AGE
ai-h2o-sparkling-inittest-1606331533023-driver   1/1     Running   0          13m
app-name-1606331575519-exec-1                    1/1     Running   0          12m
app-name-1606331575797-exec-2                    1/1     Running   0          12m
app-name-1606331575816-exec-3                    0/1     Pending   0          12m

Is there any way to fix this?


1 Answer

Stack Overflow user

Answered 2020-11-26 03:45:57

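This appears to be caused by the k8s cluster not having enough CPU (it is a small cluster), so the third executor pod can never be scheduled.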
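
One way to check this diagnosis is to look at the scheduler events on the Pending pod. The helper below is only a sketch: `pending_reason` is a made-up name, it assumes `kubectl` already points at the cluster running the job, and the exact event wording varies between Kubernetes versions (a CPU shortage usually shows up as a `FailedScheduling` event that mentions insufficient CPU).

```shell
# Hypothetical helper (not part of Sparkling Water or kubectl):
# print the FailedScheduling events recorded for a pod.
pending_reason() {
  pod="$1"
  # "kubectl describe pod" lists recent events at the end of its output;
  # keep only the lines where the scheduler could not place the pod.
  kubectl describe pod "$pod" | grep 'FailedScheduling'
}
```

For the pod in the question this would be called as `pending_reason app-name-1606331575816-exec-3`. Running `kubectl describe nodes` afterwards shows how much CPU each node has allocatable and how much of it is already requested.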

Reducing the number of executors (from 3 to 2) when launching SW fixes the issue:

bin/spark-submit \
--master k8s://$KUBERNETES_ENDPOINT \
--deploy-mode cluster \
--class ai.h2o.sparkling.InitTest \
--conf spark.scheduler.minRegisteredResourcesRatio=1 \
--conf spark.kubernetes.container.image=h2oai/sparkling-water-scala:3.32.0.2-1-2.4 \
--conf spark.executor.instances=2 \
local:///opt/sparkling-water/tests/initTest.jar
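
To pick a value for `spark.executor.instances` without trial and error, a rough capacity check can help. The sketch below is an illustration under stated assumptions, not something from the Sparkling Water docs: `max_executors` is a hypothetical helper, the per-node free CPU figures are made up, and the per-executor request is assumed to be one full CPU (1000m). Real values would have to be read from `kubectl describe nodes`.

```shell
# Hypothetical helper: how many executor pods with a given CPU request can be
# scheduled, given the free CPU (allocatable minus already requested) on each
# node. All values are in millicores and the request must be positive.
# A pod cannot span nodes, so the capacity is the sum of per-node floors,
# not the floor of the cluster-wide total.
max_executors() {
  request_m="$1"
  shift
  total=0
  for free_m in "$@"; do
    total=$(( total + free_m / request_m ))
  done
  echo "$total"
}

# Made-up example: two nodes with 1500m and 1200m free, 1000m per executor.
max_executors 1000 1500 1200   # prints 2
```

With those made-up numbers the cluster has 2700m free in total, yet only two 1000m executors fit because the leftover 500m and 200m sit on different nodes. That is the kind of situation in which a third executor stays Pending.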

Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/65011496
