Error reading from a large Cassandra table with Spark: "Remote RPC client disassociated"

Stack Overflow user
Asked on 2022-09-05 14:21:08
Answers: 1 | Views: 62 | Followers: 0 | Votes: 0

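I set up a standalone Spark cluster (with Cassandra), but I get an error when I read data. My cluster has 3 nodes, each with 64 GB of memory and 20 cores. Here are some of my spark-env.sh settings: spark_executor_cores: 5, spark_executor_memory: 5G, spark_worker_cores: 20, and spark_worker_memory: 45g.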
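For orientation, the settings above imply a rough executor layout. The sketch below is only back-of-the-envelope arithmetic: it assumes the standalone master places as many executors on each worker as the core and memory limits allow and that `spark.cores.max` is unset. The helper name `executors_per_worker` is hypothetical and not part of Spark.

```python
def executors_per_worker(worker_cores, worker_mem_gb, executor_cores, executor_mem_gb):
    """Upper bound on executors one worker can host (hypothetical helper)."""
    by_cores = worker_cores // executor_cores      # limit imposed by cores
    by_memory = worker_mem_gb // executor_mem_gb   # limit imposed by memory
    return min(by_cores, by_memory)

# Values taken from the question: 20 cores / 45g per worker, 5 cores / 5G per executor.
per_worker = executors_per_worker(20, 45, 5, 5)
total = per_worker * 3  # 3 nodes, assuming each runs one worker
print(per_worker, total)  # 4 12
```

Under these assumptions cores are the binding limit, so each 5G executor runs up to 5 concurrent tasks.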

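One more detail: reading small tables works without problems, but reading a large table fails. The error is shown below. Also, I start pyspark with the following command: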

$ ./pyspark --master spark://10.0.0.100:7077 \
    --packages com.datastax.spark:spark-cassandra-connector_2.12:3.1.0 \
    --conf spark.driver.extraJavaOptions=-Xss1024m \
    --conf spark.driver.port:36605 \
    --conf spark.driver.blockManager.port=42365

Thanks for your attention.

ERROR TaskSchedulerImpl: Lost executor 5 on 10.0.0.10: Remote RPC client disassociated. likely due to containers exceeding threshold, or network issues. Check driver logs for WARN messages
WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (10.0.0.10 executor 5): ExecutorLostFailure (executor 5 exited caused by one of the running task) reason: remote RPC client disassociated.
WARN TaskSetManager: Lost task 0.1 in stage 0.0 (TID 1) (10.0.0.11 executor 2): java.lang.StackOverflowError
at java.base/java.nio.ByteBuffer.position(ByteBuffer.java:1094)
at java.base/java.nio.HeapByteBuffer.get(HeapByteBuffer.java:184)
at org.apache.spark.util.ByteBufferInputStream.read(ObjectInputStream.scala:49)
at java.base/java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2887)
at java.base/java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2903)
at java.base/java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3678)
at java.base/java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:3678)
at java.base/java.io.ObjectInputStream.readString(ObjectInputStream.java:2058)
at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1663)
at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2490)
at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2384)
at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2222)
at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1681)
at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2490)
at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2384)
at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2222)
at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1681)
at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2490)
at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2384)
at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2222)
at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1681)
at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2490)
at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2384)
at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2222)

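1 Answer

Stack Overflow user

Answered on 2022-09-05 23:13:22

The issue you are running into is most likely a networking problem.

It is very unusual that you need to pin the driver ports with: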

    --conf spark.driver.port:36605
    --conf spark.driver.blockManager.port=42365

You need to provide background on why you are doing this.
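One detail worth checking in the flags quoted above: `spark-submit` and `pyspark` document `--conf` as taking `PROP=VALUE`, and the first flag uses a colon instead of an equals sign. The sketch below is a hypothetical pre-flight check (the helper `malformed_conf_args` is not part of Spark) that flags such entries. Whether this is related to the reported error is not established here.

```python
def malformed_conf_args(argv):
    """Return the values following --conf that are not in key=value form."""
    bad = []
    for flag, value in zip(argv, argv[1:]):
        if flag == "--conf":
            key, sep, _ = value.partition("=")
            if not sep or not key:
                bad.append(value)
    return bad

# The two flags from the question, as posted.
args = [
    "--conf", "spark.driver.port:36605",
    "--conf", "spark.driver.blockManager.port=42365",
]
print(malformed_conf_args(args))  # ['spark.driver.port:36605']
```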

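Also, as I advised you on another question last week, you need to provide minimal code plus a minimal configuration that reproduces the problem. Otherwise, there is not enough information for others to be able to help you. Cheers!

Votes: 0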
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/73610843
