Problem writing to a file in Spark
Stack Overflow user
Asked on 2016-03-21 13:42:39
I am running Spark in local mode with the following options:

spark-shell --driver-memory 21G --executor-memory 10G --num-executors 4 --driver-java-options "-Dspark.executor.memory=10G"  --executor-cores 8

It is a four-node cluster with 32 GB of RAM on each node.

I am computing column similarities with DIMSUM and trying to write the result to a file.

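It computed the column similarities for 6.7 million items, but persisting the result to a file causes the threads to spill to disk over and over.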

dimSumOutput.coalesce(1, true).saveAsTextFile("/user/similarity")

dimSumOutput is an RDD containing the column similarities in (row, col, sim) format.

16/03/20 21:41:22 INFO spark.ContextCleaner: Cleaned shuffle 2
16/03/20 21:41:25 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 479.5 MB to disk (1 time so far)
16/03/20 21:41:26 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 479.5 MB to disk (1 time so far)
16/03/20 21:41:26 INFO collection.ExternalSorter: Thread 189 spilling in-memory map of 479.5 MB to disk (1 time so far)
16/03/20 21:41:28 INFO collection.ExternalSorter: Thread 188 spilling in-memory map of 478.4 MB to disk (1 time so far)
16/03/20 21:41:31 INFO collection.ExternalSorter: Thread 186 spilling in-memory map of 535.0 MB to disk (1 time so far)
16/03/20 21:41:32 INFO collection.ExternalSorter: Thread 187 spilling in-memory map of 609.3 MB to disk (1 time so far)
16/03/20 21:42:07 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 481.3 MB to disk (2 times so far)
16/03/20 21:42:14 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 479.5 MB to disk (2 times so far)
16/03/20 21:42:18 INFO collection.ExternalSorter: Thread 188 spilling in-memory map of 478.4 MB to disk (2 times so far)
16/03/20 21:42:21 INFO collection.ExternalSorter: Thread 189 spilling in-memory map of 491.5 MB to disk (2 times so far)
16/03/20 21:42:27 INFO collection.ExternalSorter: Thread 186 spilling in-memory map of 542.7 MB to disk (2 times so far)
16/03/20 21:42:32 INFO collection.ExternalSorter: Thread 187 spilling in-memory map of 583.7 MB to disk (2 times so far)
16/03/20 21:43:25 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 479.5 MB to disk (3 times so far)
16/03/20 21:43:33 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 479.5 MB to disk (3 times so far)
16/03/20 21:43:45 INFO collection.ExternalSorter: Thread 189 spilling in-memory map of 483.8 MB to disk (3 times so far)
16/03/20 21:43:50 INFO collection.ExternalSorter: Thread 188 spilling in-memory map of 478.4 MB to disk (3 times so far)
16/03/20 21:43:56 INFO collection.ExternalSorter: Thread 186 spilling in-memory map of 535.0 MB to disk (3 times so far)
16/03/20 21:44:01 INFO collection.ExternalSorter: Thread 187 spilling in-memory map of 624.6 MB to disk (3 times so far)
16/03/20 21:44:14 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 482.6 MB to disk (4 times so far)
16/03/20 21:44:20 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 479.5 MB to disk (4 times so far)
16/03/20 21:44:37 INFO collection.ExternalSorter: Thread 189 spilling in-memory map of 479.5 MB to disk (4 times so far)
16/03/20 21:45:09 INFO collection.ExternalSorter: Thread 188 spilling in-memory map of 478.4 MB to disk (4 times so far)
16/03/20 21:45:22 INFO collection.ExternalSorter: Thread 186 spilling in-memory map of 581.1 MB to disk (4 times so far)
16/03/20 21:45:23 INFO collection.ExternalSorter: Thread 187 spilling in-memory map of 539.5 MB to disk (4 times so far)
16/03/20 21:45:28 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 479.5 MB to disk (5 times so far)
16/03/20 21:45:40 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 486.4 MB to disk (5 times so far)
16/03/20 21:45:52 INFO collection.ExternalSorter: Thread 188 spilling in-memory map of 478.4 MB to disk (5 times so far)
16/03/20 21:45:59 INFO collection.ExternalSorter: Thread 189 spilling in-memory map of 479.5 MB to disk (5 times so far)
16/03/20 21:46:14 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 479.5 MB to disk (6 times so far)
16/03/20 21:46:24 INFO collection.ExternalSorter: Thread 187 spilling in-memory map of 539.6 MB to disk (5 times so far)
16/03/20 21:46:25 INFO collection.ExternalSorter: Thread 186 spilling in-memory map of 527.4 MB to disk (5 times so far)
16/03/20 21:47:11 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 576.0 MB to disk (6 times so far)
16/03/20 21:47:19 INFO collection.ExternalSorter: Thread 189 spilling in-memory map of 491.5 MB to disk (6 times so far)
16/03/20 21:47:20 INFO collection.ExternalSorter: Thread 188 spilling in-memory map of 478.4 MB to disk (6 times so far)
16/03/20 21:47:43 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 686.1 MB to disk (7 times so far)
16/03/20 21:47:50 INFO collection.ExternalSorter: Thread 187 spilling in-memory map of 539.5 MB to disk (6 times so far)
16/03/20 21:47:57 INFO collection.ExternalSorter: Thread 186 spilling in-memory map of 599.0 MB to disk (6 times so far)
16/03/20 21:48:04 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 481.3 MB to disk (7 times so far)
16/03/20 21:48:39 INFO collection.ExternalSorter: Thread 189 spilling in-memory map of 479.5 MB to disk (7 times so far)
16/03/20 21:48:40 INFO collection.ExternalSorter: Thread 188 spilling in-memory map of 478.4 MB to disk (7 times so far)
16/03/20 21:49:06 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 479.5 MB to disk (8 times so far)
16/03/20 21:49:21 INFO collection.ExternalSorter: Thread 186 spilling in-memory map of 519.5 MB to disk (7 times so far)
16/03/20 21:49:21 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 489.0 MB to disk (8 times so far)
16/03/20 21:49:28 INFO collection.ExternalSorter: Thread 187 spilling in-memory map of 540.2 MB to disk (7 times so far)
16/03/20 21:49:36 INFO collection.ExternalSorter: Thread 188 spilling in-memory map of 485.1 MB to disk (8 times so far)
16/03/20 21:49:39 INFO collection.ExternalSorter: Thread 189 spilling in-memory map of 601.6 MB to disk (8 times so far)
16/03/20 21:50:04 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 576.0 MB to disk (9 times so far)
16/03/20 21:50:20 INFO collection.ExternalSorter: Thread 186 spilling in-memory map of 519.7 MB to disk (8 times so far)
16/03/20 21:50:24 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 479.5 MB to disk (9 times so far)
16/03/20 21:50:27 INFO collection.ExternalSorter: Thread 187 spilling in-memory map of 539.5 MB to disk (8 times so far)
16/03/20 21:50:28 INFO collection.ExternalSorter: Thread 188 spilling in-memory map of 478.4 MB to disk (9 times so far)
16/03/20 21:51:03 INFO collection.ExternalSorter: Thread 189 spilling in-memory map of 489.0 MB to disk (9 times so far)
16/03/20 21:51:22 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 479.5 MB to disk (10 times so far)
16/03/20 21:51:41 INFO collection.ExternalSorter: Thread 186 spilling in-memory map of 519.5 MB to disk (9 times so far)
16/03/20 21:51:45 INFO collection.ExternalSorter: Thread 188 spilling in-memory map of 483.8 MB to disk (10 times so far)
16/03/20 21:51:45 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 479.5 MB to disk (10 times so far)
16/03/20 21:51:51 INFO collection.ExternalSorter: Thread 187 spilling in-memory map of 550.4 MB to disk (9 times so far)
16/03/20 21:52:04 INFO collection.ExternalSorter: Thread 189 spilling in-memory map of 479.5 MB to disk (10 times so far)
16/03/20 21:52:20 INFO collection.ExternalSorter: Thread 184 spilling in-memory map of 509.4 MB to disk (11 times so far)
16/03/20 21:52:40 INFO collection.ExternalSorter: Thread 185 spilling in-memory map of 479.5 MB to disk (11 times so far)

Any suggestions on how to fix it?

1 Answer

Stack Overflow user

Answered on 2016-03-21 15:43:31

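1) It is odd that you use --executor-memory 65G (more than your 32 GB of RAM!) and then, on the same command line, --driver-java-options "-Dspark.executor.memory=10G". Is that a typo? If not, are you sure what effect such an invocation has? Please provide more information.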

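2) More importantly, after your 4 workers have processed the data, you ask Spark to coalesce it into a single partition (and therefore onto a single executor). Depending on the memory allocated to that executor (see 1), this may mean that one executor has to handle too many records that are too large. Here I would first check how much memory is actually allocated to the executors (for example, look at the Spark UI or the YARN UI if you use them). Then I would seriously consider whether coalescing to 1 is necessary at all. In addition, as @Yaron suggested, you can look at your application's shuffle-related settings and change spark.shuffle.memoryFraction (keep in mind the maximum of 0.8 when summing it with spark.storage.memoryFraction), but remember that more recent versions of Spark treat these settings as deprecated.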
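
As a rough illustration of the two checks above, here is a minimal sketch in plain Scala. It is not part of the original answer: the object `CoalescePlan`, its method names, and the per-partition size target are hypothetical, and the 0.8 ceiling is simply the figure quoted in the answer for the legacy memory settings. The Spark calls appear only in comments, so the block runs without a cluster.

```scala
// Sketch only: CoalescePlan and its methods are illustrative helpers, not Spark APIs.
object CoalescePlan {

  /** Number of output partitions so that each holds at most `targetBytes`
    * (never fewer than 1). The size estimate must be supplied by the caller. */
  def numPartitions(totalBytes: Long, targetBytes: Long): Int = {
    require(totalBytes >= 0, "totalBytes must be non-negative")
    require(targetBytes > 0, "targetBytes must be positive")
    val n = (totalBytes + targetBytes - 1) / targetBytes // ceiling division
    math.max(1L, math.min(n, Int.MaxValue.toLong)).toInt
  }

  /** Checks the rule quoted in the answer for the legacy settings:
    * spark.shuffle.memoryFraction + spark.storage.memoryFraction <= 0.8.
    * A small tolerance absorbs floating-point rounding. */
  def fractionsFit(shuffleFraction: Double, storageFraction: Double): Boolean =
    shuffleFraction >= 0 && storageFraction >= 0 &&
      shuffleFraction + storageFraction <= 0.8 + 1e-9
}

// Possible use inside spark-shell (assumed values, shown as comments only):
//   val estimatedBytes = 20L * 1024 * 1024 * 1024          // assumption: ~20 GB of output
//   val n = CoalescePlan.numPartitions(estimatedBytes, 128L * 1024 * 1024)
//   dimSumOutput.coalesce(n).saveAsTextFile("/user/similarity")
```

Writing several part files instead of one keeps the final stage spread across executors; if a single file is really needed, the parts can be merged afterwards outside of Spark.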

Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/36123927
