
Hive job takes too much time

Stack Overflow user
Asked 2015-12-21 04:05:34
2 answers, 1.1K views, 0 followers, 0 votes

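This stage is a join on a key between table A (100k rows) and table B (5 million rows).

Table A has only two columns, with id as the join key.

I have tried many ways to convert this stage into a map join instead of a common join, but it still runs as a common join and takes a long time. Any suggestions for speeding it up?

Also, why does reduce always reach 67% so quickly and then take so long for each further step?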

2015-12-21 01:12:55,635 Stage-2 map = 0%,  reduce = 0%
2015-12-21 01:13:39,342 Stage-2 map = 20%,  reduce = 0%, Cumulative CPU 5.49 sec
2015-12-21 01:13:43,618 Stage-2 map = 40%,  reduce = 0%, Cumulative CPU 31.79 sec
2015-12-21 01:13:45,692 Stage-2 map = 60%,  reduce = 0%, Cumulative CPU 34.42 sec
2015-12-21 01:13:46,735 Stage-2 map = 73%,  reduce = 0%, Cumulative CPU 45.1 sec
2015-12-21 01:13:48,812 Stage-2 map = 80%,  reduce = 0%, Cumulative CPU 46.87 sec
2015-12-21 01:13:57,125 Stage-2 map = 93%,  reduce = 0%, Cumulative CPU 60.03 sec
2015-12-21 01:13:58,160 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 61.46 sec
2015-12-21 01:14:42,001 Stage-2 map = 100%,  reduce = 67%, Cumulative CPU 72.34 sec
2015-12-21 01:15:42,196 Stage-2 map = 100%,  reduce = 67%, Cumulative CPU 141.27 sec
2015-12-21 01:16:31,357 Stage-2 map = 100%,  reduce = 68%, Cumulative CPU 183.86 sec
2015-12-21 01:17:31,587 Stage-2 map = 100%,  reduce = 68%, Cumulative CPU 245.5 sec
2015-12-21 01:18:31,840 Stage-2 map = 100%,  reduce = 68%, Cumulative CPU 306.58 sec
2015-12-21 01:19:32,275 Stage-2 map = 100%,  reduce = 68%, Cumulative CPU 371.49 sec
2015-12-21 01:20:32,549 Stage-2 map = 100%,  reduce = 68%, Cumulative CPU 433.61 sec
2015-12-21 01:20:58,591 Stage-2 map = 100%,  reduce = 69%, Cumulative CPU 457.46 sec
2015-12-21 01:21:58,904 Stage-2 map = 100%,  reduce = 69%, Cumulative CPU 516.95 sec
2015-12-21 01:22:59,143 Stage-2 map = 100%,  reduce = 69%, Cumulative CPU 576.51 sec
2015-12-21 01:23:59,480 Stage-2 map = 100%,  reduce = 69%, Cumulative CPU 636.39 sec
2015-12-21 01:24:59,810 Stage-2 map = 100%,  reduce = 69%, Cumulative CPU 692.75 sec
2015-12-21 01:25:59,978 Stage-2 map = 100%,  reduce = 69%, Cumulative CPU 757.39 sec

2 Answers

Stack Overflow user

Answered 2015-12-21 04:22:31

Your reducer is making slow, step-by-step progress and needs time to finish.

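A MapReduce job essentially has three stages: the map task, the shuffle, and the reducer task.

Each of these stages contributes 33.33% to the completion of the whole job. Here the first two stages, the map task and the shuffle, have already finished, which is why the reducer shows 67% complete. The rest depends on the progress of the reducer task itself. The reduce-side join is what is taking the time.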
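Since the reduce-side join is the slow part, it may help to check the settings that let Hive turn it into a map join. The following is only a sketch: the table and column names (table_a, table_b, id, val) are hypothetical, the size threshold should be tuned to the real size of table A on disk, and whether the conversion actually happens depends on the Hive version and on the plan Hive chooses.

```sql
-- Hypothetical names: table_a (small, ~100k rows), table_b (~5 million rows).

-- Let Hive convert a common join to a map join when one side is small enough.
SET hive.auto.convert.join=true;

-- Size threshold in bytes for the "small" table; 25000000 is the usual
-- documented default. Raise it if table_a is larger than this on disk.
SET hive.mapjoin.smalltable.filesize=25000000;

-- Alternative: request the map join explicitly with a hint.
-- Newer Hive versions ignore the hint unless this is set to false.
SET hive.ignore.mapjoin.hint=false;

SELECT /*+ MAPJOIN(a) */ b.*, a.val
FROM table_b b
JOIN table_a a ON a.id = b.id;
```

Running EXPLAIN on the query before and after changing these settings shows whether the plan contains a map join operator or still a reduce-side join.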

Votes: 1

Stack Overflow user

Answered 2015-12-21 09:23:48

You can use set mapreduce.job.reduces=<number_of_reducers>. You can start with 4 and see whether it improves performance. If that does not speed things up, please paste the full log.
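As a rough illustration, the setting goes in the same Hive session, before the query that runs the join. The table and column names below are hypothetical placeholders for the asker's tables, and 4 is just the suggested starting point, not a tuned value.

```sql
-- Hypothetical names: table_a (small), table_b (large), joined on id.

-- Ask for 4 reduce tasks for the jobs launched by the following queries.
SET mapreduce.job.reduces=4;

SELECT b.*, a.val
FROM table_b b
JOIN table_a a ON a.id = b.id;
```

If a few join keys account for most of the rows, adding reducers may not help much, because all rows for one key still go to the same reducer.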

Please also provide some details about the cluster configuration: single node or multi-node, and if multi-node, how many nodes, and so on.

Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/34385274
