Q: Spark executors, tasks, and partitions
Stack Overflow user

Asked 2019-11-10 02:44:28

Answers: 1, Views: 497, Followers: 0, Votes: 0

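The more online resources I read about Spark architecture and scheduling, the more confused I get. One resource says: "The number of tasks in a stage is the same as the number of partitions in the last RDD in the stage". Another says: "Spark maps the number tasks on a particular Executor to the number of cores allocated to it". So according to the first, if I have 1000 partitions, I will have 1000 tasks no matter what my machine looks like. In the second case, if I have a 4-core machine and 1000 partitions, what happens? Will I have only 4 tasks? If so, how does the data get processed?

Another point of confusion: "each worker can process one task at a time" versus "Executors can run multiple tasks over its lifetime, both in parallel and sequentially". So do tasks run sequentially or in parallel?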

1 Answer

Stack Overflow user

Answered 2019-11-10 03:39:27

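The number of tasks is given by the number of partitions of the RDD/DataFrame. The number of tasks an executor can process in parallel is given by its number of cores, unless spark.task.cpus is configured to something other than 1 (the default value).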
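The rule above can be sketched in plain Python (this is not Spark API code). The helper name `task_slots_per_executor` is hypothetical, and the sketch assumes that an executor's concurrent task slots are its cores divided by `spark.task.cpus`, rounded down.

```python
def task_slots_per_executor(executor_cores: int, task_cpus: int = 1) -> int:
    """Hypothetical helper: how many tasks one executor can run at the same time.

    executor_cores mirrors spark.executor.cores; task_cpus mirrors
    spark.task.cpus (default 1). Assumes slots = cores // cpus-per-task.
    """
    if executor_cores < 1 or task_cpus < 1:
        raise ValueError("executor_cores and task_cpus must be positive")
    return executor_cores // task_cpus


print(task_slots_per_executor(4))     # 4 cores, default of 1 CPU per task
print(task_slots_per_executor(4, 2))  # 4 cores, 2 CPUs reserved per task
```

With the default setting, a 4-core executor has 4 slots; reserving 2 CPUs per task halves that to 2.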

So you can think of tasks as (independent) chunks of work that have to be processed. They can certainly run in parallel.

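So if you have 1000 partitions and 5 executors with 4 cores each, then generally 20 tasks will be running in parallel at any one time. There are still 1000 tasks in total; the rest wait and start as cores become free.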
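To make the arithmetic concrete, here is a toy model in plain Python, not Spark's actual scheduler. The function `simulate_stage` and its parameters are hypothetical, and it assumes one task per partition, one core per task (`spark.task.cpus` = 1), and that every task takes the same time.

```python
import heapq
import math


def simulate_stage(num_partitions, num_executors, cores_per_executor, task_seconds=1.0):
    """Toy model of a stage: one task per partition, one core per task.

    Returns (tasks_run, peak_parallelism, total_seconds). Assumes every task
    takes the same time, which real workloads rarely do.
    """
    slots = num_executors * cores_per_executor  # tasks that can run at once
    running = []  # min-heap of finish times for tasks currently running
    clock = 0.0
    peak = 0
    for _ in range(num_partitions):
        if len(running) == slots:
            # All cores are busy: wait until the earliest running task finishes.
            clock = heapq.heappop(running)
        heapq.heappush(running, clock + task_seconds)
        peak = max(peak, len(running))
    total = max(running) if running else 0.0
    return num_partitions, peak, total


tasks, peak, total = simulate_stage(1000, 5, 4)
print(tasks, peak, total)
print(math.ceil(1000 / (5 * 4)))  # number of "waves" when tasks take equal time
```

Under these assumptions the 1000 tasks all run, never more than 20 at once, in 50 waves. The same model with a single 4-core executor (`simulate_stage(1000, 1, 4)`) still runs 1000 tasks, just 4 at a time, which is the case the question asks about.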

Votes: 2

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/58782420
