Hadoop 2 has a new feature called uberization. For example, this reference says:

If the job is small enough, all tasks of a MapReduce job can run in the ApplicationMaster's JVM. That way you avoid the overhead of requesting containers from the ResourceManager and asking the NodeManagers to launch the (supposedly small) tasks.

I don't know whether this happens automatically behind the scenes, or whether something needs to be done to make it happen. For example, is there a setting (or hint) to enable this when running a Hive query? Can you specify the threshold for what counts as "small enough"?

Also, I've had a hard time finding much about this concept -- has it been renamed?
Posted on 2014-06-19 21:45:31
I found details about "uber jobs" in Arun Murthy's YARN book:

An Uber Job occurs when multiple mappers and reducers are combined to run within a single container. There are four core settings for configuring Uber Jobs; see the mapred-site.xml options in Table 9.3.

Table 9.3:
|-----------------------------------+------------------------------------------------------------|
| mapreduce.job.ubertask.enable | Whether to enable the small-jobs "ubertask" optimization, |
| | which runs "sufficiently small" jobs sequentially within a |
| | single JVM. "Small" is defined by the maxmaps, maxreduces, |
| | and maxbytes settings. Users may override this value. |
| | Default = false. |
|-----------------------------------+------------------------------------------------------------|
| mapreduce.job.ubertask.maxmaps | Threshold for the number of maps beyond which the job is |
| | considered too big for the ubertasking optimization. |
| | Users may override this value, but only downward. |
| | Default = 9. |
|-----------------------------------+------------------------------------------------------------|
| mapreduce.job.ubertask.maxreduces | Threshold for the number of reduces beyond which |
| | the job is considered too big for the ubertasking |
| | optimization. Currently the code cannot support more |
| | than one reduce and will ignore larger values. (Zero is |
| | a valid maximum, however.) Users may override this |
| | value, but only downward. |
| | Default = 1. |
|-----------------------------------+------------------------------------------------------------|
| mapreduce.job.ubertask.maxbytes | Threshold for the number of input bytes beyond |
| | which the job is considered too big for the uber- |
| | tasking optimization. If no value is specified, |
| | `dfs.block.size` is used as a default. Be sure to |
| | specify a default value in `mapred-site.xml` if the |
| | underlying file system is not HDFS. Users may override |
| | this value, but only downward. |
| | Default = HDFS block size. |
|-----------------------------------+------------------------------------------------------------|

I don't yet know whether there is a Hive-specific way to set this, or whether you just use the settings above from within Hive.
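To enable the optimization cluster-wide, the settings from Table 9.3 go into `mapred-site.xml`. A minimal fragment might look like the following (the threshold values here are illustrative, not recommendations):

```xml
<!-- Illustrative mapred-site.xml fragment: enable the uber-task optimization. -->
<property>
  <name>mapreduce.job.ubertask.enable</name>
  <value>true</value>
</property>

<!-- Optional: tighten the "small job" thresholds.
     Per Table 9.3, users may only override these downward. -->
<property>
  <name>mapreduce.job.ubertask.maxmaps</name>
  <value>4</value>
</property>
<property>
  <name>mapreduce.job.ubertask.maxreduces</name>
  <value>1</value>
</property>
```

Note that `mapreduce.job.ubertask.maxbytes` defaults to the HDFS block size if left unset, so it usually only needs to be specified when the underlying file system is not HDFS.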
Posted on 2015-04-23 14:35:37
An Uber Job occurs when multiple mappers and reducers are combined to execute inside the ApplicationMaster. So, assuming the job to be executed has at most 9 mappers and at most 1 reducer, the ResourceManager (RM) creates an ApplicationMaster, and the job executes entirely inside the ApplicationMaster using its own JVM.

Set mapreduce.job.ubertask.enable=true;

So the advantage of an uberized job is that it eliminates the round-trip overhead of the ApplicationMaster requesting task containers from the ResourceManager (RM), and of the RM allocating containers back to the ApplicationMaster.
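On the original question about Hive: Hive (when running on the MapReduce execution engine) passes session-level `SET` commands through to the MapReduce jobs it launches, so it should be possible to enable uber mode per session as sketched below. This is an assumption based on how Hive propagates Hadoop configuration, not something I have verified on every Hive version, and `my_small_table` is a hypothetical table name:

```sql
-- Enable the uber-task optimization for MR jobs launched by this Hive session.
SET mapreduce.job.ubertask.enable=true;

-- Optionally lower the thresholds (per Table 9.3 they can only be
-- overridden downward from the cluster defaults).
SET mapreduce.job.ubertask.maxmaps=4;
SET mapreduce.job.ubertask.maxreduces=1;

-- Any sufficiently small query after this point may run uberized,
-- e.g. (my_small_table is a hypothetical table):
SELECT COUNT(*) FROM my_small_table;
```

Whether a given query actually runs uberized still depends on all three thresholds (maps, reduces, and input bytes) being satisfied at once.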
https://stackoverflow.com/questions/24092219