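YARN (Yet Another Resource Negotiator) is a general-purpose resource management platform that provides resource management and scheduling for a variety of computing frameworks. Its core idea is to separate resource management from job scheduling and monitoring. It does so with one global ResourceManager (RM) and one ApplicationMaster (AM) per application, where an application is either a single job or a directed acyclic graph (DAG) of jobs.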
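The split between one global ResourceManager and one ApplicationMaster per application can be pictured with a toy model. This is only a sketch under simplifying assumptions: the class and method names, the memory-only resource model, and the first-fit placement are all invented for illustration and are not YARN's actual API or scheduling policy.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Toy global resource manager: tracks free memory (MB) per node."""
    free_mb: dict  # node name -> free memory in MB (hypothetical model)

    def allocate(self, mb):
        """Return the first node with enough free memory, or None."""
        for node, free in self.free_mb.items():
            if free >= mb:
                self.free_mb[node] = free - mb
                return node
        return None


@dataclass
class ApplicationMaster:
    """Toy per-application master: requests one container per task."""
    rm: ResourceManager
    tasks: list
    placements: dict = field(default_factory=dict)

    def run(self, mb_per_task):
        # The AM negotiates with the RM; the RM knows nothing about tasks.
        for task in self.tasks:
            self.placements[task] = self.rm.allocate(mb_per_task)
        return self.placements
```

The point of the design is that the resource manager only hands out capacity, while everything specific to the job lives in the per-application master.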
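A protocol for requesting a service from a program on a remote computer over the network, without needing to understand the underlying network technology. The RPC protocol assumes that some transport protocol, such as TCP or UDP, exists to carry message data between the communicating programs. In the OSI network communication model, RPC spans the transport layer and the application layer. RPC makes it easier to develop applications, including networked distributed programs. (Hadoop 2.6)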
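The general idea of RPC, calling a function that actually runs in another program, can be tried out with Python's standard xmlrpc modules. Note the assumptions: this illustrates RPC in general, not Hadoop's own RPC implementation, and the add function, the loopback address, and the helper name are made up for the example.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(a, b):
    """The remote procedure: it runs in the server, not in the caller."""
    return a + b


def call_add_remotely(a, b):
    """Start a throwaway server on a free local TCP port and call add() over it."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(add, "add")
    port = server.server_address[1]  # port 0 lets the OS pick a free port
    worker = threading.Thread(target=server.serve_forever, daemon=True)
    worker.start()
    try:
        # The caller sees an ordinary method call; TCP/HTTP stays hidden.
        with ServerProxy(f"http://127.0.0.1:{port}") as proxy:
            return proxy.add(a, b)
    finally:
        server.shutdown()
        server.server_close()
```

The caller never touches sockets directly, which is the "no need to understand the underlying network technology" property described above.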
This article analyzes the Hadoop 2.6 source code. For objects with a long life cycle, YARN uses a service-based object management model.
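A service-based model of this kind can be sketched as a small state machine. The four state names below follow the states of Hadoop's Service interface; everything else (the Python classes, method bodies, and error handling) is a simplified assumption for illustration, not the Hadoop implementation.

```python
from enum import Enum


class State(Enum):
    NOTINITED = 0
    INITED = 1
    STARTED = 2
    STOPPED = 3


class Service:
    """Toy long-lived object with an explicit init/start/stop life cycle."""

    def __init__(self, name):
        self.name = name
        self.state = State.NOTINITED

    def init(self):
        self._move(State.NOTINITED, State.INITED)

    def start(self):
        self._move(State.INITED, State.STARTED)

    def stop(self):
        # Stopping is allowed from any state so that cleanup always works.
        self.state = State.STOPPED

    def _move(self, expected, target):
        if self.state is not expected:
            raise RuntimeError(
                f"{self.name}: cannot go from {self.state.name} to {target.name}")
        self.state = target


class CompositeService(Service):
    """A service that drives the life cycle of its child services."""

    def __init__(self, name):
        super().__init__(name)
        self.children = []

    def add(self, child):
        self.children.append(child)

    def init(self):
        for child in self.children:
            child.init()
        super().init()

    def start(self):
        for child in self.children:
            child.start()
        super().start()

    def stop(self):
        # Stop children in the reverse of their start order.
        for child in reversed(self.children):
            child.stop()
        super().stop()
```

Grouping children under a composite means a single init/start/stop call on the parent brings the whole tree of long-lived objects up or down in a consistent order.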
supergroup 0 2016-12-10 18:02 /tmp/hadoop-yarn/staging/history
drwxrwx--- - john supergroup 0 2016-12-11 12:12 /tmp/hadoop-yarn/staging/john
drwx------ - john supergroup 0 2016-12-11 13:08 /tmp/hadoop-yarn/staging/john/.staging
drwx------ - john supergroup 0 2016-12-11 12:12 /tmp/hadoop-yarn/staging/john/.staging/job_1481426363207_0001
-rw-r--r-- 10 john supergroup 270368 2016-12-11 12:12 /tmp/hadoop-yarn/staging/john/.staging/job_1481426363207_0001/job.jar
-rw-r--
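1. Log in to node bigdata29 and check the /var/lib/hadoop-yarn/yarn-nm-recovery/ directory; it is empty.
2. Check the permissions of the related directories. The hadoop-yarn directory has permission 000, which pinpoints the problem: 4096 Jul 16 22:39 hadoop-yarn
3. Change the permissions of the hadoop-yarn directory to 755.
4. Restart the NodeManager role instance on bigdata29; it now starts normally.
3.2 Recommendations
Before adding a NodeManager, manually create the /var/lib/hadoop-yarn directory on the relevant nodes to avoid this problem. If there are too many nodes, create the directory with a batch command script. To avoid the problem, you can create /var/lib/hadoop-yarn/ in advance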
2 Solution
1. Back up the /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state directory on the NodeManager node:
[root@cdh03 hadoop-yarn]# tar cvzf nmstate.tar.gz /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/*
2. Delete the /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state directory on the NodeManager node:
[root@cdh03 hadoop-yarn]# rm -rf /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state
3. Restart the NodeManager service again.
The default in CDH is /var/lib/hadoop-yarn/yarn-nm-recovery
2. For the exception described in this article, namely that the files NodeManager uses to save container state are corrupted or missing, the root cause is still to be confirmed
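The back-up-then-delete sequence above can also be scripted so that the directory is only removed once the archive has been written. The following is a minimal sketch using Python's standard library; the function name and its parameters are assumptions made for the example, and it archives the directory itself rather than its contents via a shell glob.

```python
import shutil
import tarfile
from pathlib import Path


def backup_and_remove(state_dir, archive_path):
    """Archive state_dir into a .tar.gz file, then delete the directory."""
    state_dir = Path(state_dir)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # Store entries relative to the directory name, not the full path.
        tar.add(str(state_dir), arcname=state_dir.name)
    # Reached only if the archive was written and closed without error.
    shutil.rmtree(str(state_dir))
    return Path(archive_path)
```

A call such as backup_and_remove("/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state", "nmstate.tar.gz") would mirror steps 1 and 2; the NodeManager restart in step 3 still has to be done separately.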
YARN Architecture
Overview: The basic idea of YARN is to split resource management and job scheduling/monitoring into separate daemons.
English URL: http://hadoop.apache.org/docs/r2.7.2/hadoop-yarn
English URL: http://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
3. Hadoop
English URL: http://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
English URL: http://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html
Architecture:
English URL: http://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html
sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
sudo -u hdfs hadoop fs -mkdir -p /var/log/hadoop-yarn
sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn
Step 4: Verify what was created
drwxrwxrwt - hdfs supergroup 0 2012-05-31 15:31 /tmp
drwxr-xr-x - hdfs supergroup 0 2012-05-31 15:31 /tmp/hadoop-yarn
drwxrwxrwt - mapred mapred 0 2012-05-31 15:31 /tmp/hadoop-yarn/staging
drwxr-xr-x - mapred mapred 0 2012-05-31 15:31 /tmp/hadoop-yarn/staging/history
drwxrwxrwt - mapred mapred 0 2012-05-31 15:31 /tmp/hadoop-yarn/staging
On HDFS, files with names similar to fileList.seq.chunk.0000x are created in the corresponding directory:
drwx------ - hadoop supergroup 0 2018-05-13 17:50 /emr/hadoop-yarn /staging/hadoop/.staging/_distcp1061656248
drwx------ - hadoop supergroup 0 2018-05-13 17:50 /emr/hadoop-yarn hadoop/.staging/_distcp1061656248/chunkDir
-rw-r--r-- 1 hadoop supergroup 1504 2018-05-13 17:50 /emr/hadoop-yarn chunkDir/task_1526024399954_0017_m_000000
-rw-r--r-- 1 hadoop supergroup 1524 2018-05-13 17:50 /emr/hadoop-yarn chunkDir/task_1526024399954_0017_m_000001
-rw-r--r-- 1 hadoop supergroup 6686 2018-05-13 17:50 /emr/hadoop-yarn
The error message is:
Caused by: java.lang.IllegalStateException: Could not open file nio:/var/lib/hadoop-yarn/config-service.mv.db MVTableEngine.java:162) ... 25 common frames omitted
Caused by: java.io.FileNotFoundException: /var/lib/hadoop-yarn
2. Solution
1. Manually create a file named config-service.mv.db in the corresponding directory:
[root@cdh2 hadoop-yarn]# vim config-service.mv.db
[root@cdh2 hadoop-yarn]# chown yarn:yarn config-service.mv.db
2. Once that is done, start the service from CM to test it. The restart succeeds and the exception is resolved.
localResources { key: "job.jar" value { resource { scheme: "hdfs" host: "192.168.92.150" port: 8020 file: "/tmp/hadoop-yarn /job.splitmetainfo" value { resource { scheme: "hdfs" host: "192.168.92.150" port: 8020 file: "/tmp/hadoop-yarn jobSubmitDir/job.split" value { resource { scheme: "hdfs" host: "192.168.92.150" port: 8020 file: "/tmp/hadoop-yarn localResources { key: "job.xml" value { resource { scheme: "hdfs" host: "192.168.92.150" port: 8020 file: "/tmp/hadoop-yarn
HADOOP_CLASSPATH=/usr/hdp/3.1.5.0-152/hadoop-hdfs/*:/usr/hdp/3.1.5.0-152/hadoop-hdfs/lib/*:/usr/hdp/3.1.5.0-152/hadoop-yarn /*:/usr/hdp/3.1.5.0-152/hadoop-yarn/lib/*:/usr/hdp/3.1.5.0-152/hadoop/*:/usr/hdp/3.1.5.0-152/hadoop-mapreduce
FLINK_HADOOP_CLASSPATH=/usr/hdp/3.1.5.0-152/hadoop-hdfs/*:/usr/hdp/3.1.5.0-152/hadoop-hdfs/lib/*:/usr/hdp/3.1.5.0-152/hadoop-yarn /*:/usr/hdp/3.1.5.0-152/hadoop-yarn/lib/*:/usr/hdp/3.1.5.0-152/hadoop/*:/usr/hdp/3.1.5.0-152/hadoop-mapreduce
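NodeManager fails to start
----
[Problem description] On CDH 5.11.2, after adding a YARN NodeManager role, starting the role instance fails with the following exception: "IO error: /var/lib/hadoop-yarn /LOCK: Permission denied". This problem has been seen both when deploying a CDH 5.11.2 cluster on Redhat 7.2 and when adding nodes to a cluster of that version.
[Cause] On the faulty node, the /var/lib/hadoop-yarn / directory has permission 000.
[Solution] Change the permissions of the hadoop-yarn directory to 755 and restart the NodeManager role instance.
[Recommendation] Before adding a NodeManager, manually create the /var/lib/hadoop-yarn directory on the relevant nodes to avoid this problem. If there are too many nodes, create the directory with a batch command script.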
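The recommendation to create the directory ahead of time, with the 755 permission from the fix, could be scripted per node along these lines. This is a minimal sketch: the function name is hypothetical, it only handles permission bits, and it assumes the caller is allowed to create and chmod the path. Directory ownership is not covered by the original text and is not handled here.

```python
import os
import stat


def ensure_dir_mode(path, mode=0o755):
    """Create path if it is missing and force its permission bits to mode."""
    os.makedirs(path, exist_ok=True)
    current = stat.S_IMODE(os.stat(path).st_mode)
    if current != mode:
        # Covers the failure case above, where the directory exists with 000.
        os.chmod(path, mode)
    return stat.S_IMODE(os.stat(path).st_mode)
```

Running ensure_dir_mode("/var/lib/hadoop-yarn") on each node before adding the NodeManager role would cover both the missing-directory and the wrong-permission case.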
/tmp/hadoop-yarn/staging/root/.staging/job_1614266885310_0001/job.jar: Under replicated BP-184102405
/tmp/hadoop-yarn/staging/root/.staging/job_1614266885310_0001/job.split: Under replicated BP-184102405
/tmp/hadoop-yarn/staging/root/.staging/job_1614529282649_0001/job.jar: Under replicated BP-184102405
/tmp/hadoop-yarn/staging/root/.staging/job_1614648088274_0001/job.jar: Under replicated BP-184102405
Waiting for /tmp/hadoop-yarn/staging/history/done/2021/03/02/000000/job_1614665577456_0002_conf.xml .
Some errors encountered when running remotely: Cannot delete /tmp/hadoop-yarn/staging/hadoop/.staging/job_1477796535608_0001. Add the following configuration to mapred-site.xml: <property> <name>yarn.app.mapreduce.am.staging-dir</name> <value>/tmp/hadoop-yarn
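A property entry of the kind added to mapred-site.xml can be generated rather than typed by hand, which avoids unclosed tags. The sketch below uses Python's standard ElementTree module. The helper name is hypothetical, and because the value in the snippet above is cut off, the directory is left as a parameter for the caller to supply.

```python
import xml.etree.ElementTree as ET


def staging_dir_property(value):
    """Build the <property> element for yarn.app.mapreduce.am.staging-dir."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "yarn.app.mapreduce.am.staging-dir"
    # value is supplied by the caller; the original snippet's value is truncated.
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")
```

The returned string is a single well-formed property element that can be pasted inside the configuration element of mapred-site.xml.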
For the scheduler, it is enough to take hadoop-2.3.0-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity". Step 3: Overwrite. Take /hadoop-2.3.0-src/hadoop-yarn-project/hadoop-yarn
/hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/lib/hadoop/libexec/../.. /hadoop-yarn/.
HADOOP_HDFS_HOME /opt/cloudera/parcels/CDH/lib/hadoop-hdfs
setvar YARN_HOME /opt/cloudera/parcels/CDH/lib/hadoop-yarn
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export YARN_HOME=/opt/cloudera/parcels/CDH/lib/hadoop-yarn
hive rm -rf /var/log/zookeeper rm -rf /var/log/hive-hcatalog rm -rf /var/log/webhcat rm -rf /var/log/hadoop-yarn usr/lib/flume rm -rf /usr/lib/storm rm -rf /var/lib/hive rm -rf /var/lib/hadoop-hdfs rm -rf /var/lib/hadoop-yarn
References:
http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithNFS.html
http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html