MR execution in YARN
Overview
YARN provides API not for application developers but for the great developers working on new computing engines. YARN make it easy and unified for resource management for the computing engines. It fills the gap between mputation and storage. NoSQL database like HBase use slider apdaters to YARN.
With YARN 
Withou YARN

Entities in YARN
The base of Distribution is HDFS and YARN. HDFS for managing storage. YARN for managing computing.
Client: who submits the job: connects to MR or HDFS framework.
YARN Resource Manager: allocate computing resource required by the job.
Scheduleer:job scheduling,locate the resources.
Application Manager:performan any monitoring or tracking of application/job status.
YARN Node Manager: on all slave nodes. launch / manager containers.
MR Application Master:carry out execution of the job associated with it. different between computing engines. It coordinates the tasks running and monitors the progress and aggregates it and since reports to its client . It is spawn under node manager on the instruction by RM. spawn for every job and end with the job done.
YARN Child: manages the run of the map and reduce tasks,send updates / progress to application master.
HDFS:i/o
The process of job run in YARN
Job submission:Your program triggers the job client and the job client contacts the RM for the new job id. copy the job resource to HDFS with high replica and then submit the job.
Job Initialization: Then RM (the scheduler)picks up the job from the job queue(FIFO,capacity,fair) and contacts NM,sponsor new container (Linux kernel feature, a abstruct of resource like cpu,mem,disk,network bandwidth. doker uses it too) and launches AM for the job.
Job Assigement: AM creates new objects , it retrives the input splits from HDFS and crete one task per input split. AM then decides if the job is samll or not. If it is small job , run its jvm on a single node. If not,contacts RM locate computing resources.
Job Execution:RM considers data locality while assigning the resources(Scheduler at this time knows where the splits are located.It gathers this info from the heartbeats of NM. Based on it, it consider data locality when allocating resources. try as best, then consider the rack local nodes ,if still fails, it will pick random from available noedes).AM then communicates node managers which launches the yarn child(a java program the main class is YarnChild,seperate JVM from long running system demons from the suer code). yarn child retrieves the code and other resource from HDFS and then run the tasks(mr).Yan child sends the progress to AM(every 3 seconds) which aggregrates(each yarn client's information) the report and sends the report to the client.
Job Exmpletion:On job completion , yarn child and AM terminates themseves for the next job.

MR execution in YARN的更多相关文章
- Yarn源码分析之MRAppMaster上MapReduce作业处理总流程(一)
我们知道,如果想要在Yarn上运行MapReduce作业,仅需实现一个ApplicationMaster组件即可,而MRAppMaster正是MapReduce在Yarn上ApplicationMas ...
- hadoop多机安装HA+YARN
HA 相比于Hadoop1.0,Hadoop 2.0中的HDFS增加了两个重大特性,HA(热备)和Federation(联邦).HA即为High Availability,用于解决NameNode单点 ...
- hadoop多机安装YARN
hadoop伪分布安装称为测试环境安装,多机分布称为生成环境安装.以下安装没有进行HA(热备)和Federation(联邦).除非是性能需要,否则没必要安装Federation,HA可以一试,涉及到Z ...
- Hadoop2.4.1 64-Bit QJM HA and YARN HA + Zookeeper-3.4.6 + Hbase-0.98.8-hadoop2-bin HA Install
Hadoop2.4.1 64-Bit QJM HA and YARN HA Install + Zookeeper-3.4.6 + Hbase-0.98.8-hadoop2-bin HA(Hadoop ...
- Hadoop 5、HDFS HA 和 YARN
Hadoop 2.0 产生的背景Hadoop 1.0 中HDFS和MapReduce存在高可用和扩展方面的问题 HDFS存在的问题 NameNode单点故障,难以用于在线场景 NameNode压力过大 ...
- YARN的基础配置
基于HADOOP3.0+Centos7.0的yarn基础配置: 执行步骤:(1)配置集群yarn (2)启动.测试集群(3)在yarn上执行wordcount案例 一.配置yarn集群 1.配置yar ...
- YARN的三种调度器的使用
YRAN提供了三种调度策略 一.FIFO-先进先出调度器 YRAN默认情况下使用的是该调度器,即所有的应用程序都是按照提交的顺序来执行的,这些应用程序都放在一个队列中,只有在前面的一个任务执行完成之后 ...
- Hadoop YARN上运行MapReduce程序
(1)配置集群 (a)配置hadoop-2.7.2/etc/hadoop/yarn-env.sh 配置一下JAVA_HOME export JAVA_HOME=/home/hadoop/bigdata ...
- hadoop3.1集成yarn ha
1.角色分配
随机推荐
- 一点一点看JDK源码(五)java.util.ArrayList 后篇之forEach
一点一点看JDK源码(五)java.util.ArrayList 后篇之forEach liuyuhang原创,未经允许禁止转载 本文举例使用的是JDK8的API 目录:一点一点看JDK源码(〇) 代 ...
- Swift_TableView(delegate,dataSource,prefetchDataSource 详解)
Swift_TableView(delegate,dataSource,prefetchDataSource 详解) GitHub import UIKit let identifier = &quo ...
- DB数据源之SpringBoot+MyBatis踏坑过程(二)手工配置数据源与加载Mapper.xml扫描
DB数据源之SpringBoot+MyBatis踏坑过程(二)手工配置数据源与加载Mapper.xml扫描 liuyuhang原创,未经允许进制转载 吐槽之后应该有所改了,该方式可以作为一种过渡方式 ...
- Python的多进程
这里不说其它,Python的多进程网上已经有很多了,可以尽情搜索.但是用多进程一般是采用对任务的方式,所以注意文件锁定.一般采用Pool是比较合适的.给个网友的小代码 from multiproces ...
- 【2015 ICPC亚洲区域赛长春站 G】Dancing Stars on Me(几何+暴力)
Problem Description The sky was brushed clean by the wind and the stars were cold in a black sky. Wh ...
- Mybatis根据数据库中的表自动生成Bean对象与Mapper文件 (小白式教程)
示例IDE采用 IDEA //**********************华丽的分割线****************// 1.新建一个java项目-->在Src目录下创建3个包(Package ...
- HTML+css 文字只显示一行
电脑端 设置行高,超出隐藏 p{ width: 80%; height: 16px; line-height: 16px; display: block; overflow: hidden; text ...
- 内网环境下为Elasticsearch 5.0.2 添加head服务
背景: 本项目的服务器是内网环境,没有网络,因此需要在离线的环境中,安装head服务. 需要用到的安装包有: node的安装包 elasticsearch的head插件源码 说明:此次只讲述为elas ...
- 《TCP/IP详解 卷1:协议》第3章 IP:网际协议
3.1 引言 IP是TCP/IP协议族中最为核心的协议.所有的TCP.UDP.ICMP及IGMP数据都以IP数据报格式传输(见图1-4).许多刚开始接触TCP/IP的人对IP提供不可靠.无连接的数据报 ...
- c语言入门这一篇就够了-学习笔记(一万字)
内容来自慕课网,个人学习笔记.加上了mtianyan标签标记知识点. C语言入门 -> Linux C语言编程基本原理与实践 -> Linux C语言指针与内存 -> Linux C ...