Overview

YARN provides API not for application developers but for the great developers working on new computing engines. YARN make it easy and unified for resource management for the computing engines. It fills the gap between mputation and storage. NoSQL database like HBase use slider apdaters to YARN.

With YARN 

Withou YARN

Entities in YARN

The base of Distribution is HDFS and YARN. HDFS for managing storage. YARN for managing computing.
Client: who submits the job: connects to MR or HDFS framework.
YARN Resource Manager: allocate computing resource required by the job.
Scheduleer:job scheduling,locate the resources.
Application Manager:performan any monitoring or tracking of application/job status.
YARN Node Manager: on all slave nodes. launch / manager containers.
MR Application Master:carry out execution of the job associated with it. different between computing engines. It coordinates the tasks running and monitors the progress and aggregates it and since reports to its client . It is spawn under node manager on the instruction by RM. spawn for every job and end with the job done.
YARN Child: manages the run of the map and reduce tasks,send updates / progress to application master.
HDFS:i/o

The process of job run in YARN

Job submission:Your program triggers the job client and the job client contacts the RM for the new job id. copy the job resource to HDFS with high replica and then submit the job.

Job Initialization: Then RM (the scheduler)picks up the job from the job queue(FIFO,capacity,fair) and contacts NM,sponsor new container (Linux kernel feature, a abstruct of resource like cpu,mem,disk,network bandwidth. doker uses it too) and launches AM for the job.
Job Assigement: AM creates new objects , it retrives the input splits from HDFS and crete one task per input split. AM then decides if the job is samll or not. If it is small job , run its jvm on a single node. If not,contacts RM locate computing resources.
Job Execution:RM considers data locality while assigning the resources(Scheduler at this time knows where the splits are located.It gathers this info from the heartbeats of NM. Based on it, it consider data locality when allocating resources. try as best, then consider the rack local nodes ,if still fails, it will pick random from available noedes).AM then communicates node managers which launches the yarn child(a java program the main class is YarnChild,seperate JVM from long running system demons from the suer code). yarn child retrieves the code and other resource from HDFS and then run the tasks(mr).Yan child sends the progress to AM(every 3 seconds) which aggregrates(each yarn client's information) the report and sends the report to the client.
Job Exmpletion:On job completion , yarn child and AM terminates themseves for the next job.

 

MR execution in YARN的更多相关文章

  1. Yarn源码分析之MRAppMaster上MapReduce作业处理总流程(一)

    我们知道,如果想要在Yarn上运行MapReduce作业,仅需实现一个ApplicationMaster组件即可,而MRAppMaster正是MapReduce在Yarn上ApplicationMas ...

  2. hadoop多机安装HA+YARN

    HA 相比于Hadoop1.0,Hadoop 2.0中的HDFS增加了两个重大特性,HA(热备)和Federation(联邦).HA即为High Availability,用于解决NameNode单点 ...

  3. hadoop多机安装YARN

    hadoop伪分布安装称为测试环境安装,多机分布称为生成环境安装.以下安装没有进行HA(热备)和Federation(联邦).除非是性能需要,否则没必要安装Federation,HA可以一试,涉及到Z ...

  4. Hadoop2.4.1 64-Bit QJM HA and YARN HA + Zookeeper-3.4.6 + Hbase-0.98.8-hadoop2-bin HA Install

    Hadoop2.4.1 64-Bit QJM HA and YARN HA Install + Zookeeper-3.4.6 + Hbase-0.98.8-hadoop2-bin HA(Hadoop ...

  5. Hadoop 5、HDFS HA 和 YARN

    Hadoop 2.0 产生的背景Hadoop 1.0 中HDFS和MapReduce存在高可用和扩展方面的问题 HDFS存在的问题 NameNode单点故障,难以用于在线场景 NameNode压力过大 ...

  6. YARN的基础配置

    基于HADOOP3.0+Centos7.0的yarn基础配置: 执行步骤:(1)配置集群yarn (2)启动.测试集群(3)在yarn上执行wordcount案例 一.配置yarn集群 1.配置yar ...

  7. YARN的三种调度器的使用

    YRAN提供了三种调度策略 一.FIFO-先进先出调度器 YRAN默认情况下使用的是该调度器,即所有的应用程序都是按照提交的顺序来执行的,这些应用程序都放在一个队列中,只有在前面的一个任务执行完成之后 ...

  8. Hadoop YARN上运行MapReduce程序

    (1)配置集群 (a)配置hadoop-2.7.2/etc/hadoop/yarn-env.sh 配置一下JAVA_HOME export JAVA_HOME=/home/hadoop/bigdata ...

  9. hadoop3.1集成yarn ha

    1.角色分配

随机推荐

  1. Lodash数组篇

    概念简述 lodash 是一个类库 Lodash 通过降低 array.number.objects.string 等等的使用难度从而让 JavaScript 变得更简单 用法  let _ = re ...

  2. Unity 游戏框架搭建 (十七) 静态扩展GameObject实现链式编程

    本篇本来是作为原来 优雅的QChain的第一篇的内容,但是QChain流产了,所以收录到了游戏框架搭建系列.本篇介绍如何实现GameObject的链式编程. 链式编程的实现技术之一是C#的静态扩展.静 ...

  3. oracle中的事务

     事务 概述:通过sql 对数据库进行操作时,同时执行成功或失败,且数据完整性一致. 链接到oracle的用户(例如plsql或sqlplus)会形成一个session, 此时对数据库的更新操作,不会 ...

  4. ie浏览器下载文件时文件名乱码

    做一个文件下载功能时,用ie浏览器下载时文件名乱码,火狐和谷歌正常,修改后ie显示正常,修改方法如下: @RequestMapping(value = "fileDownload" ...

  5. Oracle行转列,pivot函数和unpivot函数

    pivot函数:行转列函数: 语法:pivot(任一聚合函数 for 需专列的值所在列名 in (需转为列名的值)):unpivot函数:列转行函数: 语法:unpivot(新增值所在列的列名 for ...

  6. OC - KVO实现原理

    1.KVO简介 KVO是Objective-C对观察者设计模式的一种实现,它提供一种机制,指定一个被观察对象(如A类),当对象中的某个属性发生变化的时候,对象就会接收到通知,并作出相应的处理.在MVC ...

  7. 关于chrome浏览器不能更新js的问题

    今天写程序时,突然发现无论我怎么改本地js,用chrome打开时,均是改动之前的效果,F12查看Sources时发现js文件并没有被改动.由此引发的问题,经查询解决方法如下: F12后按F1,出现Se ...

  8. .net core 基于Claim登录验证

    网站,首先需要安全,实现安全就必须使用登录验证,.net core 基于Claim登录验证就很简单使用. Claim是什么,可以理解为你的身份证的中的名字,性别等等的每一条信息,然后Claim组成一个 ...

  9. Linux 学习第五天

    一.重定向.管道符.通配符 1.重定向.管道符使用 重定向: 命令文件 管道符: 命令A:命令B  (管道符  |  别称 “任意门”) 二.常用命令 1.ls /etc | wc -l  (查看目录 ...

  10. Python模块、包、异常、文件(案例)

    Python模块.包.异常.文件(案例) python.py #模块 # Python中的模块(Module),是一个Python文件,以.py文件结尾,包含了Python对象定义和Python语句, ...