MR execution in YARN
Overview
YARN provides API not for application developers but for the great developers working on new computing engines. YARN make it easy and unified for resource management for the computing engines. It fills the gap between mputation and storage. NoSQL database like HBase use slider apdaters to YARN.
With YARN
Withou YARN
Entities in YARN
The base of Distribution is HDFS and YARN. HDFS for managing storage. YARN for managing computing.
Client: who submits the job: connects to MR or HDFS framework.
YARN Resource Manager: allocate computing resource required by the job.
Scheduleer:job scheduling,locate the resources.
Application Manager:performan any monitoring or tracking of application/job status.
YARN Node Manager: on all slave nodes. launch / manager containers.
MR Application Master:carry out execution of the job associated with it. different between computing engines. It coordinates the tasks running and monitors the progress and aggregates it and since reports to its client . It is spawn under node manager on the instruction by RM. spawn for every job and end with the job done.
YARN Child: manages the run of the map and reduce tasks,send updates / progress to application master.
HDFS:i/o
The process of job run in YARN
Job submission:Your program triggers the job client and the job client contacts the RM for the new job id. copy the job resource to HDFS with high replica and then submit the job.
Job Initialization: Then RM (the scheduler)picks up the job from the job queue(FIFO,capacity,fair) and contacts NM,sponsor new container (Linux kernel feature, a abstruct of resource like cpu,mem,disk,network bandwidth. doker uses it too) and launches AM for the job.
Job Assigement: AM creates new objects , it retrives the input splits from HDFS and crete one task per input split. AM then decides if the job is samll or not. If it is small job , run its jvm on a single node. If not,contacts RM locate computing resources.
Job Execution:RM considers data locality while assigning the resources(Scheduler at this time knows where the splits are located.It gathers this info from the heartbeats of NM. Based on it, it consider data locality when allocating resources. try as best, then consider the rack local nodes ,if still fails, it will pick random from available noedes).AM then communicates node managers which launches the yarn child(a java program the main class is YarnChild,seperate JVM from long running system demons from the suer code). yarn child retrieves the code and other resource from HDFS and then run the tasks(mr).Yan child sends the progress to AM(every 3 seconds) which aggregrates(each yarn client's information) the report and sends the report to the client.
Job Exmpletion:On job completion , yarn child and AM terminates themseves for the next job.
MR execution in YARN的更多相关文章
- Yarn源码分析之MRAppMaster上MapReduce作业处理总流程(一)
我们知道,如果想要在Yarn上运行MapReduce作业,仅需实现一个ApplicationMaster组件即可,而MRAppMaster正是MapReduce在Yarn上ApplicationMas ...
- hadoop多机安装HA+YARN
HA 相比于Hadoop1.0,Hadoop 2.0中的HDFS增加了两个重大特性,HA(热备)和Federation(联邦).HA即为High Availability,用于解决NameNode单点 ...
- hadoop多机安装YARN
hadoop伪分布安装称为测试环境安装,多机分布称为生成环境安装.以下安装没有进行HA(热备)和Federation(联邦).除非是性能需要,否则没必要安装Federation,HA可以一试,涉及到Z ...
- Hadoop2.4.1 64-Bit QJM HA and YARN HA + Zookeeper-3.4.6 + Hbase-0.98.8-hadoop2-bin HA Install
Hadoop2.4.1 64-Bit QJM HA and YARN HA Install + Zookeeper-3.4.6 + Hbase-0.98.8-hadoop2-bin HA(Hadoop ...
- Hadoop 5、HDFS HA 和 YARN
Hadoop 2.0 产生的背景Hadoop 1.0 中HDFS和MapReduce存在高可用和扩展方面的问题 HDFS存在的问题 NameNode单点故障,难以用于在线场景 NameNode压力过大 ...
- YARN的基础配置
基于HADOOP3.0+Centos7.0的yarn基础配置: 执行步骤:(1)配置集群yarn (2)启动.测试集群(3)在yarn上执行wordcount案例 一.配置yarn集群 1.配置yar ...
- YARN的三种调度器的使用
YRAN提供了三种调度策略 一.FIFO-先进先出调度器 YRAN默认情况下使用的是该调度器,即所有的应用程序都是按照提交的顺序来执行的,这些应用程序都放在一个队列中,只有在前面的一个任务执行完成之后 ...
- Hadoop YARN上运行MapReduce程序
(1)配置集群 (a)配置hadoop-2.7.2/etc/hadoop/yarn-env.sh 配置一下JAVA_HOME export JAVA_HOME=/home/hadoop/bigdata ...
- hadoop3.1集成yarn ha
1.角色分配
随机推荐
- Python 学习笔记(九)Python元组和字典(三)
字典常用方法 copy() 返回一个字典的浅复制 示例:浅拷贝d.copy() 深拷贝引入import copy copy.deepcopy() >>> help(dict.co ...
- Reading Notes : 180212 冯诺依曼计算机
读书<计算机组成原理>,百度百科 现在大部分接触过计算机的人,都会知道冯诺依曼计算机,但是这个概念是怎么来的呢?本节我们就通过聊一下计算机的存储程序控制,来认识”冯诺依曼”. 存储程序控制 ...
- Java之变量
Java变量分为类变量.实例变量.局部变量: 类变量包括静态变量: 局部变量:就是本地变量,使用范围:方法,构造器(构造方法),块:销毁:程序执行完或退出立即销毁:局部变量没有默认值,声明的同时必须赋 ...
- vue的$emit 与$on父子组件与兄弟组件的之间通信
本文主要对vue 用$emit 与 $on 来进行组件之间的数据传输. 主要的传输方式有三种: 1.父组件到子组件通信 2.子组件到父组件的通信 3.兄弟组件之间的通信 一.父组件传值给子组件 父组件 ...
- Yii2中使用Soap WebSerivce
Soap是一种轻量的.简单的.基于XML(标准通用标记语言下的一个子集)的协议 WebService顾名思义就是web服务,web服务主要有两种,一种是基于soap类型的服务,一种是基于rest类型的 ...
- 05.odoo12开源框架学习
博客为日常工作学习积累总结: 1.odoo12学习 参考博客:https://alanhou.org/centos-odoo-12/ CentOS 7快速安装配置 Odoo 12 添加新用户必做,不然 ...
- 搭建Apache服务器并使用自签证书实现https访问
实验环境:两台Centos7.2的虚拟机,一台作CA服务器,一台作Apache服务器,此处安装httpd-2.4.6的版本. 1)CA服务器 # 私钥一般存放位置:/etc/pki/CA/privat ...
- Case Helper
using System; using Microsoft.Xrm.Sdk; using Microsoft.Crm.Sdk.Messages; using Microsoft.Xrm.Sdk.Que ...
- 基于原生JS封装数组原型上的sort方法
基于原生JS封装数组原型上的sort方法 最近学习了数组的原型上内置方法的封装,加强了用原生JS封装方法的能力,也进一步理解数组方法封装的过程,实现的功能.虽然没有深入底层,了解源码.以下解法都是基于 ...
- python 爬虫 5i5j房屋信息 获取并存储到数据库
from lxml import etree from selenium import webdriver import pymysql def Geturl(fullurl):#获取每个招聘网页的链 ...