How to Plan and Configure YARN and MapReduce 2 in HDP 2.0
As part of HDP 2.0 Beta, YARN takes the resource management capabilities that were in MapReduce and packages them so they can be used by new engines. This also streamlines MapReduce to do what it does best, process data. With YARN, you can now run multiple applications in Hadoop, all sharing a common resource management.

In this blog post we’ll walk through how to plan for and configure processing capacity in your enterprise HDP 2.0 cluster deployment. This will cover YARN and MapReduce 2. We’ll use an example physical cluster of slave nodes each with 48 GB ram, 12 disks and 2 hex core CPUs (12 total cores).
YARN takes into account all the available compute resources on each machine in the cluster. Based on the available resources, YARN will negotiate resource requests from applications (such as MapReduce) running in the cluster. YARN then provides processing capacity to each application by allocating Containers. A Container is the basic unit of processing capacity in YARN, and is an encapsulation of resource elements (memory, cpu etc.).
Configuring YARN
In a Hadoop cluster, it’s vital to balance the usage of RAM, CPU and disk so that processing is not constrained by any one of these cluster resources. As a general recommendation, we’ve found that allowing for 1-2 Containers per disk and per core gives the best balance for cluster utilization. So with our example cluster node with 12 disks and 12 cores, we will allow for 20 maximum Containers to be allocated to each node.
Each machine in our cluster has 48 GB of RAM. Some of this RAM should be reserved for Operating System usage. On each node, we’ll assign 40 GB RAM for YARN to use and keep 8 GB for the Operating System. The following property sets the maximum memory YARN can utilize on the node:
In yarn-site.xml
<name>yarn.nodemanager.resource.memory-mb</name>
<value>40960</value>
The next step is to provide YARN guidance on how to break up the total resources available into Containers. You do this by specifying the minimum unit of RAM to allocate for a Container. We want to allow for a maximum of 20 Containers, and thus need (40 GB total RAM) / (20 # of Containers) = 2 GB minimum per container:
In yarn-site.xml
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>2048</value>
YARN will allocate Containers with RAM amounts greater than the yarn.scheduler.minimum-allocation-mb.
Configuring MapReduce 2
MapReduce 2 runs on top of YARN and utilizes YARN Containers to schedule and execute its map and reduce tasks.
When configuring MapReduce 2 resource utilization on YARN, there are three aspects to consider:
- Physical RAM limit for each Map And Reduce task
- The JVM heap size limit for each task
- The amount of virtual memory each task will get
You can define how much maximum memory each Map and Reduce task will take. Since each Map and each Reduce will run in a separate Container, these maximum memory settings should be at least equal to or more than the YARN minimum Container allocation.
For our example cluster, we have the minimum RAM for a Container (yarn.scheduler.minimum-allocation-mb) = 2 GB. We’ll thus assign 4 GB for Map task Containers, and 8 GB for Reduce tasks Containers.
In mapred-site.xml:
<name>mapreduce.map.memory.mb</name>
<value>4096</value>
<name>mapreduce.reduce.memory.mb</name>
<value>8192</value>
Each Container will run JVMs for the Map and Reduce tasks. The JVM heap size should be set to lower than the Map and Reduce memory defined above, so that they are within the bounds of the Container memory allocated by YARN.
In mapred-site.xml:
<name>mapreduce.map.java.opts</name>
<value>-Xmx3072m</value>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx6144m</value>
The above settings configure the upper limit of the physical RAM that Map and Reduce tasks will use. The virtual memory (physical + paged memory) upper limit for each Map and Reduce task is determined by the virtual memory ratio each YARN Container is allowed. This is set by the following configuration, and the default value is 2.1:
In yarn-site.xml:
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
Thus, with the above settings on our example cluster, each Map task will get the following memory allocations with the following:
- Total physical RAM allocated = 4 GB
- JVM heap space upper limit within the Map task Container = 3 GB
- Virtual memory upper limit = 4*2.1 = 8.2 GB
With YARN and MapReduce 2, there are no longer pre-configured static slots for Map and Reduce tasks. The entire cluster is available for dynamic resource allocation of Maps and Reduces as needed by the job. In our example cluster, with the above configurations, YARN will be able to allocate on each node up to 10 mappers (40/4) or 5 reducers (40/8) or a permutation within that.
How to Plan and Configure YARN and MapReduce 2 in HDP 2.0的更多相关文章
- Wordcount on YARN 一个MapReduce示例
Hadoop YARN版本:2.2.0 关于hadoop yarn的环境搭建可以参考这篇博文:Hadoop 2.0安装以及不停集群加datanode hadoop hdfs yarn伪分布式运行,有如 ...
- 怎样通过Java程序提交yarn的mapreduce计算任务
因为项目需求,须要通过Java程序提交Yarn的MapReduce的计算任务.与一般的通过Jar包提交MapReduce任务不同,通过程序提交MapReduce任务须要有点小变动.详见下面代码. 下面 ...
- Determine YARN and MapReduce Memory Configuration Settings
Determine YARN and MapReduce Memory Configuration Settings https://docs.hortonworks.com/HDPDocuments ...
- 3.Hadoop测试Yarn和MapReduce
Hadoop测试Yarn和MapReduce 1.配置Yarn (1)配置ResourceManager 生产环境中,一般是重开一台机器作为ResourceManager,这里我们以Master机器代 ...
- YARN和MapReduce的内存设置参考
如何确定Yarn中容器Container,Mapreduce相关参数的内存设置,对于初始集群,由于不知道集群的类型(如cpu密集.内存密集)我们需要根据经验提供给我们一个参考配置值,来作为基础的配置. ...
- YARN和MapReduce的内存设置參考
怎样确定Yarn中容器Container,Mapreduce相关參数的内存设置,对于初始集群,由于不知道集群的类型(如cpu密集.内存密集)我们须要依据经验提供给我们一个參考配置值,来作为基础的配置. ...
- Hadoop2 使用 YARN 运行 MapReduce 的过程源码分析
Hadoop 使用 YARN 运行 MapReduce 的过程如下图所示: 总共分为11步. 这里以 WordCount 为例, 我们在客户端终端提交作业: # 把本地的 /home/hadoop/t ...
- 经典MapReduce作业和Yarn上MapReduce作业运行机制
一.经典MapReduce的作业运行机制 如下图是经典MapReduce作业的工作原理: 1.1 经典MapReduce作业的实体 经典MapReduce作业运行过程包含的实体: 客户端,提交MapR ...
- 大数据系列4:Yarn以及MapReduce 2
系列文章: 大数据系列:一文初识Hdfs 大数据系列2:Hdfs的读写操作 大数据谢列3:Hdfs的HA实现 通过前文,我们对Hdfs的已经有了一定的了解,本文将继续之前的内容,介绍Yarn与Yarn ...
随机推荐
- java相关资料连接
1.tomcat原理https://www.ibm.com/developerworks/cn/java/j-lo-tomcat1/index.html ....
- hibernate之关联关系一对多
什么是关联(association) 关联指的是类之间的引用关系.如果类A与类B关联,那么被引用的类B将被定义为类A的属性.例如: public class B{ private St ...
- Java基础学习【字符串倒序输出+排序】
字符串逆序输出 import java.util.*; public class Main{ public static void main(String [] args) { //字符串逆序输出 S ...
- Thinkphp远程代码执行 payload汇总
Thinkphp 5.0.22http://192.168.1.1/thinkphp/public/?s=.|think\config/get&name=database.usernameht ...
- RookeyFrame 还原 软删除的数据 怎么硬删除 或者 怎么还原
列表搜索栏上有个删除图标,可以进入回收站 如图:
- 猴猴吃香蕉 背包DP
猴猴吃香蕉 背包DP \(D\)次询问,第\(i\)次询问,每次有\(n_i\)个带权香蕉,问有多少方案使香蕉之积为\(k_i\),对结果取模\(1000000007\) \(n\le 10^3,k\ ...
- YII框架的依赖注入容器
依赖注入(Dependency Injection,DI)容器就是一个对象,它知道怎样初始化并配置对象及其依赖的所有对象. 所谓的依赖就是,一个对象,要使用另外一个对象才能完成某些功能.那么这个对象就 ...
- UOJ#397. 【NOI2018】情报中心 线段树合并 虚树
原文链接www.cnblogs.com/zhouzhendong/p/UOJ397.com 前言 这真可做吗?只能贺题解啊-- 题解 我们称一条路径的 LCA 为这条路径两端点的 LCA. 我们将相交 ...
- svn乌龟怎么用
0601 首先右键SVN-checkout 0602 其他地方可以不用修改,Version处可以修改,表示从指定版本号开始,点击OK. 0603 就会直接下载,如果改变的话,就会由绿色变成红色. 06 ...
- Liskov替换原则(LSP)
OCP背后的主要机制是抽象和多态.在静态类型语言中,比如C++和Java,支持抽象和多态的关键机制之一是继承.正是使用了继承,才可以创建实现其基类中抽象方法的派生类.是什么设计规则在支配着这种特殊的继 ...