How to Plan and Configure YARN and MapReduce 2 in HDP 2.0
As part of HDP 2.0 Beta, YARN takes the resource management capabilities that were in MapReduce and packages them so they can be used by new engines. This also streamlines MapReduce to do what it does best, process data. With YARN, you can now run multiple applications in Hadoop, all sharing a common resource management.

In this blog post we’ll walk through how to plan for and configure processing capacity in your enterprise HDP 2.0 cluster deployment. This will cover YARN and MapReduce 2. We’ll use an example physical cluster of slave nodes each with 48 GB ram, 12 disks and 2 hex core CPUs (12 total cores).
YARN takes into account all the available compute resources on each machine in the cluster. Based on the available resources, YARN will negotiate resource requests from applications (such as MapReduce) running in the cluster. YARN then provides processing capacity to each application by allocating Containers. A Container is the basic unit of processing capacity in YARN, and is an encapsulation of resource elements (memory, cpu etc.).
Configuring YARN
In a Hadoop cluster, it’s vital to balance the usage of RAM, CPU and disk so that processing is not constrained by any one of these cluster resources. As a general recommendation, we’ve found that allowing for 1-2 Containers per disk and per core gives the best balance for cluster utilization. So with our example cluster node with 12 disks and 12 cores, we will allow for 20 maximum Containers to be allocated to each node.
Each machine in our cluster has 48 GB of RAM. Some of this RAM should be reserved for Operating System usage. On each node, we’ll assign 40 GB RAM for YARN to use and keep 8 GB for the Operating System. The following property sets the maximum memory YARN can utilize on the node:
In yarn-site.xml
<name>yarn.nodemanager.resource.memory-mb</name>
<value>40960</value>
The next step is to provide YARN guidance on how to break up the total resources available into Containers. You do this by specifying the minimum unit of RAM to allocate for a Container. We want to allow for a maximum of 20 Containers, and thus need (40 GB total RAM) / (20 # of Containers) = 2 GB minimum per container:
In yarn-site.xml
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>2048</value>
YARN will allocate Containers with RAM amounts greater than the yarn.scheduler.minimum-allocation-mb.
Configuring MapReduce 2
MapReduce 2 runs on top of YARN and utilizes YARN Containers to schedule and execute its map and reduce tasks.
When configuring MapReduce 2 resource utilization on YARN, there are three aspects to consider:
- Physical RAM limit for each Map And Reduce task
- The JVM heap size limit for each task
- The amount of virtual memory each task will get
You can define how much maximum memory each Map and Reduce task will take. Since each Map and each Reduce will run in a separate Container, these maximum memory settings should be at least equal to or more than the YARN minimum Container allocation.
For our example cluster, we have the minimum RAM for a Container (yarn.scheduler.minimum-allocation-mb) = 2 GB. We’ll thus assign 4 GB for Map task Containers, and 8 GB for Reduce tasks Containers.
In mapred-site.xml:
<name>mapreduce.map.memory.mb</name>
<value>4096</value>
<name>mapreduce.reduce.memory.mb</name>
<value>8192</value>
Each Container will run JVMs for the Map and Reduce tasks. The JVM heap size should be set to lower than the Map and Reduce memory defined above, so that they are within the bounds of the Container memory allocated by YARN.
In mapred-site.xml:
<name>mapreduce.map.java.opts</name>
<value>-Xmx3072m</value>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx6144m</value>
The above settings configure the upper limit of the physical RAM that Map and Reduce tasks will use. The virtual memory (physical + paged memory) upper limit for each Map and Reduce task is determined by the virtual memory ratio each YARN Container is allowed. This is set by the following configuration, and the default value is 2.1:
In yarn-site.xml:
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
Thus, with the above settings on our example cluster, each Map task will get the following memory allocations with the following:
- Total physical RAM allocated = 4 GB
- JVM heap space upper limit within the Map task Container = 3 GB
- Virtual memory upper limit = 4*2.1 = 8.2 GB
With YARN and MapReduce 2, there are no longer pre-configured static slots for Map and Reduce tasks. The entire cluster is available for dynamic resource allocation of Maps and Reduces as needed by the job. In our example cluster, with the above configurations, YARN will be able to allocate on each node up to 10 mappers (40/4) or 5 reducers (40/8) or a permutation within that.
How to Plan and Configure YARN and MapReduce 2 in HDP 2.0的更多相关文章
- Wordcount on YARN 一个MapReduce示例
Hadoop YARN版本:2.2.0 关于hadoop yarn的环境搭建可以参考这篇博文:Hadoop 2.0安装以及不停集群加datanode hadoop hdfs yarn伪分布式运行,有如 ...
- 怎样通过Java程序提交yarn的mapreduce计算任务
因为项目需求,须要通过Java程序提交Yarn的MapReduce的计算任务.与一般的通过Jar包提交MapReduce任务不同,通过程序提交MapReduce任务须要有点小变动.详见下面代码. 下面 ...
- Determine YARN and MapReduce Memory Configuration Settings
Determine YARN and MapReduce Memory Configuration Settings https://docs.hortonworks.com/HDPDocuments ...
- 3.Hadoop测试Yarn和MapReduce
Hadoop测试Yarn和MapReduce 1.配置Yarn (1)配置ResourceManager 生产环境中,一般是重开一台机器作为ResourceManager,这里我们以Master机器代 ...
- YARN和MapReduce的内存设置参考
如何确定Yarn中容器Container,Mapreduce相关参数的内存设置,对于初始集群,由于不知道集群的类型(如cpu密集.内存密集)我们需要根据经验提供给我们一个参考配置值,来作为基础的配置. ...
- YARN和MapReduce的内存设置參考
怎样确定Yarn中容器Container,Mapreduce相关參数的内存设置,对于初始集群,由于不知道集群的类型(如cpu密集.内存密集)我们须要依据经验提供给我们一个參考配置值,来作为基础的配置. ...
- Hadoop2 使用 YARN 运行 MapReduce 的过程源码分析
Hadoop 使用 YARN 运行 MapReduce 的过程如下图所示: 总共分为11步. 这里以 WordCount 为例, 我们在客户端终端提交作业: # 把本地的 /home/hadoop/t ...
- 经典MapReduce作业和Yarn上MapReduce作业运行机制
一.经典MapReduce的作业运行机制 如下图是经典MapReduce作业的工作原理: 1.1 经典MapReduce作业的实体 经典MapReduce作业运行过程包含的实体: 客户端,提交MapR ...
- 大数据系列4:Yarn以及MapReduce 2
系列文章: 大数据系列:一文初识Hdfs 大数据系列2:Hdfs的读写操作 大数据谢列3:Hdfs的HA实现 通过前文,我们对Hdfs的已经有了一定的了解,本文将继续之前的内容,介绍Yarn与Yarn ...
随机推荐
- tcp的三次握手和四次挥手转自https://www.jianshu.com/p/d3725391af59
三次握手(three-way handshaking) 1.背景:TCP位于传输层,作用是提供可靠的字节流服务,为了准确无误地将数据送达目的地,TCP协议采纳三次握手策略. 2.原理: 1)发送端首先 ...
- ES中的分析和分析器
在ES存储的文档,进行存储时,会对文档的内容进行分析和分词 分析的过程: 首先,将一块文本分成适合于倒排索引的独立的 词条 , 之后,将这些词条统一化为标准格式以提高它们的“可搜索性”,或者 reca ...
- zip unzip tar 压缩相关
unzip 解压时,需要直接覆盖以解压的文件 -o 则不再进行询问,直接覆盖原文件解压缩 示例 unzip -o file_name.zip
- Java 通过HttpClient Post方式提交json请求
package com.sinosoft.ap.harmfullibrary.util; /** * 发送post请求 */import net.sf.json.JSONObject; import ...
- java 利用反射调用静态方法的示例
内容简介 主要介绍使用反射的机制来调用执行类中的静态方法. 静态方法 public class GisUtil { private final static Logger logger = Logge ...
- 项链与手镯Uva 10294——Polya定理
题意 项链和手镯都是由若干珠子串成的环形首饰,区别在于手环可以翻转,但项链不可以. 输入整数 $n$ 和 $t$,输出用 $t$ 中颜色 $n$ 颗珠子能制作成的项链和手镯的个数.($1\leq n ...
- python - 栈与队列(只有代码)
1. 栈: - 后进先出 class Stack(object): def __init__(self): self.stack = [] def peek(self): return self.st ...
- Linux 常用命令1 pwd、ls、cd、tab、清屏、重定向、转义、管道、touch、mkdir、tree、cat、more、rmdir、rm、grep、help、man、history、find、cp、mv、tar、gz
版权声明:本文为博主引用文章,未经博主及作者允许不得转载. 声明: 涉及的命令:pwd.ls.cd.tab.清屏.重定向.转义.管道.touch.mkdir.tree.cat.more.rmdir. ...
- ES 大批量写入提高性能的策略
1.用bulk批量写入 你如果要往es里面灌入数据的话,那么根据你的业务场景来,如果你的业务场景可以支持让你将一批数据聚合起来,一次性写入es,那么就尽量采用bulk的方式,每次批量写个几百条这样子. ...
- leetcode 712
这道题的思路:我是根据最长公共子序列的思路得来的. 最长公共子序列是: d[i][j]表示字符串s1前i个(0-i-1)字符,和字符串s2前j个(0-j-1)字符的最长公共子序列. 分情况讨论: 当s ...