As part of HDP 2.0 Beta, YARN takes the resource management capabilities that were in MapReduce and packages them so they can be used by new engines.  This also streamlines MapReduce to do what it does best, process data.  With YARN, you can now run multiple applications in Hadoop, all sharing a common resource management.

In this blog post we’ll walk through how to plan for and configure processing capacity in your enterprise HDP 2.0 cluster deployment. This will cover YARN and MapReduce 2. We’ll use an example physical cluster of slave nodes each with 48 GB ram, 12 disks and 2 hex core CPUs (12 total cores).

YARN takes into account all the available compute resources on each machine in the cluster. Based on the available resources, YARN will negotiate resource requests from applications (such as MapReduce) running in the cluster. YARN then provides processing capacity to each application by allocating Containers. A Container is the basic unit of processing capacity in YARN, and is an encapsulation of resource elements (memory, cpu etc.).

Configuring YARN

In a Hadoop cluster, it’s vital to balance the usage of RAM, CPU and disk so that processing is not constrained by any one of these cluster resources. As a general recommendation, we’ve found that allowing for 1-2 Containers per disk and per core gives the best balance for cluster utilization. So with our example cluster node with 12 disks and 12 cores, we will allow for 20 maximum Containers to be allocated to each node.

Each machine in our cluster has 48 GB of RAM. Some of this RAM should be reserved for Operating System usage. On each node, we’ll assign 40 GB RAM for YARN to use and keep 8 GB for the Operating System. The following property sets the maximum memory YARN can utilize on the node:

In yarn-site.xml

<name>yarn.nodemanager.resource.memory-mb</name>
<value>40960</value>

The next step is to provide YARN guidance on how to break up the total resources available into Containers. You do this by specifying the minimum unit of RAM to allocate for a Container. We want to allow for a maximum of 20 Containers, and thus need (40 GB total RAM) / (20 # of Containers) = 2 GB minimum per container:

In yarn-site.xml

 <name>yarn.scheduler.minimum-allocation-mb</name>
<value>2048</value>

YARN will allocate Containers with RAM amounts greater than the yarn.scheduler.minimum-allocation-mb.

Configuring MapReduce 2

MapReduce 2 runs on top of YARN and utilizes YARN Containers to schedule and execute its map and reduce tasks.

When configuring MapReduce 2 resource utilization on YARN, there are three aspects to consider:

  1. Physical RAM limit for each Map And Reduce task
  2. The JVM heap size limit for each task
  3. The amount of virtual memory each task will get

You can define how much maximum memory each Map and Reduce task will take. Since each Map and each Reduce will run in a separate Container, these maximum memory settings should be at least equal to or more than the YARN minimum Container allocation.

For our example cluster, we have the minimum RAM for a Container (yarn.scheduler.minimum-allocation-mb) = 2 GB. We’ll thus assign 4 GB for Map task Containers, and 8 GB for Reduce tasks Containers.

In mapred-site.xml:

 <name>mapreduce.map.memory.mb</name>
<value>4096</value>
<name>mapreduce.reduce.memory.mb</name>
<value>8192</value>

Each Container will run JVMs for the Map and Reduce tasks. The JVM heap size should be set to lower than the Map and Reduce memory defined above, so that they are within the bounds of the Container memory allocated by YARN.

In mapred-site.xml:

 <name>mapreduce.map.java.opts</name>
<value>-Xmx3072m</value>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx6144m</value>

The above settings configure the upper limit of the physical RAM that Map and Reduce tasks will use. The virtual memory (physical + paged memory) upper limit for each Map and Reduce task is determined by the virtual memory ratio each YARN Container is allowed. This is set by the following configuration, and the default value is 2.1:

In yarn-site.xml:

 <name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>

Thus, with the above settings on our example cluster, each Map task will get the following memory allocations with the following:

  • Total physical RAM allocated = 4 GB
  • JVM heap space upper limit within the Map task Container = 3 GB
  • Virtual memory upper limit = 4*2.1 = 8.2 GB

With YARN and MapReduce 2, there are no longer pre-configured static slots for Map and Reduce tasks. The entire cluster is available for dynamic resource allocation of Maps and Reduces as needed by the job. In our example cluster, with the above configurations, YARN will be able to allocate on each node up to 10 mappers (40/4) or 5 reducers (40/8) or a permutation within that.

How to Plan and Configure YARN and MapReduce 2 in HDP 2.0的更多相关文章

  1. Wordcount on YARN 一个MapReduce示例

    Hadoop YARN版本:2.2.0 关于hadoop yarn的环境搭建可以参考这篇博文:Hadoop 2.0安装以及不停集群加datanode hadoop hdfs yarn伪分布式运行,有如 ...

  2. 怎样通过Java程序提交yarn的mapreduce计算任务

    因为项目需求,须要通过Java程序提交Yarn的MapReduce的计算任务.与一般的通过Jar包提交MapReduce任务不同,通过程序提交MapReduce任务须要有点小变动.详见下面代码. 下面 ...

  3. Determine YARN and MapReduce Memory Configuration Settings

    Determine YARN and MapReduce Memory Configuration Settings https://docs.hortonworks.com/HDPDocuments ...

  4. 3.Hadoop测试Yarn和MapReduce

    Hadoop测试Yarn和MapReduce 1.配置Yarn (1)配置ResourceManager 生产环境中,一般是重开一台机器作为ResourceManager,这里我们以Master机器代 ...

  5. YARN和MapReduce的内存设置参考

    如何确定Yarn中容器Container,Mapreduce相关参数的内存设置,对于初始集群,由于不知道集群的类型(如cpu密集.内存密集)我们需要根据经验提供给我们一个参考配置值,来作为基础的配置. ...

  6. YARN和MapReduce的内存设置參考

    怎样确定Yarn中容器Container,Mapreduce相关參数的内存设置,对于初始集群,由于不知道集群的类型(如cpu密集.内存密集)我们须要依据经验提供给我们一个參考配置值,来作为基础的配置. ...

  7. Hadoop2 使用 YARN 运行 MapReduce 的过程源码分析

    Hadoop 使用 YARN 运行 MapReduce 的过程如下图所示: 总共分为11步. 这里以 WordCount 为例, 我们在客户端终端提交作业: # 把本地的 /home/hadoop/t ...

  8. 经典MapReduce作业和Yarn上MapReduce作业运行机制

    一.经典MapReduce的作业运行机制 如下图是经典MapReduce作业的工作原理: 1.1 经典MapReduce作业的实体 经典MapReduce作业运行过程包含的实体: 客户端,提交MapR ...

  9. 大数据系列4:Yarn以及MapReduce 2

    系列文章: 大数据系列:一文初识Hdfs 大数据系列2:Hdfs的读写操作 大数据谢列3:Hdfs的HA实现 通过前文,我们对Hdfs的已经有了一定的了解,本文将继续之前的内容,介绍Yarn与Yarn ...

随机推荐

  1. Oralce if ..elsif结构

    create or replace procedure sp_pro6(spNo number) is v_job emp.job%type; begin select e.job into v_jo ...

  2. 大数相加和大数相乘以及打印从1到最大的n位数

    string add(string a, string b){ int nlength; int diff; if (a.size() > b.size()){ nlength = a.size ...

  3. 初【001】——Python 基础知识

    1.python基础入门 提示: 语法基于python3.x版本(会提示2.x版本和3.x版本的区别) Python命令行将以>>>开始,比如 >>>print ( ...

  4. Tensorflow细节-P174-真正的图像预处理

    注意这里的读取image_raw_data = tf.gfile.FastGFile("./datasets/cat.jpg", "rb").read(),写入 ...

  5. Linux LVM--三种Logic Volume

    本文链接:https://blog.csdn.net/u012299594/article/details/84551722 概述 为了满足在性能和冗余等方面的需求,LVM支持了下面三种Logic V ...

  6. (尚012)Vue表单数据的自动手集(表单数据提交,需要收集表单数据)

    自动收集,就是我一输入数据,就自动收集,等我点击提交按钮的时候,数据就收集好了 1.使用v-model对表单数据自动收集 1)text/textare----单行/多行输入框 2)checkbox-- ...

  7. asp.net+ueditor word粘贴上传

    最近公司做项目需要实现一个功能,在网页富文本编辑器中实现粘贴Word图文的功能. 我们在网站中使用的Web编辑器比较多,都是根据用户需求来选择的.目前还没有固定哪一个编辑器 有时候用的是UEditor ...

  8. Ubuntu下彻底卸载默认安装的mysql,自己手动下载安装MYSQL

    彻底卸载: sudo apt-get autoremove --purge mysql-server-5.7 sudo apt-get remove mysql-common sudo rm -rf ...

  9. 洛谷 P2010 回文日期 题解

    P2010 回文日期 题目描述 在日常生活中,通过年.月.日这三个要素可以表示出一个唯一确定的日期. 牛牛习惯用88位数字表示一个日期,其中,前44位代表年份,接下来22位代表月 份,最后22位代表日 ...

  10. [ThinkPHP6.*安装 (草稿先发布,再维护)

    ThinkPHP6.0的安装,官方文档中有详细的说明,不过在安装之前,大家还是要做一些准备的,就是PHP本地开发环境 的搭建. 官方手册地址:https://www.kancloud.cn/manua ...