Original post: http://zh.hortonworks.com/blog/apache-hadoop-yarn-resourcemanager/

ResourceManager (RM) is the master that arbitrates all the available cluster resources and thus helps manage the distributed applications running on the YARN system. It works together with the per-node NodeManagers (NMs) and the per-application ApplicationMasters (AMs).

  1. NodeManagers take instructions from the ResourceManager and manage resources available on a single node.
  2. ApplicationMasters are responsible for negotiating resources with the ResourceManager and for working with the NodeManagers to start the containers.

ResourceManager Components

The ResourceManager has the following components (see the architecture figure in the original post):

  1. Components interfacing RM to the clients:

    • ClientService: The client interface to the ResourceManager. This component handles all the RPC interfaces to the RM from clients, including operations like application submission, application termination, obtaining queue information, cluster statistics, etc. (A minimal client-side sketch is shown after this list.)
    • AdminService: To make sure that admin requests don’t get starved by normal users’ requests, and to give operators’ commands higher priority, all the admin operations like refreshing the node list, the queues’ configuration, etc. are served via this separate interface.
  2. Components connecting RM to the nodes:
    • ResourceTrackerService: This is the component that responds to RPCs from all the nodes. It is responsible for registering new nodes, rejecting requests from invalid or decommissioned nodes, obtaining node heartbeats and forwarding them to the YarnScheduler. It works closely with the NMLivelinessMonitor and NodesListManager described below.
    • NMLivelinessMonitor: To keep track of live nodes, and specifically to note down dead nodes, this component tracks each node’s last heartbeat time. Any node that doesn’t heartbeat within a configured interval of time, by default 10 minutes, is deemed dead and is expired by the RM. All the containers currently running on an expired node are marked as dead, and no new containers are scheduled on such a node.
    • NodesListManager: A collection of valid and excluded nodes. Responsible for reading the host configuration files specified via yarn.resourcemanager.nodes.include-path and yarn.resourcemanager.nodes.exclude-path and seeding the initial list of nodes based on those files. Also keeps track of nodes that are decommissioned as time progresses. (A sample configuration snippet is shown after this list.)
  3. Components interacting with the per-application AMs:
    • ApplicationMasterService: This is the component that responds to RPCs from all the AMs. It is responsible for registering new AMs, handling termination or unregister requests from finishing AMs, obtaining container allocation and deallocation requests from all running AMs, and forwarding them to the YarnScheduler. It works closely with the AMLivelinessMonitor described below. (An AM-side sketch is shown after this list.)
    • AMLivelinessMonitor: To help manage the list of live AMs and dead or non-responding AMs, this component tracks each AM’s last heartbeat time. Any AM that doesn’t heartbeat within a configured interval of time, by default 10 minutes, is deemed dead and is expired by the RM. All the containers currently running on or allocated to an expired AM are marked as dead. The RM then schedules the same AM to run in a new container, allowing up to a maximum of 4 such attempts by default.
  4. The core of the ResourceManager – the scheduler and related components:
    • ApplicationsManager: Responsible for maintaining a collection of submitted applications. Also keeps a cache of completed applications so as to serve users’ requests via the web UI or command line long after the applications in question have finished.
    • ApplicationACLsManager: The RM needs to gate the user-facing APIs like the client and admin requests so that they are accessible only to authorized users. This component maintains the ACL list per application and enforces it whenever a request like killing an application or viewing an application’s status is received.
    • ApplicationMasterLauncher: Maintains a thread pool to launch AMs of newly submitted applications, as well as of applications whose previous AM attempts exited for some reason. It is also responsible for cleaning up the AM when an application has finished normally or been forcefully terminated.
    • YarnScheduler: The Scheduler is responsible for allocating resources to the various running applications, subject to constraints of capacities, queues, etc. It performs its scheduling function based on the resource requirements of the applications, such as memory, CPU, disk and network. Currently, only memory is supported, and support for CPU is close to completion.
    • ContainerAllocationExpirer: This component is in charge of ensuring that all allocated containers are used by AMs and subsequently launched on the corresponding NMs. AMs run as untrusted user code and can potentially hold on to allocations without using them, which can cause cluster under-utilization. To address this, the ContainerAllocationExpirer maintains the list of allocated containers that are still not used on the corresponding NMs. For any container, if the corresponding NM doesn’t report to the RM that the container has started running within a configured interval of time, by default 10 minutes, the container is deemed dead and is expired by the RM.
  5. TokenSecretManagers (for security): The ResourceManager has a collection of SecretManagers which are charged with managing the tokens and secret keys that are used to authenticate and authorize requests on the various RPC interfaces. A future post on YARN security will cover the tokens, secret keys and secret managers in more detail, but a brief summary follows:
    • ApplicationTokenSecretManager: To prevent arbitrary processes from sending the RM scheduling requests, the RM uses per-application tokens called ApplicationTokens. This component saves each token locally in memory until the application finishes, and uses it to authenticate any request coming from a valid AM process.
    • ContainerTokenSecretManager: The SecretManager for ContainerTokens, which are special tokens issued by the RM to an AM for a container on a specific node. ContainerTokens are used by AMs to create a connection to the corresponding NM where the container is allocated. This component is RM-specific, keeps track of the underlying master and secret keys, and rolls the keys every so often.
    • RMDelegationTokenSecretManager: A ResourceManager-specific delegation-token secret manager. It is responsible for generating delegation tokens for clients, which can be passed on to unauthenticated processes that wish to be able to talk to the RM.
  6. DelegationTokenRenewer: In secure mode, the RM is Kerberos-authenticated and so provides the service of renewing file-system tokens on behalf of applications. This component renews the tokens of submitted applications for as long as the application runs, until the tokens can no longer be renewed.
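
To make the client-facing pieces concrete, here is a minimal sketch of submitting an application through the RM’s ClientService, using the standard YarnClient API from Hadoop 2.x. The application name, queue, launch command and resource sizes are illustrative placeholders, not values from the original post.

```java
import java.util.Collections;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class SubmitAppSketch {
  public static void main(String[] args) throws Exception {
    // YarnClient wraps the RPC interface served by the RM's ClientService.
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();

    // Ask the RM for a new application id.
    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
    appContext.setApplicationName("hello-yarn");  // placeholder name
    appContext.setQueue("default");               // placeholder queue

    // Describe the container that will run the ApplicationMaster.
    ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
    amContainer.setCommands(Collections.singletonList("/bin/true")); // placeholder AM command
    appContext.setAMContainerSpec(amContainer);
    appContext.setResource(Resource.newInstance(1024, 1)); // 1 GB, 1 vcore for the AM

    // Submission travels over the ClientService RPC interface to the RM.
    ApplicationId appId = yarnClient.submitApplication(appContext);
    System.out.println("Submitted application " + appId);

    yarnClient.stop();
  }
}
```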
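
Similarly, the sketch below shows the AM side of the ApplicationMasterService protocol, using the AMRMClient library: registration, a container request, and an allocate() call, which doubles as the heartbeat watched by the AMLivelinessMonitor. This only works when run inside an AM container launched by YARN; the capability and priority values are illustrative assumptions.

```java
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AmHeartbeatSketch {
  public static void main(String[] args) throws Exception {
    // AMRMClient wraps the RPC interface served by the RM's ApplicationMasterService.
    AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
    rmClient.init(new YarnConfiguration());
    rmClient.start();

    // Register this AM attempt with the RM (host, RPC port and tracking URL are placeholders).
    rmClient.registerApplicationMaster("", 0, "");

    // Ask for one 512 MB / 1 vcore container anywhere in the cluster.
    Resource capability = Resource.newInstance(512, 1);
    rmClient.addContainerRequest(
        new ContainerRequest(capability, null, null, Priority.newInstance(0)));

    // Each allocate() call is also the heartbeat the AMLivelinessMonitor watches;
    // an AM that stops calling allocate() is eventually expired by the RM.
    AllocateResponse response = rmClient.allocate(0.0f);
    for (Container c : response.getAllocatedContainers()) {
      System.out.println("Got container " + c.getId() + " on " + c.getNodeId());
    }

    // Unregister cleanly so the RM does not retry the attempt.
    rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "done", "");
    rmClient.stop();
  }
}
```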
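
Finally, a sample yarn-site.xml snippet wiring up the NodesListManager’s include/exclude files and the liveness-monitor expiry intervals mentioned above. The file paths are placeholders; the expiry-interval property names are taken from yarn-default.xml in recent Hadoop releases and may differ in older versions.

```xml
<!-- yarn-site.xml: node membership and liveness settings (file paths are placeholders) -->
<property>
  <name>yarn.resourcemanager.nodes.include-path</name>
  <value>/etc/hadoop/conf/yarn.include</value>
</property>
<property>
  <name>yarn.resourcemanager.nodes.exclude-path</name>
  <value>/etc/hadoop/conf/yarn.exclude</value>
</property>
<!-- Heartbeat expiry intervals (ms) behind the NM/AM liveness monitors. -->
<property>
  <name>yarn.nm.liveness-monitor.expiry-interval-ms</name>
  <value>600000</value>
</property>
<property>
  <name>yarn.am.liveness-monitor.expiry-interval-ms</name>
  <value>600000</value>
</property>
```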

Conclusion

In YARN, the ResourceManager is primarily limited to scheduling, i.e., it only arbitrates the available resources in the system among the competing applications and does not concern itself with per-application state management. Because of this clear separation of responsibilities, coupled with the modularity described above and the powerful scheduler API discussed in the previous post, the RM is able to address the most important design requirements: scalability and support for alternative programming paradigms.

To allow for different policy constraints, the scheduler described above is pluggable, so different algorithms can be used. In a future post of this series, we will dig deeper into the features of the CapacityScheduler, which schedules containers based on capacity guarantees and queues.
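
As a concrete illustration of that pluggability, the scheduler implementation is selected via a single yarn-site.xml property; the minimal sketch below picks the CapacityScheduler that the upcoming post discusses.

```xml
<!-- yarn-site.xml: select the scheduler implementation -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
```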

The next post will dive into details of the NodeManager, the component responsible for managing the containers’ life cycle and much more.
