Overview

  • ...

YARN Architecture

  • The fundamental idea of YARN is to split the functionalities of resource management and job scheduling/monitoring into separate daemons.
  • The idea is to have a global ResourceManager (RM) and a per-application ApplicationMaster (AM). An application is either a single job or a DAG of jobs.
  • AM: The AM is a framework-specific library tasked with negotiating resources from the RM and working with the NodeManager(s) (NM) to execute and monitor the tasks.
  • RM: The RM has two main components: the Scheduler and the ApplicationsManager. [Note: the Scheduler is responsible only for allocating resources; the ApplicationsManager accepts applications, negotiates the container for the ApplicationMaster, and monitors and restarts the AM.]
    • The Scheduler is responsible for allocating resources to the various running applications, subject to familiar constraints of capacities, queues, etc. The Scheduler is a pure scheduler in the sense that it performs no monitoring or tracking of application status. It also offers no guarantees about restarting failed tasks. The Scheduler performs its scheduling function based on the resource requirements of the applications; it does so based on the abstract notion of a resource Container, which incorporates elements such as memory, CPU, disk, network, etc. The Scheduler has a pluggable policy that is responsible for partitioning the cluster resources among the various queues, applications, etc. The current schedulers, such as the CapacityScheduler and the FairScheduler, are examples of such plug-ins.
    • The ApplicationsManager is responsible for accepting job submissions, negotiating the first container for executing the application-specific ApplicationMaster, and providing the service for restarting the ApplicationMaster container on failure.

Capacity Scheduler

  • Capacity Scheduler: a pluggable scheduler for Hadoop that allows multiple tenants to securely share a large cluster, such that their applications are allocated resources in a timely manner under the constraints of allocated capacities.
  • TBD...

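YARN Resource Scheduling Policies

What Is a Resource?

  • In a distributed, multi-user system, "resources" usually means hardware resources: CPU usage, memory usage, disk usage, I/O, network traffic, and so on. These are fairly coarse-grained; one can also consider higher-level abstractions such as TPS or request counts.
  • YARN's resource abstraction is simple, with only two types: memory and CPU.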

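Basic Concepts

Container

  • A Container is the basic unit in which the RM allocates resources. The RM receives resource requests from users and allocates Containers for them, while the NM launches the Containers and monitors their resource usage.
  • A Container serves not only resource allocation but also resource isolation. Furthermore, a client can use its Container request to ask for allocation on specific nodes only, which preserves data locality.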

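Schedulers and Queues

  • In YARN the scheduler is a pluggable component. Common ones are FIFO, CapacityScheduler, and FairScheduler; the scheduler is selected through a configuration file.
  • On the RM side, depending on the scheduler, all resources are divided into one or more queues, each holding a certain amount of resources.
  • The scheduler therefore has two main functions:
    • deciding how to partition the queues;
    • deciding how to allocate resources.
  • In addition, schedulers offer other features such as ACLs, preemption, and delay scheduling.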

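Event-Driven Design

  • YARN implements an event-driven mechanism built on state machines: many objects contain a predefined finite state machine; an event triggers a state transition, the transition invokes predefined hooks, and the hooks may in turn generate new events that drive further transitions.
  • The main roles:
    • Dispatcher: dispatches events, usually asynchronously. Internally it buffers all events in a BlockingQueue.
    • Event: the event type.
    • Handler: the consumer of events. Each handler handles only specific events, and every Handler must be registered with the Dispatcher.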
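
The Dispatcher / Event / Handler roles can be illustrated with a minimal plain-Java sketch. It is not YARN's actual dispatcher: the class names (`EventDemo`, `Dispatcher`, `Handler`), the two event types, and the three states are all hypothetical, and the queue is drained on the calling thread to keep the example deterministic, whereas an asynchronous dispatcher would run that loop on its own thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

class EventDemo {
    // Hypothetical event types and states for a toy application state machine.
    enum EventType { START, FINISH }
    enum State { NEW, RUNNING, FINISHED }

    interface Handler {
        void handle(EventType event);
    }

    static class Dispatcher {
        // Events are buffered in a BlockingQueue until they are dispatched.
        private final BlockingQueue<EventType> queue = new LinkedBlockingQueue<>();
        private final Map<EventType, Handler> handlers = new ConcurrentHashMap<>();

        void register(EventType type, Handler handler) {
            handlers.put(type, handler);
        }

        void dispatch(EventType event) {
            queue.add(event);
        }

        // Delivers queued events to their registered handlers until the queue is empty.
        void drain() {
            EventType event;
            while ((event = queue.poll()) != null) {
                Handler handler = handlers.get(event);
                if (handler != null) {
                    handler.handle(event);
                }
            }
        }
    }

    // Runs the toy state machine and returns the sequence of states it visited.
    static List<State> run() {
        Dispatcher dispatcher = new Dispatcher();
        List<State> history = new ArrayList<>();
        State[] state = { State.NEW };
        history.add(state[0]);

        dispatcher.register(EventType.START, e -> {
            if (state[0] == State.NEW) {
                state[0] = State.RUNNING;
                history.add(state[0]);
                // The transition hook generates a follow-up event.
                dispatcher.dispatch(EventType.FINISH);
            }
        });
        dispatcher.register(EventType.FINISH, e -> {
            if (state[0] == State.RUNNING) {
                state[0] = State.FINISHED;
                history.add(state[0]);
            }
        });

        dispatcher.dispatch(EventType.START);
        dispatcher.drain();
        return history;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [NEW, RUNNING, FINISHED]
    }
}
```

Only one START event is submitted from outside; the FINISH event is produced by the START handler itself, which is the "hook generates a new event" step described above.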

pull-based

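  • The AM requests resources from the RM through heartbeats. The requested resources are not returned immediately; they arrive only after several more heartbeats. This is a pull-based model.
  • The AM communicates with the RM over the ApplicationMasterProtocol RPC protocol. On the server side this protocol calls the allocate method of YarnScheduler (every scheduler must implement the YarnScheduler interface). The allocate method serves two purposes: (1) requesting and releasing resources; (2) signalling that the AM is still alive (heartbeat).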
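
The pull-based behaviour can be mimicked with a toy model in plain Java. This is only a sketch: `ToyScheduler` and its `allocate` and `schedule` methods are hypothetical stand-ins and do not reproduce the real YarnScheduler signatures. What it shows is that a heartbeat carrying a request returns nothing, and the containers are picked up on a later heartbeat.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Queue;

class PullDemo {
    // Toy stand-in for the scheduler side of allocate(); not the real YARN API.
    static class ToyScheduler {
        private final Queue<Integer> pending = new ArrayDeque<>(); // requested container sizes in MB
        private final List<Integer> ready = new ArrayList<>();     // allocated, not yet picked up

        // One AM heartbeat: records the new asks and returns whatever was
        // allocated since the previous heartbeat.
        List<Integer> allocate(List<Integer> asks) {
            pending.addAll(asks);
            List<Integer> granted = new ArrayList<>(ready);
            ready.clear();
            return granted;
        }

        // A separate scheduling pass; in this toy model it grants every pending ask.
        void schedule() {
            while (!pending.isEmpty()) {
                ready.add(pending.poll());
            }
        }
    }

    // Returns the response of each heartbeat in order.
    static List<List<Integer>> run() {
        ToyScheduler rm = new ToyScheduler();
        List<List<Integer>> responses = new ArrayList<>();
        // Heartbeat 1: ask for two containers; nothing is available yet.
        responses.add(rm.allocate(Arrays.asList(1024, 2048)));
        rm.schedule();
        // Heartbeat 2: no new asks; the containers allocated in between are returned.
        responses.add(rm.allocate(Collections.<Integer>emptyList()));
        return responses;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [[], [1024, 2048]]
    }
}
```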

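Common Schedulers

FIFO

  • The simplest scheduler: a single queue shared by all users. Note that Apache Hadoop 2.x ships with CapacityScheduler, not FIFO, as the default.
  • First come, first served, so a single user can easily occupy all of the cluster's resources.
  • ACLs (access control lists) can be configured, but per-user priorities cannot.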

CapacityScheduler

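  • Builds on FIFO by adding multi-user support.
  • Aims to maximize cluster throughput and utilization.
  • Basic idea: each user may use a specified share of the resources, but may use the whole cluster when it is idle. (With a single user it therefore behaves much like FIFO. The design is mainly intended to raise cluster utilization.)
  • Queue partitioning:
    • Queues are configured in an XML file; each queue may use a specified percentage of the resources.
    • Queues can form a tree; the capacities of the child queues must not exceed that of their parent.
  • An important question for CapacityScheduler is how the percentages are computed. The default algorithm is the ratio method of the DefaultResourceCalculator class, which considers memory only.
  • In summary, CapacityScheduler's strengths are flexibility and high cluster utilization. Its weakness stems from that same flexibility: without preemption (which is disabled by default), a queue must wait for earlier tasks to release their resources voluntarily.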
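
The effect of a memory-only ratio can be seen in a small plain-Java sketch. The `Resource` class and both methods are hypothetical simplifications rather than Hadoop's classes; `memoryOnlyRatio` mirrors the memory-only behaviour described above, and `dominantRatio` is included purely for contrast, as one way a calculation could also take CPU into account. The cluster size (32768 MB, 8 vcores) is taken from the metrics sample later in these notes; the usage figures are made up.

```java
class RatioDemo {
    // Simplified stand-in for a resource vector (hypothetical class).
    static class Resource {
        final int memoryMB;
        final int vcores;

        Resource(int memoryMB, int vcores) {
            this.memoryMB = memoryMB;
            this.vcores = vcores;
        }
    }

    // Share of the cluster computed from memory alone; vcores are ignored.
    static float memoryOnlyRatio(Resource used, Resource total) {
        return (float) used.memoryMB / total.memoryMB;
    }

    // For contrast: the larger of the memory share and the CPU share.
    static float dominantRatio(Resource used, Resource total) {
        return Math.max((float) used.memoryMB / total.memoryMB,
                        (float) used.vcores / total.vcores);
    }

    public static void main(String[] args) {
        Resource total = new Resource(32768, 8);
        Resource used = new Resource(4096, 6); // CPU-heavy usage, made up for illustration
        System.out.println(memoryOnlyRatio(used, total)); // prints 0.125
        System.out.println(dominantRatio(used, total));   // prints 0.75
    }
}
```

With a memory-only ratio, a queue that holds 6 of 8 vcores but little memory still appears to use only 12.5% of the cluster.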

FairScheduler

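  • A scheduler that puts "fairness" first: each user has only a fixed amount of resources available, even when the cluster is idle.
  • Configured in an XML file; each queue may use a specified amount of memory and CPU.
  • Queues form a tree, and only leaf queues can accept job submissions.
  • Pros and cons: stable, easy to manage, and cheap to operate, but less flexible than CapacityScheduler, with lower overall resource utilization.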

RM REST API's

Overview

  • The RM REST API's allow the user to get information about the cluster: status of the cluster, metrics on the cluster, scheduler information, information about nodes in the cluster, and information about applications on the cluster.

Cluster Information API

  • The cluster information resource provides overall information about the cluster.
  • demo usage:
    // requires: java.io.InputStream, java.net.URL, java.util.Scanner
    public static void getClusterInfo() throws Exception {
        String url = "http://10.3.242.99:8088/ws/v1/cluster";
        try (InputStream response = new URL(url).openStream();
             Scanner scanner = new Scanner(response, "UTF-8")) {
            String responseBody = scanner.useDelimiter("\\A").next();
            System.out.println(responseBody);
        }
    }

    return json:

    {
    "clusterInfo": {
    "id": 1497756603270,
    "startedOn": 1497756603270,
    "state": "STARTED",
    "haState": "ACTIVE",
    "rmStateStoreName": "org.apache.hadoop.yarn.server.resourcemanager.recovery.NullRMStateStore",
    "resourceManagerVersion": "2.6.2",
    "resourceManagerBuildVersion": "2.6.2 from 0cfd050febe4a30b1ee1551dcc527589509fb681 by jenkins source checksum d07deb9ef36deb791d0e2451db849d",
    "resourceManagerVersionBuiltOn": "2015-10-22T00:49Z",
    "hadoopVersion": "2.6.2",
    "hadoopBuildVersion": "2.6.2 from 0cfd050febe4a30b1ee1551dcc527589509fb681 by jenkins source checksum f9ebb94bf5bf9bec892825ede28baca",
    "hadoopVersionBuiltOn": "2015-10-22T00:42Z",
    "haZooKeeperConnectionState": "ResourceManager HA is not enabled."
    }
    }

    [TBD: haState, regarding RM high availability (HA).]

Cluster Metrics API

  • The cluster metrics resource provides some overall metrics about the cluster.
  • demo usage:
    // requires: java.io.InputStream, java.net.URL, java.util.Scanner
    public static void getClusterMetrics() throws Exception {
        String url = "http://10.3.242.99:8088/ws/v1/cluster/metrics";
        // try-with-resources closes the stream and the scanner
        try (InputStream response = new URL(url).openStream();
             Scanner scanner = new Scanner(response, "UTF-8")) {
            String responseBody = scanner.useDelimiter("\\A").next();
            System.out.println(responseBody);
        }
    }

    return json:

    {
    "clusterMetrics": {
    "appsSubmitted": 0,
    "appsCompleted": 0,
    "appsPending": 0,
    "appsRunning": 0,
    "appsFailed": 0,
    "appsKilled": 0,
    "reservedMB": 0,
    "availableMB": 32768,
    "allocatedMB": 0,
    "reservedVirtualCores": 0,
    "availableVirtualCores": 8,
    "allocatedVirtualCores": 0,
    "containersAllocated": 0,
    "containersReserved": 0,
    "containersPending": 0,
    "totalMB": 32768,
    "totalVirtualCores": 8,
    "totalNodes": 1,
    "lostNodes": 0,
    "unhealthyNodes": 0,
    "decommissionedNodes": 0,
    "rebootedNodes": 0,
    "activeNodes": 1
    }
    }
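
One use of these metrics is to derive a memory utilization figure from allocatedMB and totalMB. The sketch below pulls the two numbers out of the response with a regular expression to stay within the standard library; that is a shortcut assumed for illustration only, and a real client would more likely use a JSON parser. `MetricsDemo`, `field`, and `memoryUtilization` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetricsDemo {
    // Extracts the integer value of a top-level numeric field such as "totalMB": 32768.
    static long field(String json, String name) {
        Pattern pattern = Pattern.compile("\"" + Pattern.quote(name) + "\"\\s*:\\s*(-?\\d+)");
        Matcher matcher = pattern.matcher(json);
        if (!matcher.find()) {
            throw new IllegalArgumentException("missing field: " + name);
        }
        return Long.parseLong(matcher.group(1));
    }

    // Fraction of the cluster's memory that is currently allocated.
    static double memoryUtilization(String json) {
        return (double) field(json, "allocatedMB") / field(json, "totalMB");
    }

    public static void main(String[] args) {
        // Values copied from the sample response above.
        String json = "{\"clusterMetrics\": {\"allocatedMB\": 0, \"totalMB\": 32768}}";
        System.out.println(memoryUtilization(json)); // prints 0.0
    }
}
```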

Cluster Scheduler API

  • A scheduler resource contains information about the current scheduler configured in a cluster. It currently supports both the Fifo and Capacity Scheduler.
  • You will get different information depending on which scheduler is configured so be sure to look at the type information.
  • demo usage:
    String clusterSchedulerUrl = "http://10.3.242.99:8088/ws/v1/cluster/scheduler";
    YarnRestAPI.httpGet(clusterSchedulerUrl);

    return json:

    {
    "scheduler": {
    "schedulerInfo": {
    "type": "capacityScheduler",
    "capacity": 100,
    "usedCapacity": 0,
    "maxCapacity": 100,
    "queueName": "root",
    "queues": {
    "queue": [
    {
    "type": "capacitySchedulerLeafQueueInfo",
    "capacity": 100,
    "usedCapacity": 0,
    "maxCapacity": 100,
    "absoluteCapacity": 100,
    "absoluteMaxCapacity": 100,
    "absoluteUsedCapacity": 0,
    "numApplications": 0,
    "queueName": "default",
    "state": "RUNNING",
    "resourcesUsed": {
    "memory": 0,
    "vCores": 0
    },
    "hideReservationQueues": false,
    "nodeLabels": [
    "*"
    ],
    "numActiveApplications": 0,
    "numPendingApplications": 0,
    "numContainers": 0,
    "maxApplications": 10000,
    "maxApplicationsPerUser": 10000,
    "userLimit": 100,
    "users": null,
    "userLimitFactor": 1,
    "aMResourceLimit": {
    "memory": 4096,
    "vCores": 1
    },
    "userAMResourceLimit": {
    "memory": 4096,
    "vCores": 1
    }
    }
    ]
    }
    }
    }
    }

    The returned JSON shows that the capacityScheduler is in use and that it contains a queues array.
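
The demos from here on call `YarnRestAPI.httpGet`, whose source is not shown in these notes. A minimal sketch of what such a helper might look like follows; it is a guess built on `java.net.HttpURLConnection`, and the `readAll` method, the `Accept` header, the timeouts, and the `String` return type are all assumptions rather than the original implementation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

// Hypothetical reconstruction of the helper used in the demos.
class YarnRestAPI {

    // Reads a whole stream as UTF-8 text. The "\\A" delimiter matches only the
    // start of input, so the scanner returns the entire content as one token.
    static String readAll(InputStream in) {
        try (Scanner scanner = new Scanner(in, StandardCharsets.UTF_8.name())) {
            scanner.useDelimiter("\\A");
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    // Sends a GET request asking for JSON, then prints and returns the response body.
    static String httpGet(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/json");
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        try (InputStream in = conn.getInputStream()) {
            String body = readAll(in);
            System.out.println(body);
            return body;
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the full URL as the only argument,
        // e.g. http://10.3.242.99:8088/ws/v1/cluster/scheduler
        if (args.length == 1) {
            httpGet(args[0]);
        } else {
            System.out.println("usage: YarnRestAPI <url>");
        }
    }
}
```

Returning the body as well as printing it keeps the one-line calls in the demos valid while letting a caller process the JSON further.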

Cluster Applications API

  • With the Applications API, you can obtain a collection of resources, each of which represents an application. When you run a GET operation on this resource, you obtain a collection of Application Objects.
  • demo usage:
    String clusterAppUrl = "http://10.3.242.99:8088/ws/v1/cluster/apps";
    YarnRestAPI.httpGet(clusterAppUrl);

    return json:

    {
    "apps": {
    "app": [
    {
    "id": "application_1497756603270_0002",
    "user": "root",
    "name": "word count",
    "queue": "default",
    "state": "FINISHED",
    "finalStatus": "SUCCEEDED",
    "progress": 100,
    "trackingUI": "History",
    "trackingUrl": "http://host99:8088/proxy/application_1497756603270_0002/",
    "diagnostics": "",
    "clusterId": 1497756603270,
    "applicationType": "MAPREDUCE",
    "applicationTags": "",
    "startedTime": 1497784937984,
    "finishedTime": 1497784954421,
    "elapsedTime": 16437,
    "amContainerLogs": "http://host99:8042/node/containerlogs/container_1497756603270_0002_01_000001/root",
    "amHostHttpAddress": "host99:8042",
    "allocatedMB": 2048,
    "allocatedVCores": 1,
    "runningContainers": 1,
    "memorySeconds": 239770,
    "vcoreSeconds": 29,
    "preemptedResourceMB": 0,
    "preemptedResourceVCores": 0,
    "numNonAMContainerPreempted": 0,
    "numAMContainerPreempted": 0
    },
    {
    "id": "application_1497756603270_0001",
    "user": "root",
    "name": "word count",
    "queue": "default",
    "state": "FINISHED",
    "finalStatus": "SUCCEEDED",
    "progress": 100,
    "trackingUI": "History",
    "trackingUrl": "http://host99:8088/proxy/application_1497756603270_0001/",
    "diagnostics": "",
    "clusterId": 1497756603270,
    "applicationType": "MAPREDUCE",
    "applicationTags": "",
    "startedTime": 1497784895511,
    "finishedTime": 1497784913807,
    "elapsedTime": 18296,
    "amContainerLogs": "http://host99:8042/node/containerlogs/container_1497756603270_0001_01_000001/root",
    "amHostHttpAddress": "host99:8042",
    "allocatedMB": -1,
    "allocatedVCores": -1,
    "runningContainers": -1,
    "memorySeconds": 255182,
    "vcoreSeconds": 34,
    "preemptedResourceMB": 0,
    "preemptedResourceVCores": 0,
    "numNonAMContainerPreempted": 0,
    "numAMContainerPreempted": 0
    }
    ]
    }
    }

Query Parameters Supported

  • Multiple parameters can be specified for GET operations.
  • The started and finished times have a begin and end parameter to allow you to specify ranges. For example, you can request all applications that started between 1:00am and 2:00pm on 12/19/2011 with startedTimeBegin=1324256400&startedTimeEnd=1324303200.
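
A range query of this kind can be assembled as in the sketch below. It assumes the time parameters are given in milliseconds since the epoch, as the startedTime values in the sample response above suggest (the example values in the preceding bullet look like seconds), and it assumes the times are meant as UTC. `AppsQueryDemo` and `appsUrl` are hypothetical names.

```java
import java.time.Instant;

class AppsQueryDemo {
    // Builds the apps URL with a started-time range.
    // Assumption: both bounds are milliseconds since the epoch.
    static String appsUrl(String rmBase, long startedBeginMs, long startedEndMs) {
        return rmBase + "/ws/v1/cluster/apps"
                + "?startedTimeBegin=" + startedBeginMs
                + "&startedTimeEnd=" + startedEndMs;
    }

    public static void main(String[] args) {
        // 1:00am to 2:00pm on 12/19/2011, interpreted as UTC.
        long begin = Instant.parse("2011-12-19T01:00:00Z").toEpochMilli();
        long end = Instant.parse("2011-12-19T14:00:00Z").toEpochMilli();
        // RM address taken from the demos above; replace with your own.
        System.out.println(appsUrl("http://10.3.242.99:8088", begin, end));
    }
}
```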

Cluster Application Statistics API

  • With the Application Statistics API, you can obtain a collection of triples, each of which contains the application type, the application state, and the number of applications of this type and state in the RM context.
  • demo usage:
    String appMetricsUrl = "http://10.3.242.99:8088/ws/v1/cluster/appstatistics";
    YarnRestAPI.httpGet(appMetricsUrl);

    return json:

    {
    "appStatInfo": {
    "statItem": [
    {
    "state": "RUNNING",
    "type": "*",
    "count": 0
    },
    {
    "state": "ACCEPTED",
    "type": "*",
    "count": 0
    },
    {
    "state": "NEW_SAVING",
    "type": "*",
    "count": 0
    },
    {
    "state": "NEW",
    "type": "*",
    "count": 0
    },
    {
    "state": "KILLED",
    "type": "*",
    "count": 0
    },
    {
    "state": "FINISHED",
    "type": "*",
    "count": 2
    },
    {
    "state": "FAILED",
    "type": "*",
    "count": 0
    },
    {
    "state": "SUBMITTED",
    "type": "*",
    "count": 0
    }
    ]
    }
    }
  • Supported query parameters: states and applicationTypes.

Cluster Application API

  • An application resource contains information about a particular application that was submitted to a cluster.
  • demo usage:
    String appUrl = "http://10.3.242.99:8088/ws/v1/cluster/apps/application_1497756603270_0002";
    YarnRestAPI.httpGet(appUrl);

    return json:

    {
    "app": {
    "id": "application_1497756603270_0002",
    "user": "root",
    "name": "word count",
    "queue": "default",
    "state": "FINISHED",
    "finalStatus": "SUCCEEDED",
    "progress": 100,
    "trackingUI": "History",
    "trackingUrl": "http://host99:8088/proxy/application_1497756603270_0002/",
    "diagnostics": "",
    "clusterId": 1497756603270,
    "applicationType": "MAPREDUCE",
    "applicationTags": "",
    "startedTime": 1497784937984,
    "finishedTime": 1497784954421,
    "elapsedTime": 16437,
    "amContainerLogs": "http://host99:8042/node/containerlogs/container_1497756603270_0002_01_000001/root",
    "amHostHttpAddress": "host99:8042",
    "allocatedMB": -1,
    "allocatedVCores": -1,
    "runningContainers": -1,
    "memorySeconds": 241431,
    "vcoreSeconds": 30,
    "preemptedResourceMB": 0,
    "preemptedResourceVCores": 0,
    "numNonAMContainerPreempted": 0,
    "numAMContainerPreempted": 0
    }
    }

Cluster Application Attempts API

  • TBD...
