yarn常用rest api

1 metrics

# curl http://localhost:8088/ws/v1/cluster/metrics

The cluster metrics resource provides some overall metrics about the cluster. More detailed metrics should be retrieved from the jmx interface.

{

  "clusterMetrics":

  {

    "appsSubmitted":0,

    "appsCompleted":0,

    "appsPending":0,

    "appsRunning":0,

    "appsFailed":0,

    "appsKilled":0,

    "reservedMB":0,

    "availableMB":17408,

    "allocatedMB":0,

    "reservedVirtualCores":0,

    "availableVirtualCores":7,

    "allocatedVirtualCores":1,

    "containersAllocated":0,

    "containersReserved":0,

    "containersPending":0,

    "totalMB":17408,

    "totalVirtualCores":8,

    "totalNodes":1,

    "lostNodes":0,

    "unhealthyNodes":0,

    "decommissionedNodes":0,

    "rebootedNodes":0,

    "activeNodes":1

  }

}

2 scheduler

# curl http://localhost:8088/ws/v1/cluster/scheduler

A scheduler resource contains information about the current scheduler configured in a cluster. It currently supports both the Fifo and Capacity Scheduler. You will get different information depending on which scheduler is configured so be sure to look at the type information.

{

    "scheduler": {

        "schedulerInfo": {

            "capacity": 100.0,

            "maxCapacity": 100.0,

            "queueName": "root",

            "queues": {

                "queue": [

                    {

                        "absoluteCapacity": 10.5,

                        "absoluteMaxCapacity": 50.0,

                        "absoluteUsedCapacity": 0.0,

                        "capacity": 10.5,

                        "maxCapacity": 50.0,

                        "numApplications": 0,

                        "queueName": "a",

                        "queues": {

                            "queue": [

                                {

                                    "absoluteCapacity": 3.15,

                                    "absoluteMaxCapacity": 25.0,

                                    "absoluteUsedCapacity": 0.0,

                                    "capacity": 30.000002,

                                    "maxCapacity": 50.0,

                                    "numApplications": 0,

                                    "queueName": "a1",

...

3 apps

# curl http://localhost:8088/ws/v1/cluster/apps

With the Applications API, you can obtain a collection of resources, each of which represents an application. When you run a GET operation on this resource, you obtain a collection of Application Objects.

支持参数：

* state [deprecated] - state of the application
* states - applications matching the given application states, specified as a comma-separated list.
* finalStatus - the final status of the application - reported by the application itself
* user - user name
* queue - queue name
* limit - total number of app objects to be returned
* startedTimeBegin - applications with start time beginning with this time, specified in ms since epoch
* startedTimeEnd - applications with start time ending with this time, specified in ms since epoch
* finishedTimeBegin - applications with finish time beginning with this time, specified in ms since epoch
* finishedTimeEnd - applications with finish time ending with this time, specified in ms since epoch
* applicationTypes - applications matching the given application types, specified as a comma-separated list.
* applicationTags - applications matching any of the given application tags, specified as a comma-separated list.

{

  "apps":

  {

    "app":

    [

       {

          "finishedTime" : 1326815598530,

          "amContainerLogs" : "http://host.domain.com:8042/node/containerlogs/container_1326815542473_0001_01_000001",

          "trackingUI" : "History",

          "state" : "FINISHED",

          "user" : "user1",

          "id" : "application_1326815542473_0001",

          "clusterId" : 1326815542473,

          "finalStatus" : "SUCCEEDED",

          "amHostHttpAddress" : "host.domain.com:8042",

          "progress" : 100,

          "name" : "word count",

          "startedTime" : 1326815573334,

          "elapsedTime" : 25196,

          "diagnostics" : "",

          "trackingUrl" : "http://host.domain.com:8088/proxy/application_1326815542473_0001/jobhistory/job/job_1326815542473_1_1",

          "queue" : "default",

          "allocatedMB" : 0,

          "allocatedVCores" : 0,

          "runningContainers" : 0,

          "memorySeconds" : 151730,

          "vcoreSeconds" : 103

       },

       {

          "finishedTime" : 1326815789546,

          "amContainerLogs" : "http://host.domain.com:8042/node/containerlogs/container_1326815542473_0002_01_000001",

          "trackingUI" : "History",

          "state" : "FINISHED",

          "user" : "user1",

          "id" : "application_1326815542473_0002",

          "clusterId" : 1326815542473,

          "finalStatus" : "SUCCEEDED",

          "amHostHttpAddress" : "host.domain.com:8042",

          "progress" : 100,

          "name" : "Sleep job",

          "startedTime" : 1326815641380,

          "elapsedTime" : 148166,

          "diagnostics" : "",

          "trackingUrl" : "http://host.domain.com:8088/proxy/application_1326815542473_0002/jobhistory/job/job_1326815542473_2_2",

          "queue" : "default",

          "allocatedMB" : 0,

          "allocatedVCores" : 0,

          "runningContainers" : 1,

          "memorySeconds" : 640064,

          "vcoreSeconds" : 442

       }

    ]

  }

}

收集shell脚本示例

metrics

#!/bin/sh

cluster_name="c1"

rms="192.168.0.1 192.168.0.2"

url_path="/ws/v1/cluster/metrics"

keyword="clusterMetrics"

log_name="metrics.log"

base_dir="/tmp"

log_path=${base_dir}/${log_name}

echo "`date +'%Y-%m-%d %H:%M:%S'`"

for rm in $rms

do

        url="http://${rm}:8088${url_path}"

        echo $url

        content=`curl $url`

        echo $content

        if [[ "$content" == *"$keyword"* ]]; then

                break

        fi

done

if [[ "$content" == *"$keyword"* ]]; then

        modified="${content:0:$((${#content}-1))},\"currentTime\":`date +%s`,\"clusterName\":\"${cluster_name}\"}"

        echo "$modified"

        echo "$modified" >> $log_path

else

        echo "gather metrics failed from : ${rms}, ${url_path}, ${keyword}"

fi

apps

#!/bin/sh

cluster_name="c1"

rms="192.168.0.1 192.168.0.2"

url_path="/ws/v1/cluster/apps?states=RUNNING"

keyword="apps"

log_name="apps.log"

base_dir="/tmp"

log_path=${base_dir}/${log_name}

echo "`date +'%Y-%m-%d %H:%M:%S'`"

for rm in $rms

do

        url="http://${rm}:8088${url_path}"

        echo $url

        content=`curl $url`

        echo $content

        if [[ "$content" == *"$keyword"* ]]; then

                break

        fi

done

if [[ "$content" == *"$keyword"* ]]; then

        if [[ "$content" == *"application_"* ]]; then

                postfix=",\"currentTime\":`date +%s`,\"clusterName\":\"${cluster_name}\"}"

                modified="${content:16:$((${#content}-20))}"

                echo "${modified//\"/\\\"}"|awk '{split($0,arr,"},"); for (i in arr) {print arr[i]}}'|xargs -i echo "{}$postfix" >> $log_path

        else

                echo "no apps is running"

        fi

else

        echo "gather metrics failed from : ${rms}, ${url_path}, ${keyword}"

fi

然后对接ELK

ELK

Logstash配置示例

metrics1：input json+filter mutate rename

input {

  file {

    path => "/tmp/metrics.log"

    codec => "json"

  }

}

filter {

  mutate {

    rename => {

      "[clusterMetrics][appsSubmitted]" => "[appsSubmitted]"

      "[clusterMetrics][appsCompleted]" => "[appsCompleted]"

      "[clusterMetrics][appsPending]" => "[appsPending]"

      "[clusterMetrics][appsRunning]" => "[appsRunning]"

      "[clusterMetrics][appsFailed]" => "[appsFailed]"

      "[clusterMetrics][appsKilled]" => "[appsKilled]"

      "[clusterMetrics][reservedMB]" => "[reservedMB]"

      "[clusterMetrics][availableMB]" => "[availableMB]"

      "[clusterMetrics][allocatedMB]" => "[allocatedMB]"

      "[clusterMetrics][reservedVirtualCores]" => "[reservedVirtualCores]"

      "[clusterMetrics][availableVirtualCores]" => "[availableVirtualCores]"

      "[clusterMetrics][allocatedVirtualCores]" => "[allocatedVirtualCores]"

      "[clusterMetrics][containersAllocated]" => "[containersAllocated]"

      "[clusterMetrics][containersReserved]" => "[containersReserved]"

      "[clusterMetrics][containersPending]" => "[containersPending]"

      "[clusterMetrics][totalMB]" => "[totalMB]"

      "[clusterMetrics][totalVirtualCores]" => "[totalVirtualCores]"

      "[clusterMetrics][totalNodes]" => "[totalNodes]"

      "[clusterMetrics][lostNodes]" => "[lostNodes]"

      "[clusterMetrics][unhealthyNodes]" => "[unhealthyNodes]"

      "[clusterMetrics][decommissionedNodes]" => "[decommissionedNodes]"

      "[clusterMetrics][rebootedNodes]" => "[rebootedNodes]"

      "[clusterMetrics][activeNodes]" => "[activeNodes]"

    }

    remove_field => ["clusterMetrics", "path"]

  }

#  ruby {

#    code => "event.set('@timestamp', LogStash::Timestamp.at(event.get('currentTime') + 28800))"

#  }

  date {

    match => [ "currentTime","UNIX"]

    target => "@timestamp"

  }

}

metrics2：filter json+filter mutate add_field

input {

  file {

    path => "/tmp/metrics.log"

  }

}

filter {

  json {

    source => "message"

  }

  mutate {

    add_field => {

      "appsSubmitted" => "%{[clusterMetrics][appsSubmitted]}"

      "appsCompleted" => "%{[clusterMetrics][appsCompleted]}"

      "appsPending" => "%{[clusterMetrics][appsPending]}"

      "appsRunning" => "%{[clusterMetrics][appsRunning]}"

      "appsFailed" => "%{[clusterMetrics][appsFailed]}"

      "appsKilled" => "%{[clusterMetrics][appsKilled]}"

      "reservedMB" => "%{[clusterMetrics][reservedMB]}"

      "availableMB" => "%{[clusterMetrics][availableMB]}"

      "allocatedMB" => "%{[clusterMetrics][allocatedMB]}"

      "reservedVirtualCores" => "%{[clusterMetrics][reservedVirtualCores]}"

      "availableVirtualCores" => "%{[clusterMetrics][availableVirtualCores]}"

      "allocatedVirtualCores" => "%{[clusterMetrics][allocatedVirtualCores]}"

      "containersAllocated" => "%{[clusterMetrics][containersAllocated]}"

      "containersReserved" => "%{[clusterMetrics][containersReserved]}"

      "containersPending" => "%{[clusterMetrics][containersPending]}"

      "totalMB" => "%{[clusterMetrics][totalMB]}"

      "totalVirtualCores" => "%{[clusterMetrics][totalVirtualCores]}"

      "totalNodes" => "%{[clusterMetrics][totalNodes]}"

      "lostNodes" => "%{[clusterMetrics][lostNodes]}"

      "unhealthyNodes" => "%{[clusterMetrics][unhealthyNodes]}"

      "decommissionedNodes" => "%{[clusterMetrics][decommissionedNodes]}"

      "rebootedNodes" => "%{[clusterMetrics][rebootedNodes]}"

      "activeNodes" => "%{[clusterMetrics][activeNodes]}"

    }

    convert => {

      "appsSubmitted" => "integer"

      "appsCompleted" => "integer"

      "appsPending" => "integer"

      "appsRunning" => "integer"

      "appsFailed" => "integer"

      "appsKilled" => "integer"

      "reservedMB" => "integer"

      "availableMB" => "integer"

      "allocatedMB" => "integer"

      "reservedVirtualCores" => "integer"

      "availableVirtualCores" => "integer"

      "allocatedVirtualCores" => "integer"

      "containersAllocated" => "integer"

      "containersReserved" => "integer"

      "containersPending" => "integer"

      "totalMB" => "integer"

      "totalVirtualCores" => "integer"

      "totalNodes" => "integer"

      "lostNodes" => "integer"

      "unhealthyNodes" => "integer"

      "decommissionedNodes" => "integer"

      "rebootedNodes" => "integer"

      "activeNodes" => "integer"

    }

    remove_field => ["message", "clusterMetrics", "path"]

  }

#  ruby {

#    code => "event.set('@timestamp', LogStash::Timestamp.at(event.get('currentTime') + 28800))"

#  }

  date {

    match => [ "currentTime","UNIX"]

    target => "@timestamp"

  }

}

app：input json

input {

  file {

    path => "/tmp/apps.log"

    codec => "json"

  }

}

filter {

#  ruby {

#    code => "event.set('@timestamp', LogStash::Timestamp.at(event.get('currentTime') + 28800))"

#  }

  date {

    match => [ "currentTime","UNIX"]

    target => "@timestamp"

  }

}

注意：

date插件得到的timestamp为UTC时区，

1）如果是存放到elasticsearch然后用kibana展示（kibana会自动根据浏览器时区做偏移），直接使用UTC时区就可以；

2）如果是存放到其他存储，想直接存储当前时区的时间，需要指定timezone，但是date插件使用UNIX格式时timezone不会生效，会使用系统默认时区，所以使用ruby插件转换时区；

Unix timestamps (i.e. seconds since the epoch) are by definition always UTC and @timestamp is also always UTC. The timezone option indicates the timezone of the source timestamp, but doesn't really apply when the UNIX or UNIX_MS patterns are used.

所有timezone：http://joda-time.sourceforge.net/timezones.html

Kibana展示示例

参考：

https://hadoop.apache.org/docs/r2.7.3/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html

https://discuss.elastic.co/t/new-timestamp-using-dynamic-timezone-not-working/97166

【原创】大数据基础之Hadoop（3）yarn数据收集与监控的更多相关文章

【原创】大数据基础之Hadoop（2）hdfs和yarn最简绿色部署
环境:3结点集群 192.168.0.1192.168.0.2192.168.0.3 1 配置root用户服务期间免密登录参考:https://www.cnblogs.com/barneywill/ ...
【原创】大数据基础之Hadoop（1）HA实现原理
有些工作只能在一台server上进行,比如master,这时HA(High Availability)首先要求部署多个server,其次要求多个server自动选举出一个active状态server, ...
学习大数据基础框架hadoop需要什么基础
什么是大数据?进入本世纪以来,尤其是2010年之后,随着互联网特别是移动互联网的发展,数据的增长呈爆炸趋势,已经很难估计全世界的电子设备中存储的数据到底有多少,描述数据系统的数据量的计量单位从MB(1 ...
大数据基础总结---MapReduce和YARN技术原理
Map Reduce和YARN技术原理学习目标熟悉MapReduce和YARN是什么掌握MapReduce使用的场景及其原理掌握MapReduce和YARN功能与架构熟悉YARN的新特性 M ...
大数据架构师基础：hadoop家族，Cloudera产品系列等各种技术
大数据我们都知道hadoop,可是还会各种各样的技术进入我们的视野:Spark,Storm,impala,让我们都反映不过来.为了能够更好的架构大数据项目,这里整理一下,供技术人员,项目经理,架构师选 ...
【大数据】了解Hadoop框架的基础知识
介绍此Refcard提供了Apache Hadoop,这是最流行的软件框架,可使用简单的高级编程模型实现大型数据集的分布式存储和处理.我们将介绍Hadoop最重要的概念,描述其架构,指导您如何开始使 ...
大数据系列文章-Hadoop基础介绍（一）
Hadoop项目背景简介 2003-2004年,Google公开了部分GFS个Mapreduce思想的细节,以此为基础Doug Cutting等人用了2年的业余时间,实现了DFS和Mapreduce机 ...
【原创】大数据基础之Zookeeper（2）源代码解析
核心枚举 public enum ServerState { LOOKING, FOLLOWING, LEADING, OBSERVING; } zookeeper服务器状态:刚启动LOOKING,f ...
【原创】大数据基础之Impala（2）实现细节
一架构 Impala is a massively-parallel query execution engine, which runs on hundreds of machines in ex ...

随机推荐

Linux下redis的安装及配置
1.去官网下载redis(redis.io) 2.将其解压到根目录下 3.进入解压的目录,然后编译源程序, 如果不是root账户登录的,命令前面需要加sudo make make install PR ...
python部署galery集群
galery.py文件内容 import pexpect import os import configparser HOSTNAME_DB1='db1' HOSTNAME_DB2='db2' HOS ...
系统IO
系统IO:Linux系统提供给应用程序操作文件的接口 Everything is a file ,in Unix 在Unix/Linux下,万物皆文件打开文件函数原型: #include< ...
jquery动态设置图片路径和超链接href属性
js document.getElementById("myImage").src="hackanm.gif"; jquery $("#img&quo ...
[Alpha阶段]第一次Scrum Meeting
Scrum Meeting博客目录 [Alpha阶段]第一次Scrum Meeting 基本信息名称时间地点时长第一次Scrum Meeting 19/04/01 大运村寝室6楼 40min ...
二、IIS部署WebApi
一.项目发布二.hosts 更改 C:\Windows\System32\drivers\etc 三.网站搭建之后我将端口默认更改 8001 以防与80端口冲突注意: 1.先测试IIS的lo ...
Django+Vue打造购物网站(十)
首页.商品数量.缓存和限速功能开发将环境切换为本地,vue也切换为本地轮播图 goods/serializers.py class BannerSerializer(serializers.Mod ...
Java自定义异常类以及异常拦截器
自定义异常类不难,但下面这个方法,它的核心是异常拦截器类. 就算是在分布式系统间进行传递也可以,只要最顶层的服务有这个异常拦截器类(下例是在 springboot 项目中) 1.自定义异常类,继承自 ...
P1494 [国家集训队]小Z的袜子
题目 P1494 [国家集训队]小Z的袜子解析在区间$[l,r]$内, 任选两只袜子,有 \[r-l+1\choose2\] \[=\frac{(r-l+1)!}{2!(r-l-1)!}\] ...
elasticsearch更改mapping(不停服务重建索引)
转载地址:http://donlianli.iteye.com/blog/1924721?utm_source=tuicool&utm_medium=referral Elasticsearc ...

【原创】大数据基础之Hadoop（3）yarn数据收集与监控