四：ResourceManger Restart

概述：

RM是yarn中最重要的组件。但是只有一个RM，因此存在单点失败的问题。RM的重启有两种方式：

1.(Non-work-preserving RM restart) 不保留工作状态的重启

这种情况下，RM把应用(application)的状态保存在一个插件化的state-store里，等RM重启后，RM重新加载这些状态，然后kick之前正在执行的任务，用户不必重新提交任务。

2.(work-preserving RM restart)保留工作状态的重启

RM通过合并NM上的container状态和AM的container请求来重新任务状态。上面的情况不同的是，不需要kill之前正在执行的任务，任务在RM重启的时候可以继续执行。

特性：

Non-work-preserving RM restart：

这种方式下，RM会在client提交工作时保存应用(application)的元数据(如ApplicationSubmissionContext)到插件化的state-store中，并且在任务执行完成后保存执行状态。此外，RM还保存应用的凭证信息(security keys、tokens)等。当RM宕机后，RM可会重新加载这些保存在state-store中的元数据，并且重新提交任务(不提交RM宕机前已经执行完成的)。

nodeManagers和clients在RM宕机期间会轮询RM。RM重启后，会通过心跳(heartbeats)发出一个re-sync命令到所有的NM和AM上.NMs接收到re-sync命令后，会把自己节点上的所有containers都干掉，然后重新注册到RM(跟新的RM一样)。AMs接收到re-sync命令后，会shutdown。RM加载完元数据信息后，会为任务重建AM。在NMs和AMs接收到re-sync命令后，RM宕机时正在执行的任务就被kill掉了。

保存元数据->RM宕机->RM重启->发送re-sync到NMs->NMs kill containers，AMs shutdown->RM读取无数据->RM提交任务->RM 分配AM

Work-preserving RM restart：

RM重建YARN集群状态，最重要的是重建scheduler的的状态，包括(containers’ life-cycle, applications’ headroom and resource requests, queues’ resource usage)containers的生命周期、应用程序的headroot、资源请求、队列的资源使用情况等。RM不用杀死正在执行的程序，在RM重启后，会继续这些暂停的任务。

RM重新通过NMs发送的containers状态来重建集群。在RM宕机期间，NMs不会kill containers，并且继续维护containers的状态，在RM重启后，NMs会向RM重新注册，并发containers的状态。之后,AM需要重新发送后续的资源请求，因为在RM在宕机前可能就没有满足AM的资源请求。应用程序使用AMRMClient和RM通信而不必担心AM在RM re-synce时重新向RM请求资源。

注意：无论哪种方式，都需要一个state-store

配置：

四种state-store方式：zookeeper\hdfs\local file\db；其中zookeeper支持RM HA的恢复，其它不支持HA。

参考：http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html

Enable RM Restart

Property	Description
`yarn.resourcemanager.recovery.enabled`	`true`

Configure the state-store for persisting the RM state

Property Description

yarn.resourcemanager.store.class The class name of the state-store to be used for saving application/attempt state and the credentials. The available state-store implementations areorg.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore, a ZooKeeper based state-store implementation andorg.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore, a Hadoop FileSystem based state-store implementation like HDFS and local FS. org.apache.hadoop.yarn.server.resourcemanager.recovery.LeveldbRMStateStore, a LevelDB based state-store implementation. The default value is set to org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.

Property	Description
`yarn.resourcemanager.store.class`	The class name of the state-store to be used for saving application/attempt state and the credentials. The available state-store implementations are`org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore`, a ZooKeeper based state-store implementation and`org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore`, a Hadoop FileSystem based state-store implementation like HDFS and local FS. `org.apache.hadoop.yarn.server.resourcemanager.recovery.LeveldbRMStateStore`, a LevelDB based state-store implementation. The default value is set to `org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore`.

How to choose the state-store implementation

ZooKeeper based state-store: User is free to pick up any storage to set up RM restart, but must use ZooKeeper based state-store to support RM HA. The reason is that only ZooKeeper based state-store supports fencing mechanism to avoid a split-brain situation where multiple RMs assume they are active and can edit the state-store at the same time.
FileSystem based state-store: HDFS and local FS based state-store are supported. Fencing mechanism is not supported.
LevelDB based state-store: LevelDB based state-store is considered more light weight than HDFS and ZooKeeper based state-store. LevelDB supports better atomic operations, fewer I/O ops per state update, and far fewer total files on the filesystem. Fencing mechanism is not supported.

来自为知笔记(Wiz)