hadoop-3
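Working from https://blog.csdn.net/zhangjun5965/article/details/76796998, I stepped through the code myself to get a feel for it.
public class DFSZKFailoverController extends ZKFailoverController is the entry point. Its main() calls the template method run() defined in the abstract parent class: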
public static void main(String args[])
throws Exception {
StringUtils.startupShutdownMessage(DFSZKFailoverController.class,
args, LOG);
if (DFSUtil.parseHelpArgument(args,
ZKFailoverController.USAGE, System.out, true)) {
System.exit(0);
}
GenericOptionsParser parser = new GenericOptionsParser(
new HdfsConfiguration(), args);
DFSZKFailoverController zkfc = DFSZKFailoverController.create(
parser.getConfiguration());
try {
System.exit(zkfc.run(parser.getRemainingArgs()));
} catch (Throwable t) {
LOG.fatal("DFSZKFailOverController exiting due to earlier exception "
+ t);
terminate(1, t);
}
}
The run method in the abstract class ZKFailoverController:
public int run(final String[] args) throws Exception {
if (!localTarget.isAutoFailoverEnabled()) {
LOG.error("Automatic failover is not enabled for " + localTarget + "." +
" Please ensure that automatic failover is enabled in the " +
"configuration before running the ZK failover controller.");
return ERR_CODE_AUTO_FAILOVER_NOT_ENABLED;
}
loginAsFCUser();
try {
return SecurityUtil.doAsLoginUserOrFatal(new PrivilegedAction<Integer>() {
@Override
public Integer run() {
try {
return doRun(args);
} catch (Exception t) {
throw new RuntimeException(t);
} finally {
if (elector != null) {
elector.terminateConnection();
}
}
}
});
} catch (RuntimeException rte) {
throw (Exception)rte.getCause();
}
}
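The wrap-and-unwrap step in run() is easy to miss, so here is a minimal sketch of it using only the JDK. PrivilegedAction.run() declares no checked exceptions, so the checked exception is tunneled out inside a RuntimeException and restored from getCause() on the other side. The class name, the boolean flag, and the describe() helper are my own assumptions for illustration; they are not part of Hadoop.

```java
import java.io.IOException;
import java.security.PrivilegedAction;

public class ExceptionTunnelSketch {
    // Hypothetical stand-in for doRun(args); it may throw a checked exception.
    static int doRun(boolean fail) throws Exception {
        if (fail) {
            throw new IOException("simulated failure");
        }
        return 0;
    }

    // Same shape as run() above: wrap inside the action, unwrap outside it.
    static int run(final boolean fail) throws Exception {
        PrivilegedAction<Integer> action = new PrivilegedAction<Integer>() {
            @Override
            public Integer run() {
                try {
                    return doRun(fail);
                } catch (Exception e) {
                    // PrivilegedAction.run() declares no checked exceptions
                    throw new RuntimeException(e);
                }
            }
        };
        try {
            return action.run();
        } catch (RuntimeException rte) {
            // Restore the original exception for the caller
            throw (Exception) rte.getCause();
        }
    }

    // Helper for the demo: turns either outcome into a printable string.
    public static String describe(boolean fail) {
        try {
            return "exit code " + run(fail);
        } catch (Exception e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(false));
        System.out.println(describe(true));
    }
}
```

The caller sees the original IOException rather than the RuntimeException wrapper, which is why main() in DFSZKFailoverController can report the real cause.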
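The comments below explain the election mechanism. It relies on ZooKeeper ephemeral nodes: "atomically creating an ephemeral lock file (znode) on Zookeeper. The service instance that successfully creates the znode becomes active and the rest become standbys". The class also defines a callback interface that notifies a node when it becomes active or standby. Note enterNeutralMode, which guards against split-brain: it notifies the node when its state is uncertain.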
public class ActiveStandbyElector implements StatCallback, StringCallback {
/**
* Callback interface to interact with the ActiveStandbyElector object. <br/>
* The application will be notified with a callback only on state changes
* (i.e. there will never be successive calls to becomeActive without an
* intermediate call to enterNeutralMode). <br/>
* The callbacks will be running on Zookeeper client library threads. The
* application should return from these callbacks quickly so as not to impede
* Zookeeper client library performance and notifications. The app will
* typically remember the state change and return from the callback. It will
* then proceed with implementing actions around that state change. It is
* possible to be called back again while these actions are in flight and the
* app should handle this scenario.
*/
public interface ActiveStandbyElectorCallback {
/**
* This method is called when the app becomes the active leader.
* If the service fails to become active, it should throw
* ServiceFailedException. This will cause the elector to
* sleep for a short period, then re-join the election.
*
* Callback implementations are expected to manage their own
* timeouts (e.g. when making an RPC to a remote node).
*/
void becomeActive() throws ServiceFailedException;
/**
* This method is called when the app becomes a standby
*/
void becomeStandby();
/**
* If the elector gets disconnected from Zookeeper and does not know about
* the lock state, then it will notify the service via the enterNeutralMode
* interface. The service may choose to ignore this or stop doing state
* changing operations. Upon reconnection, the elector verifies the leader
* status and calls back on the becomeActive and becomeStandby app
* interfaces. <br/>
* Zookeeper disconnects can happen due to network issues or loss of
* Zookeeper quorum. Thus enterNeutralMode can be used to guard against
* split-brain issues. In such situations it might be prudent to call
* becomeStandby too. However, such state change operations might be
* expensive and enterNeutralMode can help guard against doing that for
* transient issues.
*/
void enterNeutralMode();
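// Notifies the app of a fatal error (for example, wrong ACLs on ZooKeeper).
void notifyFatalError(String errorMessage);
// Lets the new active fence the previous active, which may have failed rather
// than exited gracefully. oldActiveData is the app data the old active published.
void fenceOldActive(byte[] oldActiveData);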
}
A watcher is defined to listen for ZooKeeper events:
/**
* Watcher implementation which keeps a reference around to the
* original ZK connection, and passes it back along with any
* events.
*/
private final class WatcherWithClientRef implements Watcher {
private ZooKeeper zk;
/**
* Latch fired whenever any event arrives. This is used in order
* to wait for the Connected event when the client is first created.
*/
private CountDownLatch hasReceivedEvent = new CountDownLatch(1);
/**
* Latch used to wait until the reference to ZooKeeper is set.
*/
private CountDownLatch hasSetZooKeeper = new CountDownLatch(1);
/**
* Waits for the next event from ZooKeeper to arrive.
*
* @param connectionTimeoutMs zookeeper connection timeout in milliseconds
* @throws KeeperException if the connection attempt times out. This will
* be a ZooKeeper ConnectionLoss exception code.
* @throws IOException if interrupted while connecting to ZooKeeper
*/
private void waitForZKConnectionEvent(int connectionTimeoutMs)
throws KeeperException, IOException {
try {
if (!hasReceivedEvent.await(connectionTimeoutMs, TimeUnit.MILLISECONDS)) {
LOG.error("Connection timed out: couldn't connect to ZooKeeper in " +
"{} milliseconds", connectionTimeoutMs);
zk.close();
throw KeeperException.create(Code.CONNECTIONLOSS);
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
throw new IOException(
"Interrupted when connecting to zookeeper server", e);
}
}
private void setZooKeeperRef(ZooKeeper zk) {
Preconditions.checkState(this.zk == null,
"zk already set -- must be set exactly once");
this.zk = zk;
hasSetZooKeeper.countDown();
}
@Override
public void process(WatchedEvent event) {
hasReceivedEvent.countDown();
try {
if (!hasSetZooKeeper.await(zkSessionTimeout, TimeUnit.MILLISECONDS)) {
LOG.debug("Event received with stale zk");
}
ActiveStandbyElector.this.processWatchEvent(
zk, event);
} catch (Throwable t) {
fatalError(
"Failed to process watcher event " + event + ": " +
StringUtils.stringifyException(t));
}
}
}
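The two latches solve an ordering problem: the watcher has to exist before the client that it refers to, so an event can arrive before the reference is set. Below is a minimal JDK-only sketch of that handshake. Client, WatcherWithRef, the 1-second waits, and the string event are all hypothetical stand-ins of mine, not the ZooKeeper or Hadoop API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchHandshakeSketch {
    /** Hypothetical stand-in for the ZooKeeper client handle. */
    static final class Client {
        final String name;
        Client(String name) { this.name = name; }
    }

    static final class WatcherWithRef {
        private volatile Client client;
        private volatile String lastSeen;
        private final CountDownLatch hasReceivedEvent = new CountDownLatch(1);
        private final CountDownLatch hasSetClient = new CountDownLatch(1);

        // Called by the creating thread once the client object exists.
        void setClientRef(Client c) {
            if (client != null) {
                throw new IllegalStateException("client already set");
            }
            client = c;
            hasSetClient.countDown();
        }

        // Called on the event thread, possibly before setClientRef().
        void process(String event) {
            hasReceivedEvent.countDown();
            try {
                // Wait for the reference before touching the client
                boolean refReady = hasSetClient.await(1, TimeUnit.SECONDS);
                lastSeen = event + "@" + (refReady ? client.name : "unset");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        boolean waitForFirstEvent(long timeoutMs) throws InterruptedException {
            return hasReceivedEvent.await(timeoutMs, TimeUnit.MILLISECONDS);
        }
    }

    public static String demo() {
        WatcherWithRef watcher = new WatcherWithRef();
        Thread eventThread = new Thread(() -> watcher.process("SyncConnected"));
        eventThread.start(); // the event may fire before the reference is set
        watcher.setClientRef(new Client("zk-1"));
        try {
            boolean connected = watcher.waitForFirstEvent(1000);
            eventThread.join();
            return connected + ":" + watcher.lastSeen;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

One latch answers "has any event arrived yet" for the connecting thread, and the other answers "is the reference safe to use" for the event thread, so neither side has to poll.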
The actual event-handling logic:
synchronized void processWatchEvent(ZooKeeper zk, WatchedEvent event) {
Event.EventType eventType = event.getType();
if (isStaleClient(zk)) return;
if (LOG.isDebugEnabled()) {
LOG.debug("Watcher event type: " + eventType + " with state:"
+ event.getState() + " for path:" + event.getPath()
+ " connectionState: " + zkConnectionState
+ " for " + this);
}
if (eventType == Event.EventType.None) {
// the connection state has changed
switch (event.getState()) {
case SyncConnected:
LOG.info("Session connected.");
// if the listener was asked to move to safe state then it needs to
// be undone
ConnectionState prevConnectionState = zkConnectionState;
zkConnectionState = ConnectionState.CONNECTED;
if (prevConnectionState == ConnectionState.DISCONNECTED &&
wantToBeInElection) {
monitorActiveStatus();
}
break;
case Disconnected:
LOG.info("Session disconnected. Entering neutral mode...");
// ask the app to move to safe state because zookeeper connection
// is not active and we dont know our state
zkConnectionState = ConnectionState.DISCONNECTED;
enterNeutralMode();
break;
case Expired:
// the connection got terminated because of session timeout
// call listener to reconnect
LOG.info("Session expired. Entering neutral mode and rejoining...");
enterNeutralMode();
reJoinElection(0);
break;
case SaslAuthenticated:
LOG.info("Successfully authenticated to ZooKeeper using SASL.");
break;
default:
fatalError("Unexpected Zookeeper watch event state: "
+ event.getState());
break;
}
return;
}
// a watch on lock path in zookeeper has fired. so something has changed on
// the lock. ideally we should check that the path is the same as the lock
// path but trusting zookeeper for now
String path = event.getPath();
if (path != null) {
switch (eventType) {
case NodeDeleted:
if (state == State.ACTIVE) {
enterNeutralMode();
}
joinElectionInternal();
break;
case NodeDataChanged:
monitorActiveStatus();
break;
default:
if (LOG.isDebugEnabled()) {
LOG.debug("Unexpected node event: " + eventType + " for path: " + path);
}
monitorActiveStatus();
}
return;
}
// some unexpected error has occurred
fatalError("Unexpected watch error from Zookeeper");
}
Now look at joining the election. It is actually very simple: it just creates the node.
private void joinElectionInternal() {
Preconditions.checkState(appData != null,
"trying to join election without any app data");
if (zkClient == null) {
if (!reEstablishSession()) {
fatalError("Failed to reEstablish connection with ZooKeeper");
return;
}
}
createRetryCount = 0;
wantToBeInElection = true;
createLockNodeAsync();
}
private void createLockNodeAsync() {
zkClient.create(zkLockFilePath, appData, zkAcl, CreateMode.EPHEMERAL,
this, zkClient);
}
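To convince myself that "create the ephemeral node" really is the whole election, here is a toy in-memory model, assuming nothing beyond the JDK. FakeZk is a hypothetical stand-in for ZooKeeper: putIfAbsent plays the role of the atomic create, and expireSession plays the role of the server deleting a dead session's ephemeral nodes. The lock path is only illustrative, and the real elector does all of this asynchronously through callbacks.

```java
import java.util.concurrent.ConcurrentHashMap;

public class EphemeralLockElectionSketch {
    /** Toy stand-in for ZooKeeper: one map entry per ephemeral znode. */
    static final class FakeZk {
        private final ConcurrentHashMap<String, String> owners = new ConcurrentHashMap<>();

        /** Atomic create: succeeds only if the path does not exist yet. */
        boolean createEphemeral(String path, String sessionId) {
            return owners.putIfAbsent(path, sessionId) == null;
        }

        /** Losing a session removes every ephemeral node that session owned. */
        void expireSession(String sessionId) {
            owners.values().removeIf(owner -> owner.equals(sessionId));
        }
    }

    static final class Candidate {
        final String id;
        String state = "INIT";

        Candidate(String id) { this.id = id; }

        // Whoever creates the lock node becomes active; everyone else is standby.
        void joinElection(FakeZk zk, String lockPath) {
            state = zk.createEphemeral(lockPath, id) ? "ACTIVE" : "STANDBY";
        }
    }

    public static String demo() {
        FakeZk zk = new FakeZk();
        String lock = "/hadoop-ha/ns1/ActiveStandbyElectorLock"; // illustrative path
        Candidate nn1 = new Candidate("nn1");
        Candidate nn2 = new Candidate("nn2");

        nn1.joinElection(zk, lock);
        nn2.joinElection(zk, lock);
        String before = nn1.state + "/" + nn2.state;

        // nn1's session dies, so its lock node disappears. In the real code the
        // standby sees NodeDeleted and calls joinElectionInternal() again.
        zk.expireSession("nn1");
        nn1.state = "INIT";
        nn2.joinElection(zk, lock);

        return before + " -> " + nn1.state + "/" + nn2.state;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The model also shows why the node must be ephemeral: a lock that outlived its owner's session would never be released, and no standby could take over.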
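The code that enters the uncertain (neutral) state. Note that this callback implementation is from the YARN ResourceManager side, as the rm.getRMContext() call shows, rather than from DFSZKFailoverController: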
@Override
public void enterNeutralMode() {
LOG.warn("Lost contact with Zookeeper. Transitioning to standby in "
+ zkSessionTimeout + " ms if connection is not reestablished.");
// If we've just become disconnected, start a timer. When the time's up,
// we'll transition to standby.
synchronized (zkDisconnectLock) {
if (zkDisconnectTimer == null) {
zkDisconnectTimer = new Timer("Zookeeper disconnect timer");
zkDisconnectTimer.schedule(new TimerTask() {
@Override
public void run() {
synchronized (zkDisconnectLock) {
// Only run if the timer hasn't been cancelled
if (zkDisconnectTimer != null) {
becomeStandby();
}
}
}
}, zkSessionTimeout);
}
}
}
@Override
public void becomeStandby() {
cancelDisconnectTimer();
try {
rm.getRMContext().getRMAdminService().transitionToStandby(req);
} catch (Exception e) {
LOG.error("RM could not transition to Standby", e);
}
}
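The timer logic above boils down to a grace period: a disconnect starts one countdown, and only if it runs out does the node step down. Here is a minimal JDK-only sketch of that pattern. The class name, the onReconnected() hook, the string states, and the latch used by the demo are assumptions of mine for illustration, not Hadoop code.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class NeutralModeTimerSketch {
    private final Object disconnectLock = new Object();
    private final long graceMs;
    private final CountDownLatch standbyReached = new CountDownLatch(1);
    private Timer disconnectTimer; // null means no countdown is running
    private volatile String state = "ACTIVE";

    NeutralModeTimerSketch(long graceMs) {
        this.graceMs = graceMs;
    }

    /** Start the countdown once; repeated disconnect events do not restart it. */
    void enterNeutralMode() {
        synchronized (disconnectLock) {
            if (disconnectTimer == null) {
                disconnectTimer = new Timer("disconnect timer", true);
                disconnectTimer.schedule(new TimerTask() {
                    @Override
                    public void run() {
                        synchronized (disconnectLock) {
                            // Skip if the countdown was cancelled in the meantime
                            if (disconnectTimer != null) {
                                becomeStandby();
                            }
                        }
                    }
                }, graceMs);
            }
        }
    }

    /** Hypothetical hook for "the connection is back": cancel the countdown. */
    void onReconnected() {
        cancelDisconnectTimer();
    }

    void becomeStandby() {
        cancelDisconnectTimer();
        state = "STANDBY";
        standbyReached.countDown();
    }

    private void cancelDisconnectTimer() {
        synchronized (disconnectLock) {
            if (disconnectTimer != null) {
                disconnectTimer.cancel();
                disconnectTimer = null;
            }
        }
    }

    public static String demo() {
        // Short blip: reconnect well inside a long grace period
        NeutralModeTimerSketch blip = new NeutralModeTimerSketch(60_000);
        blip.enterNeutralMode();
        blip.onReconnected();

        // Long outage: nobody cancels, so the countdown fires
        NeutralModeTimerSketch outage = new NeutralModeTimerSketch(50);
        outage.enterNeutralMode();
        try {
            boolean fired = outage.standbyReached.await(5, TimeUnit.SECONDS);
            return blip.state + "," + (fired ? outage.state : "TIMEOUT");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

This matches the trade-off described in the enterNeutralMode Javadoc: a transition to standby can be expensive, so a transient disconnect only starts the clock instead of forcing the transition immediately.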
The thread running through the code is very clear.