flink checkpoint source code analysis (Part 1)
Please credit the original URL when reposting: http://www.cnblogs.com/dongxiao-yang/p/8029356.html
Checkpointing is a key part of Flink's fault tolerance mechanism. The core class behind Flink checkpoints is org.apache.flink.runtime.checkpoint.CheckpointCoordinator.
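Periodically triggered checkpoints
Flink checkpoints are triggered periodically by a timer thread pool inside CheckpointCoordinator. Each firing runs the Runnable class ScheduledTrigger, which calls triggerCheckpoint: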
private final class ScheduledTrigger implements Runnable {
    @Override
    public void run() {
        try {
            triggerCheckpoint(System.currentTimeMillis(), true);
        }
        catch (Exception e) {
            LOG.error("Exception while triggering checkpoint.", e);
        }
    }
}
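ScheduledTrigger catches every exception. This matters if timer behaves like a standard java.util.concurrent.ScheduledExecutorService, which is an assumption here. When one run of a task scheduled with scheduleAtFixedRate throws, all later runs are suppressed, so a single failed trigger would silently stop periodic checkpointing.

The sketch below is a hypothetical, self-contained model, not Flink code. PeriodicTriggerSketch and its fake triggerCheckpoint are made-up names. The first trigger fails on purpose, yet the schedule keeps firing:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch (not Flink code) of a timer that fires a checkpoint trigger at a fixed rate.
final class PeriodicTriggerSketch {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    final AtomicInteger attempts = new AtomicInteger();
    private final CountDownLatch threeAttempts = new CountDownLatch(3);

    // Stand-in for triggerCheckpoint(timestamp, isPeriodic); the first call fails on purpose.
    private void triggerCheckpoint(long timestamp, boolean isPeriodic) throws Exception {
        int n = attempts.incrementAndGet();
        threeAttempts.countDown();
        if (n == 1) {
            throw new Exception("simulated trigger failure");
        }
    }

    // Same shape as ScheduledTrigger: catch and log, so the fixed-rate schedule is not suppressed.
    private final class ScheduledTrigger implements Runnable {
        @Override
        public void run() {
            try {
                triggerCheckpoint(System.currentTimeMillis(), true);
            } catch (Exception e) {
                System.err.println("Exception while triggering checkpoint: " + e.getMessage());
            }
        }
    }

    /** Fires the trigger every baseIntervalMillis; returns true once three attempts were made. */
    boolean runUntilThreeAttempts(long baseIntervalMillis) {
        ScheduledFuture<?> periodic = timer.scheduleAtFixedRate(
                new ScheduledTrigger(), 0, baseIntervalMillis, TimeUnit.MILLISECONDS);
        try {
            return threeAttempts.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            periodic.cancel(false);
            timer.shutdownNow();
        }
    }

    public static void main(String[] args) {
        PeriodicTriggerSketch sketch = new PeriodicTriggerSketch();
        System.out.println("kept firing after a failure: " + sketch.runUntilThreeAttempts(10));
    }
}
```

Without the try/catch in ScheduledTrigger.run, the exception from the first attempt would escape, and the second and third attempts would never happen.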
The triggerCheckpoint method can be roughly divided into three parts:
1 Pre-checks
// Sanity check
if (props.externalizeCheckpoint() && targetDirectory == null) {
    throw new IllegalStateException("No target directory specified to persist checkpoint to.");
}
// make some eager pre-checks
synchronized (lock) {
    // abort if the coordinator has been shutdown in the meantime
    if (shutdown) {
        return new CheckpointTriggerResult(CheckpointDeclineReason.COORDINATOR_SHUTDOWN);
    }
    // Don't allow periodic checkpoint if scheduling has been disabled
    if (isPeriodic && !periodicScheduling) {
        return new CheckpointTriggerResult(CheckpointDeclineReason.PERIODIC_SCHEDULER_SHUTDOWN);
    }
    // validate whether the checkpoint can be triggered, with respect to the limit of
    // concurrent checkpoints, and the minimum time between checkpoints.
    // these checks are not relevant for savepoints
    if (!props.forceCheckpoint()) {
        // sanity check: there should never be more than one trigger request queued
        if (triggerRequestQueued) {
            LOG.warn("Trying to trigger another checkpoint while one was queued already");
            return new CheckpointTriggerResult(CheckpointDeclineReason.ALREADY_QUEUED);
        }
        // if too many checkpoints are currently in progress, we need to mark that a request is queued
        if (pendingCheckpoints.size() >= maxConcurrentCheckpointAttempts) {
            triggerRequestQueued = true;
            if (currentPeriodicTrigger != null) {
                currentPeriodicTrigger.cancel(false);
                currentPeriodicTrigger = null;
            }
            return new CheckpointTriggerResult(CheckpointDeclineReason.TOO_MANY_CONCURRENT_CHECKPOINTS);
        }
        // make sure the minimum interval between checkpoints has passed
        final long earliestNext = lastCheckpointCompletionNanos + minPauseBetweenCheckpointsNanos;
        final long durationTillNextMillis = (earliestNext - System.nanoTime()) / 1_000_000;
        if (durationTillNextMillis > 0) {
            if (currentPeriodicTrigger != null) {
                currentPeriodicTrigger.cancel(false);
                currentPeriodicTrigger = null;
            }
            // Reassign the new trigger to the currentPeriodicTrigger
            currentPeriodicTrigger = timer.scheduleAtFixedRate(
                    new ScheduledTrigger(),
                    durationTillNextMillis, baseInterval, TimeUnit.MILLISECONDS);
            return new CheckpointTriggerResult(CheckpointDeclineReason.MINIMUM_TIME_BETWEEN_CHECKPOINTS);
        }
    }
}
// check if all tasks that we need to trigger are running.
// if not, abort the checkpoint
Execution[] executions = new Execution[tasksToTrigger.length];
for (int i = 0; i < tasksToTrigger.length; i++) {
    Execution ee = tasksToTrigger[i].getCurrentExecutionAttempt();
    if (ee != null && ee.getState() == ExecutionState.RUNNING) {
        executions[i] = ee;
    } else {
        LOG.info("Checkpoint triggering task {} is not being executed at the moment. Aborting checkpoint.",
                tasksToTrigger[i].getTaskNameWithSubtaskIndex());
        return new CheckpointTriggerResult(CheckpointDeclineReason.NOT_ALL_REQUIRED_TASKS_RUNNING);
    }
}
// next, check if all tasks that need to acknowledge the checkpoint are running.
// if not, abort the checkpoint
Map<ExecutionAttemptID, ExecutionVertex> ackTasks = new HashMap<>(tasksToWaitFor.length);
for (ExecutionVertex ev : tasksToWaitFor) {
    Execution ee = ev.getCurrentExecutionAttempt();
    if (ee != null) {
        ackTasks.put(ee.getAttemptId(), ev);
    } else {
        LOG.info("Checkpoint acknowledging task {} is not being executed at the moment. Aborting checkpoint.",
                ev.getTaskNameWithSubtaskIndex());
        return new CheckpointTriggerResult(CheckpointDeclineReason.NOT_ALL_REQUIRED_TASKS_RUNNING);
    }
}
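The code above runs a series of pre-checks before a checkpoint is created. It checks the following:
- whether a targetDirectory is set for externalized checkpoints;
- whether the number of in-flight pendingCheckpoints has reached maxConcurrentCheckpointAttempts;
- whether the minimum pause since the last completed checkpoint has elapsed;
- whether the tasks involved in the checkpoint are alive. Every task that must trigger it has to be RUNNING, and every task that must acknowledge it has to have a current execution attempt.

If any condition fails, no checkpoint is taken. A missing targetDirectory throws an IllegalStateException; every other failure returns a CheckpointTriggerResult carrying the corresponding CheckpointDeclineReason. Forced checkpoints (props.forceCheckpoint(), e.g. savepoints) skip the concurrency and minimum-pause checks.

Two of the failure paths also adjust the periodic trigger. When too many checkpoints are pending, the trigger is cancelled and triggerRequestQueued is set. When the minimum pause has not elapsed, the trigger is rescheduled to fire once the remaining time has passed.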
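The decision order of these pre-checks can be modelled as a pure function. The following is a hypothetical simplification: PreCheckSketch and DeclineSketch are illustrative names, and only the enum constants mirror CheckpointDeclineReason. It leaves out the targetDirectory check, the task-state checks and the rescheduling of currentPeriodicTrigger. It also takes the current time as a parameter so the result is deterministic.

```java
// Hypothetical, simplified model of the eager pre-checks (not Flink code).
// Enum names mirror CheckpointDeclineReason; trigger rescheduling and task-state checks are omitted.
enum DeclineSketch {
    COORDINATOR_SHUTDOWN, PERIODIC_SCHEDULER_SHUTDOWN, ALREADY_QUEUED,
    TOO_MANY_CONCURRENT_CHECKPOINTS, MINIMUM_TIME_BETWEEN_CHECKPOINTS
}

final class PreCheckSketch {
    boolean shutdown;
    boolean periodicScheduling = true;
    boolean triggerRequestQueued;
    int pendingCheckpoints;
    int maxConcurrentCheckpointAttempts = 1;
    long lastCheckpointCompletionNanos;
    long minPauseBetweenCheckpointsNanos;

    /** Returns null if triggering may proceed, otherwise the reason it is declined. */
    DeclineSketch check(boolean isPeriodic, boolean forceCheckpoint, long nowNanos) {
        if (shutdown) return DeclineSketch.COORDINATOR_SHUTDOWN;
        if (isPeriodic && !periodicScheduling) return DeclineSketch.PERIODIC_SCHEDULER_SHUTDOWN;
        if (!forceCheckpoint) { // forced checkpoints such as savepoints skip the checks below
            if (triggerRequestQueued) return DeclineSketch.ALREADY_QUEUED;
            if (pendingCheckpoints >= maxConcurrentCheckpointAttempts) {
                triggerRequestQueued = true; // a request is now waiting for a free slot
                return DeclineSketch.TOO_MANY_CONCURRENT_CHECKPOINTS;
            }
            long earliestNext = lastCheckpointCompletionNanos + minPauseBetweenCheckpointsNanos;
            long durationTillNextMillis = (earliestNext - nowNanos) / 1_000_000;
            if (durationTillNextMillis > 0) return DeclineSketch.MINIMUM_TIME_BETWEEN_CHECKPOINTS;
        }
        return null;
    }

    public static void main(String[] args) {
        PreCheckSketch c = new PreCheckSketch();
        c.pendingCheckpoints = 1;
        System.out.println(c.check(true, false, 0L)); // TOO_MANY_CONCURRENT_CHECKPOINTS
        System.out.println(c.check(true, false, 0L)); // ALREADY_QUEUED
    }
}
```

Calling check twice in a row with one checkpoint pending shows why ALREADY_QUEUED exists. The first call marks a request as queued, so the second is rejected immediately.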
2 Creating the PendingCheckpoint
final long checkpointID;
try {
    // this must happen outside the coordinator-wide lock, because it communicates
    // with external services (in HA mode) and may block for a while.
    checkpointID = checkpointIdCounter.getAndIncrement();
}
catch (Throwable t) {
    int numUnsuccessful = numUnsuccessfulCheckpointsTriggers.incrementAndGet();
    LOG.warn("Failed to trigger checkpoint (" + numUnsuccessful + " consecutive failed attempts so far)", t);
    return new CheckpointTriggerResult(CheckpointDeclineReason.EXCEPTION);
}
final PendingCheckpoint checkpoint = new PendingCheckpoint(
        job,
        checkpointID,
        timestamp,
        ackTasks,
        props,
        targetDirectory,
        executor);
if (statsTracker != null) {
    PendingCheckpointStats callback = statsTracker.reportPendingCheckpoint(
            checkpointID,
            timestamp,
            props);
    checkpoint.setStatsCallback(callback);
}
// schedule the timer that will clean up the expired checkpoints
final Runnable canceller = new Runnable() {
    @Override
    public void run() {
        synchronized (lock) {
            // only do the work if the checkpoint is not discarded anyways
            // note that checkpoint completion discards the pending checkpoint object
            if (!checkpoint.isDiscarded()) {
                LOG.info("Checkpoint " + checkpointID + " expired before completing.");
                checkpoint.abortExpired();
                pendingCheckpoints.remove(checkpointID);
                rememberRecentCheckpointId(checkpointID);
                triggerQueuedRequests();
            }
        }
    }
};
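A PendingCheckpoint represents a checkpoint that has been triggered but not yet acknowledged by all required tasks. Each one carries a globally unique, monotonically increasing checkpointID obtained from checkpointIdCounter. That call is made outside the coordinator-wide lock because in HA mode it may talk to an external service and block. A canceller is also defined here: if the checkpoint times out, it aborts the checkpoint and releases its resources.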
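The following is a minimal sketch of the ID-plus-canceller idea. It assumes a plain AtomicLong in place of CheckpointIDCounter and a single-threaded ScheduledExecutorService as the timer. CancellerSketch and Pending are hypothetical names, not Flink classes. As in the canceller above, the timeout task only aborts a checkpoint that has not been discarded yet, so a checkpoint that completes in time is left alone.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not Flink code: increasing IDs plus a timeout "canceller" per pending checkpoint.
final class CancellerSketch {
    static final class Pending {
        boolean discarded; // guarded by the coordinator lock
    }

    private final Object lock = new Object();
    private final AtomicLong checkpointIdCounter = new AtomicLong(1); // stand-in for CheckpointIDCounter
    private final Map<Long, Pending> pendingCheckpoints = new HashMap<>();
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

    long trigger(long checkpointTimeoutMillis) {
        final long checkpointId = checkpointIdCounter.getAndIncrement();
        final Pending checkpoint = new Pending();
        final Runnable canceller = () -> {
            synchronized (lock) {
                // completion discards the pending object, so only abort if it is still live
                if (!checkpoint.discarded) {
                    checkpoint.discarded = true; // stands in for abortExpired()
                    pendingCheckpoints.remove(checkpointId);
                }
            }
        };
        synchronized (lock) {
            pendingCheckpoints.put(checkpointId, checkpoint);
        }
        timer.schedule(canceller, checkpointTimeoutMillis, TimeUnit.MILLISECONDS);
        return checkpointId;
    }

    void complete(long checkpointId) {
        synchronized (lock) {
            Pending p = pendingCheckpoints.remove(checkpointId);
            if (p != null) p.discarded = true;
        }
    }

    boolean isPending(long checkpointId) {
        synchronized (lock) {
            return pendingCheckpoints.containsKey(checkpointId);
        }
    }

    // Polls until the checkpoint has left the pending map or the deadline passes.
    boolean awaitExpired(long checkpointId, long deadlineMillis) {
        long end = System.currentTimeMillis() + deadlineMillis;
        while (System.currentTimeMillis() < end) {
            if (!isPending(checkpointId)) return true;
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return !isPending(checkpointId);
    }

    void shutdown() {
        timer.shutdownNow();
    }

    public static void main(String[] args) {
        CancellerSketch c = new CancellerSketch();
        long fast = c.trigger(50);     // will time out
        long slow = c.trigger(60_000); // completed before its timeout
        c.complete(slow);
        System.out.println("checkpoint " + fast + " expired: " + c.awaitExpired(fast, 5_000));
        c.shutdown();
    }
}
```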
// re-acquire the coordinator-wide lock
synchronized (lock) {
    // since we released the lock in the meantime, we need to re-check
    // that the conditions still hold.
    if (shutdown) {
        return new CheckpointTriggerResult(CheckpointDeclineReason.COORDINATOR_SHUTDOWN);
    }
    else if (!props.forceCheckpoint()) {
        if (triggerRequestQueued) {
            LOG.warn("Trying to trigger another checkpoint while one was queued already");
            return new CheckpointTriggerResult(CheckpointDeclineReason.ALREADY_QUEUED);
        }
        if (pendingCheckpoints.size() >= maxConcurrentCheckpointAttempts) {
            triggerRequestQueued = true;
            if (currentPeriodicTrigger != null) {
                currentPeriodicTrigger.cancel(false);
                currentPeriodicTrigger = null;
            }
            return new CheckpointTriggerResult(CheckpointDeclineReason.TOO_MANY_CONCURRENT_CHECKPOINTS);
        }
        // make sure the minimum interval between checkpoints has passed
        final long earliestNext = lastCheckpointCompletionNanos + minPauseBetweenCheckpointsNanos;
        final long durationTillNextMillis = (earliestNext - System.nanoTime()) / 1_000_000;
        if (durationTillNextMillis > 0) {
            if (currentPeriodicTrigger != null) {
                currentPeriodicTrigger.cancel(false);
                currentPeriodicTrigger = null;
            }
            // Reassign the new trigger to the currentPeriodicTrigger
            currentPeriodicTrigger = timer.scheduleAtFixedRate(
                    new ScheduledTrigger(),
                    durationTillNextMillis, baseInterval, TimeUnit.MILLISECONDS);
            return new CheckpointTriggerResult(CheckpointDeclineReason.MINIMUM_TIME_BETWEEN_CHECKPOINTS);
        }
    }
    LOG.info("Triggering checkpoint " + checkpointID + " @ " + timestamp);
    pendingCheckpoints.put(checkpointID, checkpoint);
    ScheduledFuture<?> cancellerHandle = timer.schedule(
            canceller,
            checkpointTimeout, TimeUnit.MILLISECONDS);
    if (!checkpoint.setCancellerHandle(cancellerHandle)) {
        // checkpoint is already disposed!
        cancellerHandle.cancel(false);
    }
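Before the PendingCheckpoint is actually started, the pre-checks run once more. The lock was released while the checkpoint ID was obtained, so the conditions may have changed in the meantime. The re-check covers whether the coordinator has shut down, whether a request is already queued, whether too many checkpoints are pending, and whether the minimum pause since the last completed checkpoint has elapsed.

Once all checks pass, the checkpoint is added to pendingCheckpoints. The canceller defined earlier is then scheduled on the timer thread pool to fire after checkpointTimeout milliseconds. If the checkpoint has not completed by then, the canceller reclaims its resources.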
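The last lines of the re-check handle a race. The timeout handle is registered only after the PendingCheckpoint exists, so the checkpoint may already be disposed when setCancellerHandle is called. In that case the coordinator cancels the timer itself.

Below is a hypothetical simplification of that handshake. PendingSketch is not Flink's PendingCheckpoint, and its dispose method is an assumption made for illustration. In this sketch, disposing a checkpoint that already holds a handle also cancels its timer, so no timeout task is left behind.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical simplification of the canceller-handle handshake, not Flink's PendingCheckpoint.
final class PendingSketch {
    private final Object lock = new Object();
    private boolean discarded;
    private ScheduledFuture<?> cancellerHandle;

    /** Registers the timeout handle; returns false if the checkpoint was already disposed. */
    boolean setCancellerHandle(ScheduledFuture<?> handle) {
        synchronized (lock) {
            if (cancellerHandle != null) {
                throw new IllegalStateException("A canceller handle was already set");
            }
            if (discarded) {
                return false;
            }
            cancellerHandle = handle;
            return true;
        }
    }

    /** Completing or aborting disposes the checkpoint and, in this sketch, stops its timeout timer. */
    void dispose() {
        synchronized (lock) {
            discarded = true;
            if (cancellerHandle != null) {
                cancellerHandle.cancel(false);
                cancellerHandle = null;
            }
        }
    }

    public static void main(String[] args) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        PendingSketch p = new PendingSketch();
        p.dispose(); // simulate a checkpoint disposed before its handle is set
        ScheduledFuture<?> handle = timer.schedule(() -> { }, 1, TimeUnit.HOURS);
        if (!p.setCancellerHandle(handle)) {
            handle.cancel(false); // same reaction as the coordinator: "checkpoint is already disposed!"
        }
        System.out.println("handle cancelled: " + handle.isCancelled());
        timer.shutdownNow();
    }
}
```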
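3 Starting checkpoint execution
The coordinator sends the checkpointID and timestamp of this checkpoint to each corresponding Execution, together with CheckpointOptions, which differ for regular checkpoints and savepoints. Each Execution then sends a TriggerCheckpoint message to the TaskManager it runs on.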
CheckpointOptions checkpointOptions;
if (!props.isSavepoint()) {
    checkpointOptions = CheckpointOptions.forCheckpoint();
} else {
    checkpointOptions = CheckpointOptions.forSavepoint(targetDirectory);
}
// send the messages to the tasks that trigger their checkpoint
for (Execution execution: executions) {
    execution.triggerCheckpoint(checkpointID, timestamp, checkpointOptions);
}
numUnsuccessfulCheckpointsTriggers.set(0);
return new CheckpointTriggerResult(checkpoint);
// Execution#triggerCheckpoint: forward the trigger to the TaskManager that holds this execution's slot
public void triggerCheckpoint(long checkpointId, long timestamp, CheckpointOptions checkpointOptions) {
    final SimpleSlot slot = assignedResource;
    if (slot != null) {
        final TaskManagerGateway taskManagerGateway = slot.getTaskManagerGateway();
        taskManagerGateway.triggerCheckpoint(attemptId, getVertex().getJobId(), checkpointId, timestamp, checkpointOptions);
    } else {
        LOG.debug("The execution has no slot assigned. This indicates that the execution is " +
                "no longer running.");
    }
}
// ActorTaskManagerGateway#triggerCheckpoint: send an asynchronous TriggerCheckpoint message to the TaskManager actor
@Override
public void triggerCheckpoint(
        ExecutionAttemptID executionAttemptID,
        JobID jobId,
        long checkpointId,
        long timestamp,
        CheckpointOptions checkpointOptions) {
    Preconditions.checkNotNull(executionAttemptID);
    Preconditions.checkNotNull(jobId);
    actorGateway.tell(new TriggerCheckpoint(jobId, executionAttemptID, checkpointId, timestamp, checkpointOptions));
}
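The dispatch path is fire-and-forget. Execution.triggerCheckpoint hands the request to the TaskManagerGateway, and tell does not wait for a reply. triggerCheckpoint therefore returns right after the loop, and acknowledgements arrive later as separate messages.

A hypothetical model of this path follows. TriggerMsgSketch, ExecutionSketch and DispatchSketch are made-up names. A BlockingQueue stands in for the actor mailbox, and a boolean stands in for CheckpointOptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for the TriggerCheckpoint message (a boolean replaces CheckpointOptions).
final class TriggerMsgSketch {
    final String executionAttemptId;
    final long checkpointId;
    final long timestamp;
    final boolean savepoint;

    TriggerMsgSketch(String executionAttemptId, long checkpointId, long timestamp, boolean savepoint) {
        this.executionAttemptId = executionAttemptId;
        this.checkpointId = checkpointId;
        this.timestamp = timestamp;
        this.savepoint = savepoint;
    }
}

// Hypothetical execution: a queue models the TaskManager gateway, null models "no slot assigned".
final class ExecutionSketch {
    final String attemptId;
    final BlockingQueue<TriggerMsgSketch> taskManagerGateway;

    ExecutionSketch(String attemptId, BlockingQueue<TriggerMsgSketch> taskManagerGateway) {
        this.attemptId = attemptId;
        this.taskManagerGateway = taskManagerGateway;
    }

    // Same shape as Execution.triggerCheckpoint: enqueue and return, nothing is awaited here.
    void triggerCheckpoint(long checkpointId, long timestamp, boolean savepoint) {
        if (taskManagerGateway != null) {
            taskManagerGateway.offer(new TriggerMsgSketch(attemptId, checkpointId, timestamp, savepoint));
        } else {
            System.out.println("Execution " + attemptId + " has no slot assigned; it is no longer running.");
        }
    }
}

final class DispatchSketch {
    // Step 3: send the same id and timestamp to every execution that triggers checkpoints.
    static void sendTriggerMessages(List<ExecutionSketch> executions, long checkpointId, long timestamp, boolean savepoint) {
        for (ExecutionSketch execution : executions) {
            execution.triggerCheckpoint(checkpointId, timestamp, savepoint);
        }
    }

    public static void main(String[] args) {
        BlockingQueue<TriggerMsgSketch> taskManager = new LinkedBlockingQueue<>();
        List<ExecutionSketch> sources = Arrays.asList(
                new ExecutionSketch("source-0", taskManager),
                new ExecutionSketch("source-1", taskManager));
        sendTriggerMessages(sources, 1L, System.currentTimeMillis(), false);
        System.out.println("messages queued: " + taskManager.size()); // messages queued: 2
    }
}
```

In the real coordinator, step 1 has already aborted the checkpoint if a triggering task is not RUNNING. The "no slot" branch is therefore only a defensive path.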
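Here, the executions iterated in for (Execution execution: executions) are built from tasksToTrigger, which contains only the input vertices, i.e. the Flink source tasks. The JobManager therefore sends the trigger message only to the source tasks. The sources then start their checkpoint and emit barriers into the stream.

Which vertices trigger, acknowledge and commit a checkpoint is decided when the JobGraph is built in StreamingJobGraphGenerator. Only vertices for which isInputVertex() is true are added to triggerVertices, while every vertex is added to ackVertices and commitVertices: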
JobCheckpointingSettings settings = new JobCheckpointingSettings(
        triggerVertices,
        ackVertices,
        commitVertices,
        new CheckpointCoordinatorConfiguration(
                interval,
                cfg.getCheckpointTimeout(),
                cfg.getMinPauseBetweenCheckpoints(),
                cfg.getMaxConcurrentCheckpoints(),
                retentionAfterTermination,
                isExactlyOnce),
        serializedStateBackend,
        serializedHooks);
// jobGraph
for (JobVertex vertex : jobVertices.values()) {
    if (vertex.isInputVertex()) {
        triggerVertices.add(vertex.getID());
    }
    commitVertices.add(vertex.getID());
    ackVertices.add(vertex.getID());
}
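The split into trigger, ack and commit vertices can be reproduced with a tiny model. SketchVertex and CheckpointVertexSets are hypothetical names, and vertex IDs are plain strings here instead of JobVertexID.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical vertex: only the two properties the loop inspects.
final class SketchVertex {
    final String id;
    final boolean inputVertex;

    SketchVertex(String id, boolean inputVertex) {
        this.id = id;
        this.inputVertex = inputVertex;
    }
}

final class CheckpointVertexSets {
    final List<String> trigger = new ArrayList<>();
    final List<String> ack = new ArrayList<>();
    final List<String> commit = new ArrayList<>();

    // Same classification as the loop above: sources trigger, every vertex acks and commits.
    static CheckpointVertexSets of(List<SketchVertex> vertices) {
        CheckpointVertexSets sets = new CheckpointVertexSets();
        for (SketchVertex v : vertices) {
            if (v.inputVertex) {
                sets.trigger.add(v.id); // receives the "trigger checkpoint" message
            }
            sets.commit.add(v.id); // notified when the checkpoint completes
            sets.ack.add(v.id);    // must acknowledge before the checkpoint can complete
        }
        return sets;
    }

    public static void main(String[] args) {
        CheckpointVertexSets sets = of(Arrays.asList(
                new SketchVertex("source", true),
                new SketchVertex("map", false),
                new SketchVertex("sink", false)));
        // prints: trigger=[source] ack=[source, map, sink] commit=[source, map, sink]
        System.out.println("trigger=" + sets.trigger + " ack=" + sets.ack + " commit=" + sets.commit);
    }
}
```

Consider a source, map, sink pipeline. Only the source receives the trigger message, but all three vertices must acknowledge the checkpoint, and all three are notified when it completes.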