Checkpoint Trigger Mechanism

　　Flink triggers checkpoints periodically via a timer. The key class in checkpoint triggering is CheckpointCoordinator, referred to here as the checkpoint coordinator.

org.apache.flink.runtime.checkpoint.CheckpointCoordinator

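　　The main role of CheckpointCoordinator is to coordinate the distributed snapshots of operators and state. It completes a checkpoint by sending trigger messages to the relevant tasks and collecting their acknowledgement (Ack) messages. It also collects and maintains the state handles reported by the tasks.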

　　Main fields of CheckpointCoordinator:

     /** Coordinator-wide lock to safeguard the checkpoint updates */

     private final Object lock = new Object();  // coordinator-wide lock

     /** Lock specially to make sure that trigger requests do not overtake each other.

      * This is not done with the coordinator-wide lock, because as part of triggering,

      * blocking operations may happen (distributed atomic counters).

      * Using a dedicated lock, we avoid blocking the processing of 'acknowledge/decline'

      * messages during that phase. */

     private final Object triggerLock = new Object(); // dedicated lock for trigger requests, so that obtaining the checkpoint ID does not block the handling of acknowledge/decline messages

     /** Tasks who need to be sent a message when a checkpoint is started */

     private final ExecutionVertex[] tasksToTrigger;

     /** Tasks who need to acknowledge a checkpoint before it succeeds */

     private final ExecutionVertex[] tasksToWaitFor;

     /** Tasks who need to be sent a message when a checkpoint is confirmed */

     private final ExecutionVertex[] tasksToCommitTo;

     /** Map from checkpoint ID to the pending checkpoint */

     private final Map<Long, PendingCheckpoint> pendingCheckpoints; // checkpoints that have been triggered but not yet completed

     /** Actor that receives status updates from the execution graph this coordinator works for */

     private JobStatusListener jobStatusListener; // listens for job status changes and starts/stops the periodic trigger accordingly

     /** Flag whether a trigger request could not be handled immediately. Non-volatile, because only

      * accessed in synchronized scope */

     private boolean triggerRequestQueued; // whether a trigger request could not be handled immediately

     /** Flag marking the coordinator as shut down (not accepting any messages any more) */

     private volatile boolean shutdown; // shutdown flag of the coordinator

　　ScheduledTrigger

　　ScheduledTrigger is the periodic checkpoint task; its run method simply calls triggerCheckpoint.

     private final class ScheduledTrigger implements Runnable {

         @Override

         public void run() {

             try {

                 triggerCheckpoint(System.currentTimeMillis(), true);

             }

             catch (Exception e) {

                 LOG.error("Exception while triggering checkpoint.", e);

             }

         }

     }

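　　Next, let's look at how triggerCheckpoint is implemented.

     // Triggers a new standard checkpoint. timestamp is the time at which the checkpoint is triggered; isPeriodic indicates whether this is a periodic trigger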

     public boolean triggerCheckpoint(long timestamp, boolean isPeriodic) {

         return triggerCheckpoint(timestamp, checkpointProperties, checkpointDirectory, isPeriodic).isSuccess();

     }

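　　Core logic of triggering a checkpoint:

　　　　First, run the pre-checks and decide whether a checkpoint may be triggered;

　　　　then obtain a checkpoint ID and create a PendingCheckpoint instance;

　　　　next, re-check the trigger conditions to guard against race conditions;

　　　　finally, add the PendingCheckpoint instance to pendingCheckpoints and send messages to the tasks to trigger their checkpoints.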

     CheckpointTriggerResult triggerCheckpoint(

             long timestamp,

             CheckpointProperties props,

             String targetDirectory,

             boolean isPeriodic) {

         // Sanity check: fail if the checkpoint is to be externalized but targetDirectory is null

         if (props.externalizeCheckpoint() && targetDirectory == null) {

             throw new IllegalStateException("No target directory specified to persist checkpoint to.");

         }

         // make some eager pre-checks before triggering the checkpoint

         synchronized (lock) {

             // abort if the coordinator has been shutdown in the meantime

             if (shutdown) {

                 return new CheckpointTriggerResult(CheckpointDeclineReason.COORDINATOR_SHUTDOWN);

             }

             // Don't allow periodic checkpoint if scheduling has been disabled

             if (isPeriodic && !periodicScheduling) {

                 return new CheckpointTriggerResult(CheckpointDeclineReason.PERIODIC_SCHEDULER_SHUTDOWN);

             }

             // validate whether the checkpoint can be triggered, with respect to the limit of

             // concurrent checkpoints, and the minimum time between checkpoints.

             // these checks are not relevant for savepoints

             // A forced checkpoint (props.forceCheckpoint()) is exempt from the limit on concurrent checkpoints and from the minimum time between checkpoints.

             if (!props.forceCheckpoint()) {

                 // sanity check: there should never be more than one trigger request queued

                 if (triggerRequestQueued) {

                     // a request is already queued, so decline this one right away

                     LOG.warn("Trying to trigger another checkpoint while one was queued already");

                     return new CheckpointTriggerResult(CheckpointDeclineReason.ALREADY_QUEUED);

                 }

                 // if too many checkpoints are currently in progress, we need to mark that a request is queued

                 if (pendingCheckpoints.size() >= maxConcurrentCheckpointAttempts) {

                     // too many checkpoints are in progress (at or above the configured maximum number of concurrent checkpoints): mark this trigger request as queued

                     triggerRequestQueued = true;

                     // if the periodic trigger has been started, cancel it

                     if (currentPeriodicTrigger != null) {

                         currentPeriodicTrigger.cancel(false);

                         currentPeriodicTrigger = null;

                     }

                     return new CheckpointTriggerResult(CheckpointDeclineReason.TOO_MANY_CONCURRENT_CHECKPOINTS);

                 }

                 // make sure the minimum interval between checkpoints has passed

                 final long earliestNext = lastCheckpointCompletionNanos + minPauseBetweenCheckpointsNanos;

                 final long durationTillNextMillis = (earliestNext - System.nanoTime()) / 1_000_000;

                 if (durationTillNextMillis > 0) {

                     if (currentPeriodicTrigger != null) {

                         currentPeriodicTrigger.cancel(false);

                         currentPeriodicTrigger = null;

                     }

                     // Reassign the new trigger to the currentPeriodicTrigger

                     // the initial delay is now durationTillNextMillis

                     currentPeriodicTrigger = timer.scheduleAtFixedRate(

                             new ScheduledTrigger(),

                             durationTillNextMillis, baseInterval, TimeUnit.MILLISECONDS);

                     return new CheckpointTriggerResult(CheckpointDeclineReason.MINIMUM_TIME_BETWEEN_CHECKPOINTS);

                 }

             }

         }

         // check if all tasks that we need to trigger are running.

         // if not, abort the checkpoint

         // i.e. if even one of the tasks to trigger is not in the RUNNING state, no checkpoint is triggered

         Execution[] executions = new Execution[tasksToTrigger.length];

         for (int i = 0; i < tasksToTrigger.length; i++) {

             Execution ee = tasksToTrigger[i].getCurrentExecutionAttempt();

             if (ee != null && ee.getState() == ExecutionState.RUNNING) {

                 executions[i] = ee;

             } else {

                 LOG.info("Checkpoint triggering task {} is not being executed at the moment. Aborting checkpoint.",

                         tasksToTrigger[i].getTaskNameWithSubtaskIndex());

                 return new CheckpointTriggerResult(CheckpointDeclineReason.NOT_ALL_REQUIRED_TASKS_RUNNING);

             }

         }

         // next, check if all tasks that need to acknowledge the checkpoint are running.

         // if not, abort the checkpoint

         // note: the code below only checks that each task has a current execution attempt; if any task lacks one, no checkpoint is triggered

         Map<ExecutionAttemptID, ExecutionVertex> ackTasks = new HashMap<>(tasksToWaitFor.length);

         for (ExecutionVertex ev : tasksToWaitFor) {

             Execution ee = ev.getCurrentExecutionAttempt();

             if (ee != null) {

                 ackTasks.put(ee.getAttemptId(), ev);

             } else {

                 LOG.info("Checkpoint acknowledging task {} is not being executed at the moment. Aborting checkpoint.",

                         ev.getTaskNameWithSubtaskIndex());

                 return new CheckpointTriggerResult(CheckpointDeclineReason.NOT_ALL_REQUIRED_TASKS_RUNNING);

             }

         }

         // we will actually trigger this checkpoint!

         // we lock with a special lock to make sure that trigger requests do not overtake each other.

         // this is not done with the coordinator-wide lock, because the 'checkpointIdCounter'

         // may issue blocking operations. Using a different lock than the coordinator-wide lock,

         // we avoid blocking the processing of 'acknowledge/decline' messages during that time.

         // the checkpoint is triggered inside the triggerLock block rather than under the coordinator-wide lock

         synchronized (triggerLock) {

             final long checkpointID;

             // first obtain the checkpoint ID

             try {

                 // this must happen outside the coordinator-wide lock, because it communicates

                 // with external services (in HA mode) and may block for a while.

                 checkpointID = checkpointIdCounter.getAndIncrement();

             }

             catch (Throwable t) {

                 int numUnsuccessful = numUnsuccessfulCheckpointsTriggers.incrementAndGet();

                 LOG.warn("Failed to trigger checkpoint (" + numUnsuccessful + " consecutive failed attempts so far)", t);

                 return new CheckpointTriggerResult(CheckpointDeclineReason.EXCEPTION);

             }

             // create the PendingCheckpoint instance that represents the checkpoint in progress

             final PendingCheckpoint checkpoint = new PendingCheckpoint(

                 job,

                 checkpointID,

                 timestamp,

                 ackTasks,

                 props,

                 targetDirectory,

                 executor);

             if (statsTracker != null) {

                 PendingCheckpointStats callback = statsTracker.reportPendingCheckpoint(

                     checkpointID,

                     timestamp,

                     props);

                 checkpoint.setStatsCallback(callback);

             }

             // schedule the timer that will clean up the expired checkpoints

             // the canceller cleans up this checkpoint if it times out

             final Runnable canceller = new Runnable() {

                 @Override

                 public void run() {

                     synchronized (lock) {

                         // only do the work if the checkpoint is not discarded anyways

                         // note that checkpoint completion discards the pending checkpoint object

                         if (!checkpoint.isDiscarded()) {

                             LOG.info("Checkpoint " + checkpointID + " expired before completing.");

                             checkpoint.abortExpired();

                             pendingCheckpoints.remove(checkpointID);

                             rememberRecentCheckpointId(checkpointID);

                             triggerQueuedRequests();

                         }

                     }

                 }

             };

             try {

                 // re-acquire the coordinator-wide lock

                 synchronized (lock) {

                     // since we released the lock in the meantime, we need to re-check

                     // that the conditions still hold.

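                     // The second check guards against races: the coordinator-wide lock was released while the checkpoint ID was obtained (that step runs only under triggerLock), so the state may have changed in between.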

                     if (shutdown) {

                         return new CheckpointTriggerResult(CheckpointDeclineReason.COORDINATOR_SHUTDOWN);

                     }

                     else if (!props.forceCheckpoint()) {

                         if (triggerRequestQueued) {

                             LOG.warn("Trying to trigger another checkpoint while one was queued already");

                             return new CheckpointTriggerResult(CheckpointDeclineReason.ALREADY_QUEUED);

                         }

                         if (pendingCheckpoints.size() >= maxConcurrentCheckpointAttempts) {

                             triggerRequestQueued = true;

                             if (currentPeriodicTrigger != null) {

                                 currentPeriodicTrigger.cancel(false);

                                 currentPeriodicTrigger = null;

                             }

                             return new CheckpointTriggerResult(CheckpointDeclineReason.TOO_MANY_CONCURRENT_CHECKPOINTS);

                         }

                         // make sure the minimum interval between checkpoints has passed

                         final long earliestNext = lastCheckpointCompletionNanos + minPauseBetweenCheckpointsNanos;

                         final long durationTillNextMillis = (earliestNext - System.nanoTime()) / 1_000_000;

                         if (durationTillNextMillis > 0) {

                             if (currentPeriodicTrigger != null) {

                                 currentPeriodicTrigger.cancel(false);

                                 currentPeriodicTrigger = null;

                             }

                             // Reassign the new trigger to the currentPeriodicTrigger

                             currentPeriodicTrigger = timer.scheduleAtFixedRate(

                                     new ScheduledTrigger(),

                                     durationTillNextMillis, baseInterval, TimeUnit.MILLISECONDS);

                             return new CheckpointTriggerResult(CheckpointDeclineReason.MINIMUM_TIME_BETWEEN_CHECKPOINTS);

                         }

                     }

                     LOG.info("Triggering checkpoint " + checkpointID + " @ " + timestamp);

                     // add the checkpoint to pendingCheckpoints

                     pendingCheckpoints.put(checkpointID, checkpoint);

                     // schedule the timeout canceller to run after checkpointTimeout milliseconds

                     ScheduledFuture<?> cancellerHandle = timer.schedule(

                             canceller,

                             checkpointTimeout, TimeUnit.MILLISECONDS);

                     if (!checkpoint.setCancellerHandle(cancellerHandle)) {

                         // checkpoint is already disposed!

                         cancellerHandle.cancel(false);

                     }

                     // trigger the master hooks for the checkpoint

                     final List<MasterState> masterStates = MasterHooks.triggerMasterHooks(masterHooks.values(),

                             checkpointID, timestamp, executor, Time.milliseconds(checkpointTimeout));

                     for (MasterState s : masterStates) {

                         checkpoint.addMasterState(s);

                     }

                 }

                 // end of lock scope

                 CheckpointOptions checkpointOptions;

                 if (!props.isSavepoint()) {

                     checkpointOptions = CheckpointOptions.forFullCheckpoint();

                 } else {

                     checkpointOptions = CheckpointOptions.forSavepoint(targetDirectory);

                 }

                 // send the messages to the tasks that trigger their checkpoint

                 for (Execution execution: executions) {

                     execution.triggerCheckpoint(checkpointID, timestamp, checkpointOptions);

                 }

                 numUnsuccessfulCheckpointsTriggers.set(0);

                 return new CheckpointTriggerResult(checkpoint);

             }

             catch (Throwable t) {

                 // guard the map against concurrent modifications

                 synchronized (lock) {

                     pendingCheckpoints.remove(checkpointID);

                 }

                 int numUnsuccessful = numUnsuccessfulCheckpointsTriggers.incrementAndGet();

                 LOG.warn("Failed to trigger checkpoint {}. ({} consecutive failed attempts so far)",

                         checkpointID, numUnsuccessful, t);

                 if (!checkpoint.isDiscarded()) {

                     checkpoint.abortError(new Exception("Failed to trigger checkpoint", t));

                 }

                 return new CheckpointTriggerResult(CheckpointDeclineReason.EXCEPTION);

             }

         } // end trigger lock

     }
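
　　The check, release, re-check structure of triggerCheckpoint can be illustrated with a much smaller model. The following is only a sketch under stated assumptions: TriggerSketch, its Result enum and all of its fields are hypothetical stand-ins, an AtomicLong replaces the possibly blocking checkpointIdCounter, and tasks, stats, master hooks and the timeout canceller are left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical, simplified stand-in for the coordinator; NOT Flink's class. */
class TriggerSketch {
    enum Result { TRIGGERED, SHUTDOWN, TOO_MANY_CONCURRENT, MIN_PAUSE_NOT_ELAPSED }

    private final Object lock = new Object();        // guards pending, shutdown, lastCompletionNanos
    private final Object triggerLock = new Object(); // keeps trigger requests from overtaking each other
    private final Map<Long, String> pending = new HashMap<>();
    private final AtomicLong idCounter = new AtomicLong(1); // stand-in for a possibly blocking ID service
    private final int maxConcurrent;
    private final long minPauseNanos;
    private long lastCompletionNanos;
    private boolean shutdown;

    TriggerSketch(int maxConcurrent, long minPauseMillis) {
        this.maxConcurrent = maxConcurrent;
        this.minPauseNanos = minPauseMillis * 1_000_000L;
        // Pretend a checkpoint completed one full pause ago so the first trigger passes.
        this.lastCompletionNanos = System.nanoTime() - minPauseNanos;
    }

    /** Returns null when triggering is allowed. The caller must hold 'lock'. */
    private Result check() {
        if (shutdown) {
            return Result.SHUTDOWN;
        }
        if (pending.size() >= maxConcurrent) {
            return Result.TOO_MANY_CONCURRENT;
        }
        long earliestNext = lastCompletionNanos + minPauseNanos;
        long durationTillNextMillis = (earliestNext - System.nanoTime()) / 1_000_000;
        return durationTillNextMillis > 0 ? Result.MIN_PAUSE_NOT_ELAPSED : null;
    }

    Result trigger() {
        synchronized (lock) {                 // step 1: eager pre-check
            Result declined = check();
            if (declined != null) {
                return declined;
            }
        }
        synchronized (triggerLock) {
            // step 2: obtain the ID without holding the coordinator-wide lock
            long id = idCounter.getAndIncrement();
            synchronized (lock) {             // step 3: re-check, state may have changed meanwhile
                Result declined = check();
                if (declined != null) {
                    return declined;
                }
                pending.put(id, "pending");   // step 4: register the pending checkpoint
            }
            return Result.TRIGGERED;
        }
    }

    void complete(long id) {
        synchronized (lock) {
            pending.remove(id);
            lastCompletionNanos = System.nanoTime();
        }
    }

    void shutdown() {
        synchronized (lock) {
            shutdown = true;
        }
    }

    public static void main(String[] args) {
        TriggerSketch coordinator = new TriggerSketch(1, 0);
        System.out.println(coordinator.trigger()); // first request
        System.out.println(coordinator.trigger()); // second request while the first is still pending
    }
}
```

　　As in the original method, an ID that was already obtained is simply dropped when the re-check fails; the sketch makes no attempt to reuse it.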

　　Starting the periodic trigger: startCheckpointScheduler

     public void startCheckpointScheduler() {

         synchronized (lock) {

             if (shutdown) {

                 throw new IllegalArgumentException("Checkpoint coordinator is shut down");

             }

             // make sure all previous timers are cancelled

             stopCheckpointScheduler();

             periodicScheduling = true;

             // scheduleAtFixedRate runs the task periodically: first after an initial delay, then at a fixed period (both are baseInterval here)

             currentPeriodicTrigger = timer.scheduleAtFixedRate(

                     new ScheduledTrigger(),

                     baseInterval, baseInterval, TimeUnit.MILLISECONDS);

         }

     }

　　Stopping the periodic trigger: stopCheckpointScheduler

     // resets the flags and releases resources

     public void stopCheckpointScheduler() {

         synchronized (lock) {

             triggerRequestQueued = false;

             periodicScheduling = false;

             if (currentPeriodicTrigger != null) {

                 currentPeriodicTrigger.cancel(false); // cancel the periodic trigger without interrupting a run that is already in progress

                 currentPeriodicTrigger = null;

             }

             // pendingCheckpoints holds the checkpoints that are still in progress; abort them all

             for (PendingCheckpoint p : pendingCheckpoints.values()) {

                 p.abortError(new Exception("Checkpoint Coordinator is suspending."));

             }

             pendingCheckpoints.clear(); // clear pendingCheckpoints

             numUnsuccessfulCheckpointsTriggers.set(0);

         }

     }
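
　　The start/stop pair above follows a common pattern around java.util.concurrent.ScheduledExecutorService. Below is a minimal sketch of that pattern only; PeriodicTriggerSketch, its counter and its method names are assumptions made for illustration, and the periodic task merely increments a counter instead of triggering a checkpoint.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper showing the start/stop pattern; NOT Flink's class. */
class PeriodicTriggerSketch {
    private final Object lock = new Object();
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private final AtomicInteger triggerCount = new AtomicInteger();
    private final long baseIntervalMillis;
    private ScheduledFuture<?> currentPeriodicTrigger; // guarded by lock

    PeriodicTriggerSketch(long baseIntervalMillis) {
        this.baseIntervalMillis = baseIntervalMillis;
    }

    void startScheduler() {
        synchronized (lock) {
            // Cancel any earlier timer first; 'synchronized' is reentrant, so nesting is fine.
            stopScheduler();
            // Initial delay and period are both baseIntervalMillis.
            currentPeriodicTrigger = timer.scheduleAtFixedRate(
                    () -> triggerCount.incrementAndGet(),
                    baseIntervalMillis, baseIntervalMillis, TimeUnit.MILLISECONDS);
        }
    }

    void stopScheduler() {
        synchronized (lock) {
            if (currentPeriodicTrigger != null) {
                // false: do not interrupt a run that is already executing
                currentPeriodicTrigger.cancel(false);
                currentPeriodicTrigger = null;
            }
        }
    }

    boolean isRunning() {
        synchronized (lock) {
            return currentPeriodicTrigger != null;
        }
    }

    int triggerCount() {
        return triggerCount.get();
    }

    /** Stops the periodic task and releases the timer thread. */
    void close() {
        stopScheduler();
        timer.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        PeriodicTriggerSketch sketch = new PeriodicTriggerSketch(20);
        sketch.startScheduler();
        Thread.sleep(200);
        sketch.close();
        System.out.println("runs so far: " + sketch.triggerCount());
    }
}
```

　　Calling startScheduler twice in a row is safe in this sketch because the first timer is cancelled before the second is created, which mirrors why startCheckpointScheduler calls stopCheckpointScheduler first.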

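Actor-Based, Message-Driven Coordination

　　How are the periodic triggers started and stopped? Flink uses a message-driven mechanism built on Akka's actor model.

　　The CheckpointCoordinatorDeActivator class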

org.apache.flink.runtime.checkpoint.CheckpointCoordinatorDeActivator

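　　CheckpointCoordinatorDeActivator implements JobStatusListener: it listens for JobStatus changes and starts or stops the periodic checkpoint scheduler accordingly.

     // JobStatusListener implementation that activates and deactivates the periodic checkpoint scheduler when the JobStatus changes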

     public class CheckpointCoordinatorDeActivator implements JobStatusListener {

         private final CheckpointCoordinator coordinator;

         public CheckpointCoordinatorDeActivator(CheckpointCoordinator coordinator) {

             this.coordinator = checkNotNull(coordinator);

         }

         @Override

         public void jobStatusChanges(JobID jobId, JobStatus newJobStatus, long timestamp, Throwable error) {

             if (newJobStatus == JobStatus.RUNNING) {

                 // start the checkpoint scheduler

                 // as soon as the JobStatus becomes RUNNING, the periodic trigger is started

                 coordinator.startCheckpointScheduler();

             } else {

                 // anything else should stop the trigger for now

                 coordinator.stopCheckpointScheduler();

             }

         }

     }

　　The CheckpointCoordinatorDeActivator instance is created inside CheckpointCoordinator, by the createActivatorDeactivator method.

     public JobStatusListener createActivatorDeactivator() {

         synchronized (lock) {

             if (shutdown) {

                 throw new IllegalArgumentException("Checkpoint coordinator is shut down");

             }

             if (jobStatusListener == null) {

                 jobStatusListener = new CheckpointCoordinatorDeActivator(this);

             }

             return jobStatusListener;

         }

     }
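
　　How the listener ties job status changes to the scheduler can be shown with a stripped-down model. This is a sketch under assumptions: the JobStatus enum below is a simplified subset, and Coordinator, DeActivator and Job are hypothetical stand-ins that only record whether the scheduler is active.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical listener wiring for illustration; NOT Flink's classes. */
class DeActivatorSketch {
    /** Simplified subset of job states, for illustration only. */
    enum JobStatus { CREATED, RUNNING, FAILING, FINISHED }

    interface JobStatusListener {
        void jobStatusChanges(JobStatus newJobStatus);
    }

    /** Stand-in for the coordinator: only remembers whether the scheduler is on. */
    static class Coordinator {
        private boolean schedulerActive;
        private JobStatusListener listener;

        void startCheckpointScheduler() { schedulerActive = true; }
        void stopCheckpointScheduler() { schedulerActive = false; }
        boolean isSchedulerActive() { return schedulerActive; }

        /** Lazily creates a single listener, as createActivatorDeactivator does. */
        JobStatusListener createActivatorDeactivator() {
            if (listener == null) {
                listener = new DeActivator(this);
            }
            return listener;
        }
    }

    static class DeActivator implements JobStatusListener {
        private final Coordinator coordinator;

        DeActivator(Coordinator coordinator) { this.coordinator = coordinator; }

        @Override
        public void jobStatusChanges(JobStatus newJobStatus) {
            if (newJobStatus == JobStatus.RUNNING) {
                coordinator.startCheckpointScheduler();
            } else {
                coordinator.stopCheckpointScheduler(); // any other status stops the trigger
            }
        }
    }

    /** Stand-in for whatever component publishes job status changes. */
    static class Job {
        private final List<JobStatusListener> listeners = new ArrayList<>();

        void registerJobStatusListener(JobStatusListener l) { listeners.add(l); }

        void transitionTo(JobStatus status) {
            for (JobStatusListener l : listeners) {
                l.jobStatusChanges(status);
            }
        }
    }

    public static void main(String[] args) {
        Coordinator coordinator = new Coordinator();
        Job job = new Job();
        job.registerJobStatusListener(coordinator.createActivatorDeactivator());
        job.transitionTo(JobStatus.RUNNING);
        System.out.println("active after RUNNING: " + coordinator.isSchedulerActive());
        job.transitionTo(JobStatus.FAILING);
        System.out.println("active after FAILING: " + coordinator.isSchedulerActive());
    }
}
```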

　　Checkpoint-related Akka messages

　　AbstractCheckpointMessage: the abstract base class of all checkpoint messages

org.apache.flink.runtime.messages.checkpoint.AbstractCheckpointMessage

　　Main fields of AbstractCheckpointMessage:

     /** The job to which this message belongs */

     private final JobID job;

     /** The task execution that is source/target of the checkpoint message */

     private final ExecutionAttemptID taskExecutionId; // the task execution that is the source/target of the checkpoint message

     /** The ID of the checkpoint that this message coordinates */

     private final long checkpointId;

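　　It has the following subclasses:

　　　　TriggerCheckpoint: sent from the JobManager to the TaskManager to trigger a checkpoint;

　　　　AcknowledgeCheckpoint: sent from the TaskManager to the JobManager to confirm that an individual task has completed its checkpoint;

　　　　DeclineCheckpoint: sent from the TaskManager to the JobManager to report that a task could not process the checkpoint request;

　　　　NotifyCheckpointComplete: sent from the JobManager to the TaskManager to announce that the checkpoint has completed.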
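
　　The four message types and their directions can be summarized as a small class hierarchy. The code below is an illustrative sketch, not the Flink classes: the Direction enum and the direction() method are invented for this example, and plain strings stand in for JobID and ExecutionAttemptID.

```java
/** Hypothetical model of the checkpoint message family; NOT Flink's classes. */
class CheckpointMessagesSketch {
    /** Invented for this sketch: who sends the message to whom. */
    enum Direction { JOB_MANAGER_TO_TASK_MANAGER, TASK_MANAGER_TO_JOB_MANAGER }

    /** Mirrors the three fields of the abstract base message. */
    abstract static class AbstractMessage {
        final String job;             // stand-in for JobID
        final String taskExecutionId; // stand-in for ExecutionAttemptID
        final long checkpointId;

        AbstractMessage(String job, String taskExecutionId, long checkpointId) {
            this.job = job;
            this.taskExecutionId = taskExecutionId;
            this.checkpointId = checkpointId;
        }

        abstract Direction direction();
    }

    static final class Trigger extends AbstractMessage {
        Trigger(String job, String task, long id) { super(job, task, id); }
        @Override Direction direction() { return Direction.JOB_MANAGER_TO_TASK_MANAGER; }
    }

    static final class Acknowledge extends AbstractMessage {
        Acknowledge(String job, String task, long id) { super(job, task, id); }
        @Override Direction direction() { return Direction.TASK_MANAGER_TO_JOB_MANAGER; }
    }

    static final class Decline extends AbstractMessage {
        Decline(String job, String task, long id) { super(job, task, id); }
        @Override Direction direction() { return Direction.TASK_MANAGER_TO_JOB_MANAGER; }
    }

    static final class NotifyComplete extends AbstractMessage {
        NotifyComplete(String job, String task, long id) { super(job, task, id); }
        @Override Direction direction() { return Direction.JOB_MANAGER_TO_TASK_MANAGER; }
    }

    public static void main(String[] args) {
        AbstractMessage[] roundTrip = {
            new Trigger("job-1", "attempt-1", 7L),
            new Acknowledge("job-1", "attempt-1", 7L),
            new NotifyComplete("job-1", "attempt-1", 7L)
        };
        for (AbstractMessage m : roundTrip) {
            System.out.println(m.getClass().getSimpleName() + " " + m.direction());
        }
    }
}
```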

　　The TriggerCheckpoint message

　　　　Sent from the JobManager to the TaskManager to tell a specific task to trigger a checkpoint.

　　　　Sending the message

　　　　　　The sending logic lives in CheckpointCoordinator, as shown earlier:

         // send the messages to the tasks that trigger their checkpoint

         for (Execution execution: executions) {

             execution.triggerCheckpoint(checkpointID, timestamp, checkpointOptions);

         }

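　　　　　　Here executions is an Execution[] array holding the current execution attempts of the tasks that must be sent a message when a checkpoint is triggered (that is, the tasks in the CheckpointCoordinator field tasksToTrigger). triggerCheckpoint() is called on each of them.

　　　　　　Next, let's look at Execution's triggerCheckpoint method.

         // trigger a new checkpoint on the task of this execution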

         public void triggerCheckpoint(long checkpointId, long timestamp, CheckpointOptions checkpointOptions) {

             // get the assigned resource

             final SimpleSlot slot = assignedResource; // the slot assigned to this execution

             if (slot != null) {

                 // TaskManagerGateway is the abstraction used to communicate with the TaskManager

                 final TaskManagerGateway taskManagerGateway = slot.getTaskManagerGateway();

                 // send the message to the TaskManager through taskManagerGateway

                 taskManagerGateway.triggerCheckpoint(attemptId, getVertex().getJobId(), checkpointId, timestamp, checkpointOptions);

             } else {

                 LOG.debug("The execution has no slot assigned. This indicates that the execution is " +

                     "no longer running.");

             }

         }

　　　　　　Continuing into the triggerCheckpoint() method of ActorTaskManagerGateway (the actor-based implementation of TaskManagerGateway):

         public void triggerCheckpoint(

                 ExecutionAttemptID executionAttemptID,

                 JobID jobId,

                 long checkpointId,

                 long timestamp,

                 CheckpointOptions checkpointOptions) {

             Preconditions.checkNotNull(executionAttemptID);

             Preconditions.checkNotNull(jobId);

             // create a TriggerCheckpoint message and send it with actorGateway's tell method (asynchronous, no result is returned)

             // ActorGateway is the interface for actor-based communication

             actorGateway.tell(new TriggerCheckpoint(jobId, executionAttemptID, checkpointId, timestamp, checkpointOptions));

         }

　　　　　　AkkaActorGateway is one implementation of the ActorGateway interface; it uses Akka to communicate with remote actors. Here is its tell method:

         @Override

         public void tell(Object message) {

             Object newMessage = decorator.decorate(message);

             // send the message through actor, an ActorRef instance (ActorRef is an Akka class); how Akka delivers it is left for later study

             actor.tell(newMessage, ActorRef.noSender());

         }

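　　　　　　This completes the sending of the TriggerCheckpoint message. Next, let's see how the TaskManager receives it.

　　　　Receiving the message

　　　　　　The receiving side of the TaskManager is implemented in Scala.

org.apache.flink.runtime.taskmanager.TaskManager

　　　　　　The handleMessage method of the TaskManager class is the central message handler.

         // Central message handler of the TaskManager: receives messages and dispatches them to different handlers by message type.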

         override def handleMessage: Receive = {

             case message: TaskMessage => handleTaskMessage(message)

             // this case handles the checkpoint-related messages

             case message: AbstractCheckpointMessage => handleCheckpointingMessage(message)

             case JobManagerLeaderAddress(address, newLeaderSessionID) =>

               handleJobManagerLeaderAddress(address, newLeaderSessionID)

             case message: RegistrationMessage => handleRegistrationMessage(message)

             ...

         }

　　　　　　Next, look at handleCheckpointingMessage(), which for a TriggerCheckpoint message mainly triggers the checkpoint barrier.

         // handles checkpoint-related messages

         private def handleCheckpointingMessage(actorMessage: AbstractCheckpointMessage): Unit = {

             actorMessage match {

               // the TriggerCheckpoint message

               case message: TriggerCheckpoint =>

                 val taskExecutionId = message.getTaskExecutionId

                 val checkpointId = message.getCheckpointId

                 val timestamp = message.getTimestamp

                 val checkpointOptions = message.getCheckpointOptions

                 log.debug(s"Receiver TriggerCheckpoint $checkpointId@$timestamp for $taskExecutionId.")

                 val task = runningTasks.get(taskExecutionId)

                 if (task != null) {

                   // call the task's triggerCheckpointBarrier method to trigger the checkpoint barrier; the details of the barrier mechanism are discussed later

                   task.triggerCheckpointBarrier(checkpointId, timestamp, checkpointOptions)

                 } else {

                   log.debug(s"TaskManager received a checkpoint request for unknown task $taskExecutionId.")

                 }

               // the NotifyCheckpointComplete message

               case message: NotifyCheckpointComplete =>

                 val taskExecutionId = message.getTaskExecutionId

                 val checkpointId = message.getCheckpointId

                 val timestamp = message.getTimestamp

                 log.debug(s"Receiver ConfirmCheckpoint $checkpointId@$timestamp for $taskExecutionId.")

                 val task = runningTasks.get(taskExecutionId)

                 if (task != null) {

                   // call the task's notifyCheckpointComplete method for further processing

                   task.notifyCheckpointComplete(checkpointId)

                 } else {

                   log.debug(

                     s"TaskManager received a checkpoint confirmation for unknown task $taskExecutionId.")

                 }

               // unknown checkpoint message

               case _ => unhandled(actorMessage)

             }

           }
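
　　For readers less familiar with Scala pattern matching, the dispatch performed by handleCheckpointingMessage can be restated in Java. This is a rough sketch under assumptions: Msg, Task and the string results returned by handle are hypothetical, a plain HashMap stands in for runningTasks, and logging is replaced by return values.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical Java restatement of the dispatch logic; NOT Flink's code. */
class DispatchSketch {
    abstract static class Msg {
        final String taskExecutionId;
        final long checkpointId;

        Msg(String taskExecutionId, long checkpointId) {
            this.taskExecutionId = taskExecutionId;
            this.checkpointId = checkpointId;
        }
    }

    static final class Trigger extends Msg {
        Trigger(String task, long id) { super(task, id); }
    }

    static final class NotifyComplete extends Msg {
        NotifyComplete(String task, long id) { super(task, id); }
    }

    /** A message type the handler does not know about. */
    static final class Other extends Msg {
        Other(String task, long id) { super(task, id); }
    }

    /** Stand-in for a running task; it only records what it was asked to do. */
    static class Task {
        final List<String> events = new ArrayList<>();

        void triggerCheckpointBarrier(long checkpointId) { events.add("barrier:" + checkpointId); }
        void notifyCheckpointComplete(long checkpointId) { events.add("complete:" + checkpointId); }
    }

    final Map<String, Task> runningTasks = new HashMap<>();

    /** Looks up the target task and dispatches by message type. */
    String handle(Msg message) {
        Task task = runningTasks.get(message.taskExecutionId);
        if (message instanceof Trigger) {
            if (task == null) {
                return "unknown-task"; // the original only logs this case
            }
            task.triggerCheckpointBarrier(message.checkpointId);
            return "triggered";
        } else if (message instanceof NotifyComplete) {
            if (task == null) {
                return "unknown-task";
            }
            task.notifyCheckpointComplete(message.checkpointId);
            return "notified";
        }
        return "unhandled"; // corresponds to the 'case _' branch
    }

    public static void main(String[] args) {
        DispatchSketch taskManager = new DispatchSketch();
        Task task = new Task();
        taskManager.runningTasks.put("attempt-1", task);
        System.out.println(taskManager.handle(new Trigger("attempt-1", 3L)));
        System.out.println(taskManager.handle(new NotifyComplete("attempt-1", 3L)));
        System.out.println(task.events);
    }
}
```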

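　　The NotifyCheckpointComplete message

　　　　Sent from the JobManager to the TaskManager to notify a task that its checkpoint has been confirmed as complete, so the task can commit the checkpoint to external (third-party) systems.

　　　　Sending the message

　　　　　　The NotifyCheckpointComplete message is sent from the receiveAcknowledgeMessage method of CheckpointCoordinator (via completePendingCheckpoint).

         // Receives an AcknowledgeCheckpoint message and returns whether the message was associated with a pending checkpoint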

         public boolean receiveAcknowledgeMessage(AcknowledgeCheckpoint message) throws CheckpointException {

             if (shutdown || message == null) {

                 return false;

             }

             if (!job.equals(message.getJob())) {

                 LOG.error("Received wrong AcknowledgeCheckpoint message for job {}: {}", job, message);

                 return false;

             }

             final long checkpointId = message.getCheckpointId();

             synchronized (lock) {

                 // we need to check inside the lock for being shutdown as well, otherwise we

                 // get races and invalid error log messages

                 if (shutdown) {

                     return false;

                 }

                 final PendingCheckpoint checkpoint = pendingCheckpoints.get(checkpointId);

                 // the checkpoint is still pending and has not been discarded

                 if (checkpoint != null && !checkpoint.isDiscarded()) {

                     // acknowledge the task, identified by its TaskExecutionId, together with its SubtaskState and checkpoint metrics

                     switch (checkpoint.acknowledgeTask(message.getTaskExecutionId(), message.getSubtaskState(), message.getCheckpointMetrics())) {

                         // acknowledged successfully

                         case SUCCESS:

                             LOG.debug("Received acknowledge message for checkpoint {} from task {} of job {}.",

                                 checkpointId, message.getTaskExecutionId(), message.getJob());

                             // all tasks have acknowledged (i.e. notYetAcknowledgedTasks is empty)

                             if (checkpoint.isFullyAcknowledged()) {

                                 // try to complete the given pending checkpoint:

                                 // remove the completed checkpoint from pendingCheckpoints, update the related flags, and finally send the notify-complete messages

                                 completePendingCheckpoint(checkpoint);

                             }

                             break;

                         // duplicate acknowledgement

                         case DUPLICATE:

                             LOG.debug("Received a duplicate acknowledge message for checkpoint {}, task {}, job {}.",

                                 message.getCheckpointId(), message.getTaskExecutionId(), message.getJob());

                             break;

                         // the task's execution attempt ID is unknown

                         case UNKNOWN:

                             LOG.warn("Could not acknowledge the checkpoint {} for task {} of job {}, " +

                                     "because the task's execution attempt id was unknown. Discarding " +

                                     "the state handle to avoid lingering state.", message.getCheckpointId(),

                                 message.getTaskExecutionId(), message.getJob());

                             discardSubtaskState(message.getJob(), message.getTaskExecutionId(), message.getCheckpointId(), message.getSubtaskState());

                             break;

                         // the pending checkpoint has already been discarded

                         case DISCARDED:

                             LOG.warn("Could not acknowledge the checkpoint {} for task {} of job {}, " +

                                 "because the pending checkpoint had been discarded. Discarding the " +

                                     "state handle tp avoid lingering state.",

                                 message.getCheckpointId(), message.getTaskExecutionId(), message.getJob());

                             discardSubtaskState(message.getJob(), message.getTaskExecutionId(), message.getCheckpointId(), message.getSubtaskState());

                     }

                     return true;

                 }

                 else if (checkpoint != null) {

                     // this should not happen

                     throw new IllegalStateException(

                             "Received message for discarded but non-removed checkpoint " + checkpointId);

                 }

                 else {

                     boolean wasPendingCheckpoint;

                     // message is for an unknown checkpoint, or comes too late (checkpoint disposed)

                     if (recentPendingCheckpoints.contains(checkpointId)) {

                         wasPendingCheckpoint = true;

                         LOG.warn("Received late message for now expired checkpoint attempt {} from " +

                             "{} of job {}.", checkpointId, message.getTaskExecutionId(), message.getJob());

                     }

                     else {

                         LOG.debug("Received message for an unknown checkpoint {} from {} of job {}.",

                             checkpointId, message.getTaskExecutionId(), message.getJob());

                         wasPendingCheckpoint = false;

                     }

                     // try to discard the state so that we don't have lingering state lying around

                     discardSubtaskState(message.getJob(), message.getTaskExecutionId(), message.getCheckpointId(), message.getSubtaskState());

                     return wasPendingCheckpoint;

                 }

             }

         }
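
　　The four outcomes of acknowledgeTask and the isFullyAcknowledged check can be modelled with two sets. The following is a sketch based on assumptions, not PendingCheckpoint itself: state handles and metrics are omitted, strings stand in for ExecutionAttemptID, and the internal bookkeeping shown here is only one plausible way to produce these results.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of acknowledgement tracking; NOT Flink's PendingCheckpoint. */
class PendingCheckpointSketch {
    enum AckResult { SUCCESS, DUPLICATE, UNKNOWN, DISCARDED }

    private final Set<String> notYetAcknowledgedTasks;
    private final Set<String> acknowledgedTasks = new HashSet<>();
    private boolean discarded;

    PendingCheckpointSketch(Collection<String> tasksToWaitFor) {
        this.notYetAcknowledgedTasks = new HashSet<>(tasksToWaitFor);
    }

    AckResult acknowledgeTask(String executionAttemptId) {
        if (discarded) {
            return AckResult.DISCARDED;
        }
        if (notYetAcknowledgedTasks.remove(executionAttemptId)) {
            acknowledgedTasks.add(executionAttemptId);
            return AckResult.SUCCESS;
        }
        // Not outstanding: either acknowledged before, or never part of this checkpoint.
        return acknowledgedTasks.contains(executionAttemptId)
                ? AckResult.DUPLICATE
                : AckResult.UNKNOWN;
    }

    boolean isFullyAcknowledged() {
        return notYetAcknowledgedTasks.isEmpty();
    }

    void discard() {
        discarded = true;
    }

    public static void main(String[] args) {
        Set<String> tasks = new HashSet<>();
        tasks.add("attempt-1");
        tasks.add("attempt-2");
        PendingCheckpointSketch checkpoint = new PendingCheckpointSketch(tasks);
        System.out.println(checkpoint.acknowledgeTask("attempt-1"));
        System.out.println(checkpoint.acknowledgeTask("attempt-1"));
        System.out.println(checkpoint.acknowledgeTask("attempt-9"));
        System.out.println(checkpoint.acknowledgeTask("attempt-2"));
        System.out.println("fully acknowledged: " + checkpoint.isFullyAcknowledged());
    }
}
```

　　In the coordinator, the transition to fully acknowledged is what leads to completePendingCheckpoint; the sketch stops at reporting that state.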

　　　　　　The code in completePendingCheckpoint that sends the NotifyCheckpointComplete messages is:

         for (ExecutionVertex ev : tasksToCommitTo) {

             Execution ee = ev.getCurrentExecutionAttempt();

             if (ee != null) {

                 ee.notifyCheckpointComplete(checkpointId, timestamp);

             }

         }

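　　　　Receiving the message

　　　　　　The receiving code is the NotifyCheckpointComplete case of handleCheckpointingMessage, shown in the TriggerCheckpoint section; it mainly calls task.notifyCheckpointComplete(checkpointId).

　　The AcknowledgeCheckpoint message

　　　　Sent from the TaskManager to the JobManager to tell it that the checkpoint of a specific task has completed. The message may carry the task's state and checkpointMetrics.

　　　　The two fields of the AcknowledgeCheckpoint message class: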

	private final SubtaskState subtaskState; // the state of the subtask

	private final CheckpointMetrics checkpointMetrics;

　　　　Sending the message

　　　　　　The message is sent from the acknowledgeCheckpoint method of the RuntimeEnvironment class:

         public void acknowledgeCheckpoint(

                 long checkpointId,

                 CheckpointMetrics checkpointMetrics,

                 SubtaskState checkpointStateHandles) {

             // send the ack message through checkpointResponder, an instance of the CheckpointResponder interface

             checkpointResponder.acknowledgeCheckpoint(

                     jobId, executionId, checkpointId, checkpointMetrics,

                     checkpointStateHandles);

         }

　　　　　　The CheckpointResponder interface is the responder for checkpoint acknowledge and decline messages. ActorGatewayCheckpointResponder is the implementation of CheckpointResponder that uses an ActorGateway; it provides two methods, acknowledgeCheckpoint and declineCheckpoint.

         @Override

         public void acknowledgeCheckpoint(

                 JobID jobID,

                 ExecutionAttemptID executionAttemptID,

                 long checkpointId,

                 CheckpointMetrics checkpointMetrics,

                 SubtaskState checkpointStateHandles) {

             // create a new AcknowledgeCheckpoint message

             AcknowledgeCheckpoint message = new AcknowledgeCheckpoint(

                     jobID, executionAttemptID, checkpointId, checkpointMetrics,

                     checkpointStateHandles);

             // send it out through the actorGateway

             actorGateway.tell(message);

         }
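The responder pattern above (one interface for both replies, with an implementation that builds a message and hands it to a transport) can be illustrated with a simplified stand-in. This is only a sketch under stated assumptions: `ResponderSketch`, `Responder`, and `RecordingResponder` are hypothetical names, plain strings replace Flink's JobID and ExecutionAttemptID, and the messages are recorded in a list instead of being sent through an ActorGateway.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the responder pattern; these are not Flink's real types. */
final class ResponderSketch {

    /** Simplified reply contract: a task either acknowledges or declines a checkpoint. */
    interface Responder {
        void acknowledge(String jobId, String attemptId, long checkpointId);

        void decline(String jobId, String attemptId, long checkpointId, Throwable reason);
    }

    /** Records outgoing messages in memory instead of sending them over a gateway. */
    static final class RecordingResponder implements Responder {
        final List<String> sent = new ArrayList<>();

        @Override
        public void acknowledge(String jobId, String attemptId, long checkpointId) {
            sent.add("ACK " + jobId + "/" + attemptId + "#" + checkpointId);
        }

        @Override
        public void decline(String jobId, String attemptId, long checkpointId, Throwable reason) {
            sent.add("DECLINE " + jobId + "/" + attemptId + "#" + checkpointId
                    + ": " + reason.getMessage());
        }
    }

    public static void main(String[] args) {
        RecordingResponder responder = new RecordingResponder();
        responder.acknowledge("job-1", "attempt-1", 7L);
        responder.decline("job-1", "attempt-2", 8L, new IllegalStateException("task not ready"));
        for (String message : responder.sent) {
            System.out.println(message);
        }
    }
}
```

Hiding the transport behind an interface is what lets the task-side code stay the same regardless of how the reply travels, and it makes the reply path easy to test with a recording implementation like this one.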

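　　　　Receiving the message

　　　　　　The message is received by the receiveAcknowledgeMessage method (the same method in which the NotifyCheckpointComplete message is sent).

　　DeclineCheckpoint message

　　　　This message is sent from the TaskManager to the JobManager to tell the CheckpointCoordinator that the checkpoint request could not be processed yet. This typically happens when a task is already in the RUNNING state but is not yet ready internally to perform a checkpoint.

　　　　Sending the message

　　　　　　The message is sent from the triggerCheckpointBarrier method of the Task class.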

org.apache.flink.runtime.taskmanager.Task

         try {

             boolean success = statefulTask.triggerCheckpoint(checkpointMetaData, checkpointOptions);

             if (!success) {

                 // send the message through the CheckpointResponder, as with the AcknowledgeCheckpoint message

                 checkpointResponder.declineCheckpoint(

                         getJobID(), getExecutionId(), checkpointID,

                         new CheckpointDeclineTaskNotReadyException(taskName));

             }

         }

　　　　Receiving the message

　　　　　　The message is received by the receiveDeclineMessage method of CheckpointCoordinator.

         public void receiveDeclineMessage(DeclineCheckpoint message) {

             if (shutdown || message == null) {

                 return;

             }

             if (!job.equals(message.getJob())) {

                 throw new IllegalArgumentException("Received DeclineCheckpoint message for job " +

                     message.getJob() + " while this coordinator handles job " + job);

             }

             final long checkpointId = message.getCheckpointId();

             final String reason = (message.getReason() != null ? message.getReason().getMessage() : "");

             PendingCheckpoint checkpoint;

             synchronized (lock) {

                 // we need to check inside the lock for being shutdown as well, otherwise we

                 // get races and invalid error log messages

                 if (shutdown) {

                     return;

                 }

                 checkpoint = pendingCheckpoints.get(checkpointId);

                 if (checkpoint != null && !checkpoint.isDiscarded()) {

                     // the checkpoint is still pending and has not been discarded

                     LOG.info("Discarding checkpoint {} because of checkpoint decline from task {} : {}",

                             checkpointId, message.getTaskExecutionId(), reason);

                     pendingCheckpoints.remove(checkpointId); // remove this checkpoint ID from pendingCheckpoints

                     checkpoint.abortDeclined();

                     rememberRecentCheckpointId(checkpointId);

                     // we don't have to schedule another "dissolving" checkpoint any more because the

                     // cancellation barriers take care of breaking downstream alignments

                     // we only need to make sure that suspended queued requests are resumed

                     // check whether a more recent checkpoint is still pending

                     boolean haveMoreRecentPending = false;

                     for (PendingCheckpoint p : pendingCheckpoints.values()) {

                         if (!p.isDiscarded() && p.getCheckpointId() >= checkpoint.getCheckpointId()) {

                             haveMoreRecentPending = true;

                             break;

                         }

                     }

                     // no more recent checkpoint is pending, so resume the queued trigger requests

                     if (!haveMoreRecentPending) {

                         triggerQueuedRequests();

                     }

                 }

                 else if (checkpoint != null) {

                     // this should not happen

                     throw new IllegalStateException(

                             "Received message for discarded but non-removed checkpoint " + checkpointId);

                 }

                 else if (LOG.isDebugEnabled()) {

                     if (recentPendingCheckpoints.contains(checkpointId)) {

                         // message is for an unknown checkpoint, or comes too late (checkpoint disposed)

                         LOG.debug("Received another decline message for now expired checkpoint attempt {} : {}",

                                 checkpointId, reason);

                     } else {

                         // message is for an unknown checkpoint. might be so old that we don't even remember it any more

                         LOG.debug("Received decline message for unknown (too old?) checkpoint attempt {} : {}",

                                 checkpointId, reason);

                     }

                 }

             }

         }
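The core of the decline path above is: abort the declined checkpoint, then resume queued trigger requests only if no live checkpoint with an equal or higher ID is still pending. The following is a minimal, single-threaded sketch of that decision, not the coordinator itself. `DeclineSketch`, its map of "discarded" flags, and the `resumedQueuedRequests` counter are hypothetical simplifications; locking, logging, and the recent-ID memory are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, single-threaded model of the decline path; not Flink's real classes. */
final class DeclineSketch {
    /** checkpoint ID -> discarded flag, standing in for the pendingCheckpoints map. */
    final Map<Long, Boolean> pending = new LinkedHashMap<>();

    /** Counts how often queued trigger requests would have been resumed. */
    int resumedQueuedRequests = 0;

    void addPending(long checkpointId) {
        pending.put(checkpointId, false);
    }

    /** Returns true if the decline aborted a live pending checkpoint. */
    boolean receiveDecline(long checkpointId) {
        Boolean discarded = pending.get(checkpointId);
        if (discarded == null) {
            return false; // unknown or already expired: nothing to abort
        }
        if (discarded) {
            throw new IllegalStateException(
                    "Received message for discarded but non-removed checkpoint " + checkpointId);
        }
        pending.remove(checkpointId);

        // Look for a live checkpoint whose ID is equal to or higher than the declined one.
        boolean haveMoreRecentPending = false;
        for (Map.Entry<Long, Boolean> entry : pending.entrySet()) {
            if (!entry.getValue() && entry.getKey() >= checkpointId) {
                haveMoreRecentPending = true;
                break;
            }
        }
        if (!haveMoreRecentPending) {
            resumedQueuedRequests++; // stands in for triggerQueuedRequests()
        }
        return true;
    }

    public static void main(String[] args) {
        DeclineSketch coordinator = new DeclineSketch();
        coordinator.addPending(1L);
        coordinator.addPending(2L);
        coordinator.receiveDecline(1L); // checkpoint 2 is still pending: no resume
        coordinator.receiveDecline(2L); // nothing newer is pending: resume
        System.out.println("resumed: " + coordinator.resumedQueuedRequests);
    }
}
```

In this model a decline for an older checkpoint leaves the queue untouched while a newer checkpoint is in flight, which mirrors the `haveMoreRecentPending` guard in the excerpt.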
