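Reading the Flink Source (1): The Checkpoint Trigger Mechanism
The Checkpoint Trigger Mechanism
Flink triggers checkpoints periodically from a timer. The central class is CheckpointCoordinator, referred to below as the checkpoint coordinator.
org.apache.flink.runtime.checkpoint.CheckpointCoordinator
CheckpointCoordinator coordinates the distributed snapshots of operators and state. It triggers a checkpoint by sending trigger messages to the relevant tasks, and completes it by collecting their acknowledgements (acks). It also collects and maintains the state handles reported by the tasks.
Main fields of CheckpointCoordinator: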
/** Coordinator-wide lock to safeguard the checkpoint updates */
private final Object lock = new Object(); // coordinator-wide lock

/** Lock specially to make sure that trigger requests do not overtake each other.
* This is not done with the coordinator-wide lock, because as part of triggering,
* blocking operations may happen (distributed atomic counters).
* Using a dedicated lock, we avoid blocking the processing of 'acknowledge/decline'
* messages during that phase. */
private final Object triggerLock = new Object(); // dedicated lock for trigger requests, so that fetching the checkpoint ID does not block message processing

/** Tasks who need to be sent a message when a checkpoint is started */
private final ExecutionVertex[] tasksToTrigger;

/** Tasks who need to acknowledge a checkpoint before it succeeds */
private final ExecutionVertex[] tasksToWaitFor;

/** Tasks who need to be sent a message when a checkpoint is confirmed */
private final ExecutionVertex[] tasksToCommitTo;

/** Map from checkpoint ID to the pending checkpoint */
private final Map<Long, PendingCheckpoint> pendingCheckpoints; // checkpoints still in progress

/** Actor that receives status updates from the execution graph this coordinator works for */
private JobStatusListener jobStatusListener; // listens for job status changes and starts or stops the periodic trigger accordingly

/** Flag whether a trigger request could not be handled immediately. Non-volatile, because only
* accessed in synchronized scope */
private boolean triggerRequestQueued; // marks that a trigger request could not be handled immediately

/** Flag marking the coordinator as shut down (not accepting any messages any more) */
private volatile boolean shutdown; // shutdown flag of the coordinator
ScheduledTrigger
ScheduledTrigger is the Runnable executed by the checkpoint timer; all it does is call triggerCheckpoint.
private final class ScheduledTrigger implements Runnable {
@Override
public void run() {
try {
triggerCheckpoint(System.currentTimeMillis(), true);
}
catch (Exception e) {
LOG.error("Exception while triggering checkpoint.", e);
}
}
}
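Next, the implementation of triggerCheckpoint in detail.
// Triggers a new standard checkpoint. timestamp is the time at which the checkpoint is triggered; isPeriodic indicates whether the trigger is periodic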
public boolean triggerCheckpoint(long timestamp, boolean isPeriodic) {
return triggerCheckpoint(timestamp, checkpointProperties, checkpointDirectory, isPeriodic).isSuccess();
}
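Core logic of triggering a checkpoint:
First, run pre-checks to decide whether a checkpoint may be triggered at all;
then obtain a checkpoint ID and create a PendingCheckpoint instance;
next, re-check the trigger conditions to guard against race conditions;
finally, add the PendingCheckpoint to pendingCheckpoints and send messages to the tasks to trigger their checkpoints.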
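The four steps above can be condensed into a small stand-alone model. This is only a sketch under simplifying assumptions: TriggerFlowSketch, mayTrigger, complete and shutDown are hypothetical names, the only pre-checks modelled are the shutdown flag and the concurrency limit, and a plain AtomicLong stands in for checkpointIdCounter.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical, heavily simplified model of the trigger flow; not Flink's actual class. */
class TriggerFlowSketch {
    private final Object lock = new Object();        // coordinator-wide lock
    private final Object triggerLock = new Object(); // serializes trigger requests
    private final Map<Long, Long> pending = new HashMap<>(); // checkpoint ID -> timestamp
    private final AtomicLong idCounter = new AtomicLong(1);  // stands in for checkpointIdCounter
    private final int maxConcurrent;
    private boolean shutdown;

    TriggerFlowSketch(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
    }

    /** Returns the new checkpoint ID, or -1 if the request is declined. */
    long trigger(long timestamp) {
        synchronized (lock) {              // step 1: eager pre-checks
            if (!mayTrigger()) {
                return -1;
            }
        }
        synchronized (triggerLock) {
            // step 2: obtain the ID without holding the coordinator-wide lock
            long id = idCounter.getAndIncrement();
            synchronized (lock) {          // step 3: re-check, the lock was released in between
                if (!mayTrigger()) {
                    return -1;
                }
                pending.put(id, timestamp); // step 4: register the pending checkpoint
            }
            // step 4 (continued): the trigger messages to the tasks would be sent here
            return id;
        }
    }

    /** Must be called while holding lock. */
    private boolean mayTrigger() {
        return !shutdown && pending.size() < maxConcurrent;
    }

    void complete(long id) {
        synchronized (lock) {
            pending.remove(id);
        }
    }

    void shutDown() {
        synchronized (lock) {
            shutdown = true;
        }
    }

    public static void main(String[] args) {
        TriggerFlowSketch coordinator = new TriggerFlowSketch(1);
        long first = coordinator.trigger(System.currentTimeMillis());
        System.out.println("first trigger  -> " + first);
        System.out.println("second trigger -> " + coordinator.trigger(System.currentTimeMillis()));
        coordinator.complete(first);
        System.out.println("third trigger  -> " + coordinator.trigger(System.currentTimeMillis()));
    }
}
```

With a limit of one concurrent checkpoint, the second request is declined by the pre-check while the first is still pending, and the third succeeds once the first has been completed.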
CheckpointTriggerResult triggerCheckpoint(
long timestamp,
CheckpointProperties props,
String targetDirectory,
boolean isPeriodic) {

// Sanity check: an externalized checkpoint requires a target directory
if (props.externalizeCheckpoint() && targetDirectory == null) {
throw new IllegalStateException("No target directory specified to persist checkpoint to.");
}

// make some eager pre-checks before triggering the checkpoint
synchronized (lock) {
// abort if the coordinator has been shutdown in the meantime
if (shutdown) {
return new CheckpointTriggerResult(CheckpointDeclineReason.COORDINATOR_SHUTDOWN);
}

// Don't allow periodic checkpoint if scheduling has been disabled
if (isPeriodic && !periodicScheduling) {
return new CheckpointTriggerResult(CheckpointDeclineReason.PERIODIC_SCHEDULER_SHUTDOWN);
}

// validate whether the checkpoint can be triggered, with respect to the limit of
// concurrent checkpoints, and the minimum time between checkpoints.
// these checks are not relevant for savepoints
// Forced checkpoints skip both checks: neither the concurrency limit nor the minimum pause applies to them.
if (!props.forceCheckpoint()) {
// sanity check: there should never be more than one trigger request queued
if (triggerRequestQueued) {
// a request is already queued, so this one is declined (a decline result is returned; no exception is thrown)
LOG.warn("Trying to trigger another checkpoint while one was queued already");
return new CheckpointTriggerResult(CheckpointDeclineReason.ALREADY_QUEUED);
}

// if too many checkpoints are currently in progress, we need to mark that a request is queued
if (pendingCheckpoints.size() >= maxConcurrentCheckpointAttempts) {
// the number of pending checkpoints has reached the configured maximum, so remember that a request is queued
triggerRequestQueued = true;
// cancel the periodic trigger if one is currently scheduled
if (currentPeriodicTrigger != null) {
currentPeriodicTrigger.cancel(false);
currentPeriodicTrigger = null;
}
return new CheckpointTriggerResult(CheckpointDeclineReason.TOO_MANY_CONCURRENT_CHECKPOINTS);
}

// make sure the minimum interval between checkpoints has passed
final long earliestNext = lastCheckpointCompletionNanos + minPauseBetweenCheckpointsNanos;
final long durationTillNextMillis = (earliestNext - System.nanoTime()) / 1_000_000;

if (durationTillNextMillis > 0) {
if (currentPeriodicTrigger != null) {
currentPeriodicTrigger.cancel(false);
currentPeriodicTrigger = null;
}
// Reassign the new trigger to the currentPeriodicTrigger,
// this time with an initial delay of durationTillNextMillis
currentPeriodicTrigger = timer.scheduleAtFixedRate(
new ScheduledTrigger(),
durationTillNextMillis, baseInterval, TimeUnit.MILLISECONDS);

return new CheckpointTriggerResult(CheckpointDeclineReason.MINIMUM_TIME_BETWEEN_CHECKPOINTS);
}
}
}

// check if all tasks that we need to trigger are running.
// if not, abort the checkpoint (a single task that is not RUNNING is enough to abort)
Execution[] executions = new Execution[tasksToTrigger.length];
for (int i = 0; i < tasksToTrigger.length; i++) {
Execution ee = tasksToTrigger[i].getCurrentExecutionAttempt();
if (ee != null && ee.getState() == ExecutionState.RUNNING) {
executions[i] = ee;
} else {
LOG.info("Checkpoint triggering task {} is not being executed at the moment. Aborting checkpoint.",
tasksToTrigger[i].getTaskNameWithSubtaskIndex());
return new CheckpointTriggerResult(CheckpointDeclineReason.NOT_ALL_REQUIRED_TASKS_RUNNING);
}
}

// next, check if all tasks that need to acknowledge the checkpoint are running.
// if not, abort the checkpoint
// (the code below only requires a current execution attempt; it does not inspect its state)
Map<ExecutionAttemptID, ExecutionVertex> ackTasks = new HashMap<>(tasksToWaitFor.length);

for (ExecutionVertex ev : tasksToWaitFor) {
Execution ee = ev.getCurrentExecutionAttempt();
if (ee != null) {
ackTasks.put(ee.getAttemptId(), ev);
} else {
LOG.info("Checkpoint acknowledging task {} is not being executed at the moment. Aborting checkpoint.",
ev.getTaskNameWithSubtaskIndex());
return new CheckpointTriggerResult(CheckpointDeclineReason.NOT_ALL_REQUIRED_TASKS_RUNNING);
}
}

// we will actually trigger this checkpoint!

// we lock with a special lock to make sure that trigger requests do not overtake each other.
// this is not done with the coordinator-wide lock, because the 'checkpointIdCounter'
// may issue blocking operations. Using a different lock than the coordinator-wide lock,
// we avoid blocking the processing of 'acknowledge/decline' messages during that time.
synchronized (triggerLock) {
final long checkpointID;
// first, obtain the checkpoint ID
try {
// this must happen outside the coordinator-wide lock, because it communicates
// with external services (in HA mode) and may block for a while.
checkpointID = checkpointIdCounter.getAndIncrement();
}
catch (Throwable t) {
int numUnsuccessful = numUnsuccessfulCheckpointsTriggers.incrementAndGet();
LOG.warn("Failed to trigger checkpoint (" + numUnsuccessful + " consecutive failed attempts so far)", t);
return new CheckpointTriggerResult(CheckpointDeclineReason.EXCEPTION);
}

// create the PendingCheckpoint instance that represents the checkpoint in progress
final PendingCheckpoint checkpoint = new PendingCheckpoint(
job,
checkpointID,
timestamp,
ackTasks,
props,
targetDirectory,
executor);

if (statsTracker != null) {
PendingCheckpointStats callback = statsTracker.reportPendingCheckpoint(
checkpointID,
timestamp,
props);

checkpoint.setStatsCallback(callback);
}

// schedule the timer that will clean up the expired checkpoints
// (the canceller releases the resources of this checkpoint if it times out)
final Runnable canceller = new Runnable() {
@Override
public void run() {
synchronized (lock) {
// only do the work if the checkpoint is not discarded anyways
// note that checkpoint completion discards the pending checkpoint object
if (!checkpoint.isDiscarded()) {
LOG.info("Checkpoint " + checkpointID + " expired before completing.");

checkpoint.abortExpired();
pendingCheckpoints.remove(checkpointID);
rememberRecentCheckpointId(checkpointID);

triggerQueuedRequests();
}
}
}
};

try {
// re-acquire the coordinator-wide lock
synchronized (lock) {
// since we released the lock in the meantime, we need to re-check
// that the conditions still hold.
// The second check guards against races: the checkpoint ID was obtained in between without holding the coordinator-wide lock.
if (shutdown) {
return new CheckpointTriggerResult(CheckpointDeclineReason.COORDINATOR_SHUTDOWN);
}
else if (!props.forceCheckpoint()) {
if (triggerRequestQueued) {
LOG.warn("Trying to trigger another checkpoint while one was queued already");
return new CheckpointTriggerResult(CheckpointDeclineReason.ALREADY_QUEUED);
}

if (pendingCheckpoints.size() >= maxConcurrentCheckpointAttempts) {
triggerRequestQueued = true;
if (currentPeriodicTrigger != null) {
currentPeriodicTrigger.cancel(false);
currentPeriodicTrigger = null;
}
return new CheckpointTriggerResult(CheckpointDeclineReason.TOO_MANY_CONCURRENT_CHECKPOINTS);
}

// make sure the minimum interval between checkpoints has passed
final long earliestNext = lastCheckpointCompletionNanos + minPauseBetweenCheckpointsNanos;
final long durationTillNextMillis = (earliestNext - System.nanoTime()) / 1_000_000;

if (durationTillNextMillis > 0) {
if (currentPeriodicTrigger != null) {
currentPeriodicTrigger.cancel(false);
currentPeriodicTrigger = null;
}

// Reassign the new trigger to the currentPeriodicTrigger
currentPeriodicTrigger = timer.scheduleAtFixedRate(
new ScheduledTrigger(),
durationTillNextMillis, baseInterval, TimeUnit.MILLISECONDS);

return new CheckpointTriggerResult(CheckpointDeclineReason.MINIMUM_TIME_BETWEEN_CHECKPOINTS);
}
}

LOG.info("Triggering checkpoint " + checkpointID + " @ " + timestamp);

// add the checkpoint to pendingCheckpoints
pendingCheckpoints.put(checkpointID, checkpoint);

// schedule the timeout canceller to run after checkpointTimeout
ScheduledFuture<?> cancellerHandle = timer.schedule(
canceller,
checkpointTimeout, TimeUnit.MILLISECONDS);

if (!checkpoint.setCancellerHandle(cancellerHandle)) {
// checkpoint is already disposed!
cancellerHandle.cancel(false);
}

// trigger the master hooks for the checkpoint
final List<MasterState> masterStates = MasterHooks.triggerMasterHooks(masterHooks.values(),
checkpointID, timestamp, executor, Time.milliseconds(checkpointTimeout));
for (MasterState s : masterStates) {
checkpoint.addMasterState(s);
}
}
// end of lock scope

CheckpointOptions checkpointOptions;
if (!props.isSavepoint()) {
checkpointOptions = CheckpointOptions.forFullCheckpoint();
} else {
checkpointOptions = CheckpointOptions.forSavepoint(targetDirectory);
}

// send the messages to the tasks that trigger their checkpoint
for (Execution execution: executions) {
execution.triggerCheckpoint(checkpointID, timestamp, checkpointOptions);
}

numUnsuccessfulCheckpointsTriggers.set(0);
return new CheckpointTriggerResult(checkpoint);
}
catch (Throwable t) {
// guard the map against concurrent modifications
synchronized (lock) {
pendingCheckpoints.remove(checkpointID);
}

int numUnsuccessful = numUnsuccessfulCheckpointsTriggers.incrementAndGet();
LOG.warn("Failed to trigger checkpoint {}. ({} consecutive failed attempts so far)",
checkpointID, numUnsuccessful, t);

if (!checkpoint.isDiscarded()) {
checkpoint.abortError(new Exception("Failed to trigger checkpoint", t));
}
return new CheckpointTriggerResult(CheckpointDeclineReason.EXCEPTION);
}

} // end trigger lock
}
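The minimum-pause check in triggerCheckpoint is plain arithmetic on nanosecond timestamps, and a worked example makes the sign convention easier to see. The sketch below isolates that computation in a hypothetical helper; MinPauseSketch and its parameter names are illustrative, and the current time is passed in instead of calling System.nanoTime() so that the numbers are reproducible.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper that isolates the minimum-pause arithmetic. */
class MinPauseSketch {

    /**
     * Milliseconds that still have to pass before the next checkpoint may start.
     * A result greater than 0 means "too early, reschedule"; anything else means "go ahead".
     */
    static long durationTillNextMillis(long lastCompletionNanos, long minPauseNanos, long nowNanos) {
        long earliestNext = lastCompletionNanos + minPauseNanos;
        return (earliestNext - nowNanos) / 1_000_000; // integer division truncates toward zero
    }

    public static void main(String[] args) {
        long minPause = TimeUnit.MILLISECONDS.toNanos(500);
        long lastCompletion = 10_000_000_000L;

        // 200 ms after the last completion: 300 ms of the pause remain
        long early = durationTillNextMillis(
                lastCompletion, minPause, lastCompletion + TimeUnit.MILLISECONDS.toNanos(200));
        System.out.println(early); // 300

        // 800 ms after the last completion: the pause has already elapsed
        long late = durationTillNextMillis(
                lastCompletion, minPause, lastCompletion + TimeUnit.MILLISECONDS.toNanos(800));
        System.out.println(late); // -300
    }
}
```

Because of the integer division, a remaining pause of less than one millisecond yields 0, which the `> 0` test treats as already elapsed.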
Starting the periodic trigger: startCheckpointScheduler
public void startCheckpointScheduler() {
synchronized (lock) {
if (shutdown) {
throw new IllegalArgumentException("Checkpoint coordinator is shut down");
}
// make sure all timers from previous runs are cancelled
stopCheckpointScheduler();

periodicScheduling = true;
// scheduleAtFixedRate runs the task periodically: first after an initial delay, then at a fixed period
currentPeriodicTrigger = timer.scheduleAtFixedRate(
new ScheduledTrigger(),
baseInterval, baseInterval, TimeUnit.MILLISECONDS);
}
}
Stopping the periodic trigger: stopCheckpointScheduler
// resets the flags and releases resources
public void stopCheckpointScheduler() {
synchronized (lock) {
triggerRequestQueued = false;
periodicScheduling = false;

if (currentPeriodicTrigger != null) {
currentPeriodicTrigger.cancel(false); // cancel the currently scheduled periodic trigger
currentPeriodicTrigger = null;
}

// pendingCheckpoints holds the checkpoints that are still in progress; abort them all
for (PendingCheckpoint p : pendingCheckpoints.values()) {
p.abortError(new Exception("Checkpoint Coordinator is suspending."));
}
pendingCheckpoints.clear(); // empty pendingCheckpoints
numUnsuccessfulCheckpointsTriggers.set(0);
}
}
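The start/stop pair follows a common pattern: keep the ScheduledFuture of the current periodic task, cancel it before scheduling a new one, and guard both operations with one lock. Below is a minimal sketch of that pattern using only java.util.concurrent. PeriodicTriggerSketch and its members are hypothetical names, the scheduled task just increments a counter instead of triggering a checkpoint, and the timer type Flink actually uses is not shown in the listing, so a ScheduledExecutorService is an assumption.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-alone model of the start/stop logic around a periodic trigger. */
class PeriodicTriggerSketch {
    private final Object lock = new Object();
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private final AtomicInteger triggerCount = new AtomicInteger();
    private final long baseIntervalMillis;
    private ScheduledFuture<?> currentPeriodicTrigger; // guarded by lock

    PeriodicTriggerSketch(long baseIntervalMillis) {
        this.baseIntervalMillis = baseIntervalMillis;
    }

    void start() {
        synchronized (lock) {
            stop(); // cancel any earlier schedule first; synchronized is reentrant
            currentPeriodicTrigger = timer.scheduleAtFixedRate(
                    () -> triggerCount.incrementAndGet(),   // stands in for ScheduledTrigger
                    baseIntervalMillis, baseIntervalMillis, TimeUnit.MILLISECONDS);
        }
    }

    void stop() {
        synchronized (lock) {
            if (currentPeriodicTrigger != null) {
                currentPeriodicTrigger.cancel(false); // do not interrupt a run in progress
                currentPeriodicTrigger = null;
            }
        }
    }

    boolean isScheduled() {
        synchronized (lock) {
            return currentPeriodicTrigger != null;
        }
    }

    int getTriggerCount() {
        return triggerCount.get();
    }

    void close() {
        timer.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        PeriodicTriggerSketch scheduler = new PeriodicTriggerSketch(50);
        scheduler.start();
        Thread.sleep(275);
        scheduler.stop();
        System.out.println("triggers while running: " + scheduler.getTriggerCount());
        scheduler.close();
    }
}
```

Calling start() twice in a row is harmless in this sketch, because the first schedule is cancelled before the second one is created; this mirrors the stopCheckpointScheduler() call at the top of startCheckpointScheduler.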
Actor-based, message-driven coordination
How is the periodic trigger started and stopped? Flink uses a message-driven mechanism built on the Akka actor model.
The CheckpointCoordinatorDeActivator class
org.apache.flink.runtime.checkpoint.CheckpointCoordinatorDeActivator
CheckpointCoordinatorDeActivator listens for JobStatus changes and starts or stops the periodic checkpoint scheduler. The source comments still describe it as an actor; in the code shown here it implements the JobStatusListener interface.
// Listens for JobStatus changes and activates or deactivates the periodic checkpoint scheduler.
public class CheckpointCoordinatorDeActivator implements JobStatusListener {

private final CheckpointCoordinator coordinator;

public CheckpointCoordinatorDeActivator(CheckpointCoordinator coordinator) {
this.coordinator = checkNotNull(coordinator);
}

@Override
public void jobStatusChanges(JobID jobId, JobStatus newJobStatus, long timestamp, Throwable error) {
if (newJobStatus == JobStatus.RUNNING) {
// start the checkpoint scheduler
// as soon as the JobStatus becomes RUNNING, the periodic trigger is started
coordinator.startCheckpointScheduler();
} else {
// anything else should stop the trigger for now
coordinator.stopCheckpointScheduler();
}
}
}
The CheckpointCoordinatorDeActivator instance is created inside CheckpointCoordinator, in the method createActivatorDeactivator.
public JobStatusListener createActivatorDeactivator() {
synchronized (lock) {
if (shutdown) {
throw new IllegalArgumentException("Checkpoint coordinator is shut down");
}

if (jobStatusListener == null) {
jobStatusListener = new CheckpointCoordinatorDeActivator(this);
}

return jobStatusListener;
}
}
Checkpoint-related Akka messages
AbstractCheckpointMessage: the abstract base class of all checkpoint messages
org.apache.flink.runtime.messages.checkpoint.AbstractCheckpointMessage
Main fields of AbstractCheckpointMessage:
/** The job to which this message belongs */
private final JobID job;
/** The task execution that is source/target of the checkpoint message */
private final ExecutionAttemptID taskExecutionId; // the task that sends or receives the checkpoint message
/** The ID of the checkpoint that this message coordinates */
private final long checkpointId;
It has the following implementations:
TriggerCheckpoint: sent from the JobManager to a TaskManager to trigger a checkpoint;
AcknowledgeCheckpoint: sent from a TaskManager to the JobManager to confirm that an individual task has completed its checkpoint;
DeclineCheckpoint: sent from a TaskManager to the JobManager to report that a checkpoint request could not be processed;
NotifyCheckpointComplete: sent from the JobManager to a TaskManager to announce that the checkpoint is complete.
The TriggerCheckpoint message
Sent from the JobManager to a TaskManager to tell a specific task to trigger a checkpoint.
Sending the message
The sending logic lives in CheckpointCoordinator, as shown earlier:
// send the messages to the tasks that trigger their checkpoint
for (Execution execution: executions) {
execution.triggerCheckpoint(checkpointID, timestamp, checkpointOptions);
}
Here executions is an Execution[] holding the current execution attempts of the tasks that must be sent a message when a checkpoint is triggered (that is, the tasks in the CheckpointCoordinator field tasksToTrigger). triggerCheckpoint() is called on each of them.
Next, Execution's triggerCheckpoint method.
// triggers a new checkpoint on the task of this execution
public void triggerCheckpoint(long checkpointId, long timestamp, CheckpointOptions checkpointOptions) {
// get the slot assigned to this execution
final SimpleSlot slot = assignedResource;

if (slot != null) {
// TaskManagerGateway is the abstraction used to communicate with a TaskManager
final TaskManagerGateway taskManagerGateway = slot.getTaskManagerGateway();
// send the message to the TaskManager through the gateway
taskManagerGateway.triggerCheckpoint(attemptId, getVertex().getJobId(), checkpointId, timestamp, checkpointOptions);
} else {
LOG.debug("The execution has no slot assigned. This indicates that the execution is " +
"no longer running.");
}
}
Moving on to triggerCheckpoint() in ActorTaskManagerGateway (the actor-based implementation of TaskManagerGateway):
public void triggerCheckpoint(
ExecutionAttemptID executionAttemptID,
JobID jobId,
long checkpointId,
long timestamp,
CheckpointOptions checkpointOptions) {

Preconditions.checkNotNull(executionAttemptID);
Preconditions.checkNotNull(jobId);
// create a TriggerCheckpoint message and send it with actorGateway.tell (asynchronous, fire-and-forget: no reply is returned)
// ActorGateway is the interface for actor-based communication
actorGateway.tell(new TriggerCheckpoint(jobId, executionAttemptID, checkpointId, timestamp, checkpointOptions));
}
AkkaActorGateway is one implementation of the ActorGateway interface; it uses Akka to communicate with remote actors. Its tell method:
@Override
public void tell(Object message) {
Object newMessage = decorator.decorate(message);
// send the message through the ActorRef instance actor; ActorRef is an Akka class. Akka's internals are left for a later post.
actor.tell(newMessage, ActorRef.noSender());
}
That concludes the sending side of TriggerCheckpoint. Next, how the TaskManager receives the message.
Receiving the message
The receiving side in the TaskManager is implemented in Scala.
org.apache.flink.runtime.taskmanager.TaskManager
TaskManager's handleMessage method is the central message dispatcher.
// Central message handler of the TaskManager: receives messages and dispatches them by type.
override def handleMessage: Receive = {
case message: TaskMessage => handleTaskMessage(message)

// this case handles the checkpoint-related messages
case message: AbstractCheckpointMessage => handleCheckpointingMessage(message)

case JobManagerLeaderAddress(address, newLeaderSessionID) =>
handleJobManagerLeaderAddress(address, newLeaderSessionID)

case message: RegistrationMessage => handleRegistrationMessage(message)

...
}
Next, handleCheckpointingMessage(), whose main job is to trigger the checkpoint barrier.
// handles checkpoint-related messages
private def handleCheckpointingMessage(actorMessage: AbstractCheckpointMessage): Unit = {

actorMessage match {
// trigger-checkpoint message
case message: TriggerCheckpoint =>
val taskExecutionId = message.getTaskExecutionId
val checkpointId = message.getCheckpointId
val timestamp = message.getTimestamp
val checkpointOptions = message.getCheckpointOptions

log.debug(s"Receiver TriggerCheckpoint $checkpointId@$timestamp for $taskExecutionId.")

val task = runningTasks.get(taskExecutionId)
if (task != null) {
// call the task's triggerCheckpointBarrier method to trigger the checkpoint barrier; how barriers work is left for a later post
task.triggerCheckpointBarrier(checkpointId, timestamp, checkpointOptions)
} else {
log.debug(s"TaskManager received a checkpoint request for unknown task $taskExecutionId.")
}

// checkpoint-complete notification
case message: NotifyCheckpointComplete =>
val taskExecutionId = message.getTaskExecutionId
val checkpointId = message.getCheckpointId
val timestamp = message.getTimestamp

log.debug(s"Receiver ConfirmCheckpoint $checkpointId@$timestamp for $taskExecutionId.")

val task = runningTasks.get(taskExecutionId)
if (task != null) {
// call the task's notifyCheckpointComplete method
task.notifyCheckpointComplete(checkpointId)
} else {
log.debug(
s"TaskManager received a checkpoint confirmation for unknown task $taskExecutionId.")
}

// unknown checkpoint message
case _ => unhandled(actorMessage)
}
}
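For readers more at home in Java than in Scala, the dispatch performed by handleCheckpointingMessage can be paraphrased as follows. This is a sketch only: the nested message classes are hypothetical stand-ins for Flink's message types, a set of task IDs replaces the runningTasks map, and the handler returns a description of the action it would take instead of calling into a task.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical Java paraphrase of the type-based dispatch; not Flink code. */
class CheckpointDispatchSketch {

    /** Stand-in for AbstractCheckpointMessage. */
    abstract static class Message {
        final String taskExecutionId;
        final long checkpointId;

        Message(String taskExecutionId, long checkpointId) {
            this.taskExecutionId = taskExecutionId;
            this.checkpointId = checkpointId;
        }
    }

    static final class Trigger extends Message {
        Trigger(String taskExecutionId, long checkpointId) {
            super(taskExecutionId, checkpointId);
        }
    }

    static final class NotifyComplete extends Message {
        NotifyComplete(String taskExecutionId, long checkpointId) {
            super(taskExecutionId, checkpointId);
        }
    }

    /** A message type the handler does not know about. */
    static final class Other extends Message {
        Other(String taskExecutionId, long checkpointId) {
            super(taskExecutionId, checkpointId);
        }
    }

    /** Returns a description of the action taken for the message. */
    static String handle(Message message, Set<String> runningTasks) {
        boolean taskKnown = runningTasks.contains(message.taskExecutionId);
        if (message instanceof Trigger) {
            return taskKnown
                    ? "triggerCheckpointBarrier(" + message.checkpointId + ")"
                    : "ignored: unknown task";
        }
        if (message instanceof NotifyComplete) {
            return taskKnown
                    ? "notifyCheckpointComplete(" + message.checkpointId + ")"
                    : "ignored: unknown task";
        }
        return "unhandled";
    }

    public static void main(String[] args) {
        Set<String> running = new HashSet<>(Arrays.asList("task-1"));
        System.out.println(handle(new Trigger("task-1", 7), running));
        System.out.println(handle(new NotifyComplete("task-2", 7), running));
        System.out.println(handle(new Other("task-1", 7), running));
    }
}
```

As in the Scala handler, a message for a task that is no longer running is dropped rather than treated as an error, since the task may simply have finished or failed in the meantime.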
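The NotifyCheckpointComplete message
Sent from the JobManager to a TaskManager to tell a task that its checkpoint has been confirmed as complete, so the task may now commit the checkpoint to external systems.
Sending the message
NotifyCheckpointComplete is sent from within the receiveAcknowledgeMessage method of CheckpointCoordinator.
// receives an AcknowledgeCheckpoint message and returns whether it was associated with a pending checkpoint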
public boolean receiveAcknowledgeMessage(AcknowledgeCheckpoint message) throws CheckpointException {
if (shutdown || message == null) {
return false;
}
if (!job.equals(message.getJob())) {
LOG.error("Received wrong AcknowledgeCheckpoint message for job {}: {}", job, message);
return false;
}

final long checkpointId = message.getCheckpointId();

synchronized (lock) {
// we need to check inside the lock for being shutdown as well, otherwise we
// get races and invalid error log messages
if (shutdown) {
return false;
}

final PendingCheckpoint checkpoint = pendingCheckpoints.get(checkpointId);

// the checkpoint is still pending and has not been discarded
if (checkpoint != null && !checkpoint.isDiscarded()) {

// acknowledge the task, identified by its execution attempt ID, together with its subtask state
switch (checkpoint.acknowledgeTask(message.getTaskExecutionId(), message.getSubtaskState(), message.getCheckpointMetrics())) {
// acknowledged successfully
case SUCCESS:
LOG.debug("Received acknowledge message for checkpoint {} from task {} of job {}.",
checkpointId, message.getTaskExecutionId(), message.getJob());
// all tasks have acknowledged (notYetAcknowledgedTasks is empty)
if (checkpoint.isFullyAcknowledged()) {
// try to complete the pending checkpoint: it is removed from pendingCheckpoints,
// the related flags are updated, and finally the notify-complete messages are sent
completePendingCheckpoint(checkpoint);
}
break;
// duplicate acknowledgement
case DUPLICATE:
LOG.debug("Received a duplicate acknowledge message for checkpoint {}, task {}, job {}.",
message.getCheckpointId(), message.getTaskExecutionId(), message.getJob());
break;
// unknown execution attempt
case UNKNOWN:
LOG.warn("Could not acknowledge the checkpoint {} for task {} of job {}, " +
"because the task's execution attempt id was unknown. Discarding " +
"the state handle to avoid lingering state.", message.getCheckpointId(),
message.getTaskExecutionId(), message.getJob());

discardSubtaskState(message.getJob(), message.getTaskExecutionId(), message.getCheckpointId(), message.getSubtaskState());
break;
// pending checkpoint already discarded
case DISCARDED:
LOG.warn("Could not acknowledge the checkpoint {} for task {} of job {}, " +
"because the pending checkpoint had been discarded. Discarding the " +
"state handle to avoid lingering state.",
message.getCheckpointId(), message.getTaskExecutionId(), message.getJob());
discardSubtaskState(message.getJob(), message.getTaskExecutionId(), message.getCheckpointId(), message.getSubtaskState());
}

return true;
}
else if (checkpoint != null) {
// this should not happen
throw new IllegalStateException(
"Received message for discarded but non-removed checkpoint " + checkpointId);
}
else {
boolean wasPendingCheckpoint;

// message is for an unknown checkpoint, or comes too late (checkpoint disposed)
if (recentPendingCheckpoints.contains(checkpointId)) {
wasPendingCheckpoint = true;
LOG.warn("Received late message for now expired checkpoint attempt {} from " +
"{} of job {}.", checkpointId, message.getTaskExecutionId(), message.getJob());
}
else {
LOG.debug("Received message for an unknown checkpoint {} from {} of job {}.",
checkpointId, message.getTaskExecutionId(), message.getJob());
wasPendingCheckpoint = false;
}

// try to discard the state so that we don't have lingering state lying around
discardSubtaskState(message.getJob(), message.getTaskExecutionId(), message.getCheckpointId(), message.getSubtaskState());

return wasPendingCheckpoint;
}
}
}
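The four outcomes of acknowledgeTask (SUCCESS, DUPLICATE, UNKNOWN, DISCARDED) suggest that a pending checkpoint tracks which tasks it is still waiting for. PendingCheckpoint itself is not shown in this post, so the following is only a guess at the bookkeeping: PendingAckSketch is a hypothetical class, plain strings stand in for ExecutionAttemptID, and state handles and metrics are left out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the acknowledgement bookkeeping of one pending checkpoint. */
class PendingAckSketch {
    enum AckResult { SUCCESS, DUPLICATE, UNKNOWN, DISCARDED }

    private final Set<String> notYetAcknowledged;
    private final Set<String> acknowledged = new HashSet<>();
    private boolean discarded;

    PendingAckSketch(Set<String> tasksToWaitFor) {
        this.notYetAcknowledged = new HashSet<>(tasksToWaitFor);
    }

    synchronized AckResult acknowledge(String attemptId) {
        if (discarded) {
            return AckResult.DISCARDED;
        }
        if (notYetAcknowledged.remove(attemptId)) {
            acknowledged.add(attemptId);
            return AckResult.SUCCESS;
        }
        // not waiting for it: either it already acknowledged, or it was never expected
        return acknowledged.contains(attemptId) ? AckResult.DUPLICATE : AckResult.UNKNOWN;
    }

    synchronized boolean isFullyAcknowledged() {
        return !discarded && notYetAcknowledged.isEmpty();
    }

    synchronized void discard() {
        discarded = true;
    }

    public static void main(String[] args) {
        PendingAckSketch pending = new PendingAckSketch(new HashSet<>(Arrays.asList("a", "b")));
        System.out.println(pending.acknowledge("a"));      // SUCCESS
        System.out.println(pending.acknowledge("a"));      // DUPLICATE
        System.out.println(pending.acknowledge("x"));      // UNKNOWN
        System.out.println(pending.isFullyAcknowledged()); // false
        System.out.println(pending.acknowledge("b"));      // SUCCESS
        System.out.println(pending.isFullyAcknowledged()); // true
    }
}
```

In receiveAcknowledgeMessage the coordinator reacts to SUCCESS by checking isFullyAcknowledged(), and to UNKNOWN and DISCARDED by discarding the reported state so that it does not linger.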
completePendingCheckpoint sends the NotifyCheckpointComplete messages with the following code:
for (ExecutionVertex ev : tasksToCommitTo) {
Execution ee = ev.getCurrentExecutionAttempt();
if (ee != null) {
ee.notifyCheckpointComplete(checkpointId, timestamp);
}
}
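Receiving the message
The receiving code is part of the handleCheckpointingMessage handler shown in the TriggerCheckpoint section; it mainly calls task.notifyCheckpointComplete(checkpointId).
The AcknowledgeCheckpoint message
Sent from a TaskManager to the JobManager to report that the checkpoint of a specific task is complete. The message may carry the task's state and its checkpointMetrics.
The two fields of the AcknowledgeCheckpoint message class:
private final SubtaskState subtaskState; // the state of the subtask
private final CheckpointMetrics checkpointMetrics;
Sending the message
The message is sent from the acknowledgeCheckpoint method of RuntimeEnvironment: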
public void acknowledgeCheckpoint(
long checkpointId,
CheckpointMetrics checkpointMetrics,
SubtaskState checkpointStateHandles) {
// send the ack through checkpointResponder, an instance of the CheckpointResponder interface
checkpointResponder.acknowledgeCheckpoint(
jobId, executionId, checkpointId, checkpointMetrics,
checkpointStateHandles);
}
CheckpointResponder is the interface for responding with checkpoint acknowledge and decline messages. ActorGatewayCheckpointResponder is the implementation backed by an ActorGateway; it provides the two methods acknowledgeCheckpoint and declineCheckpoint.
@Override
public void acknowledgeCheckpoint(
JobID jobID,
ExecutionAttemptID executionAttemptID,
long checkpointId,
CheckpointMetrics checkpointMetrics,
SubtaskState checkpointStateHandles) {
// create an AcknowledgeCheckpoint message
AcknowledgeCheckpoint message = new AcknowledgeCheckpoint(
jobID, executionAttemptID, checkpointId, checkpointMetrics,
checkpointStateHandles);
// send it through the actorGateway
actorGateway.tell(message);
}
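Receiving the message
Handled by receiveAcknowledgeMessage (the same method that sends NotifyCheckpointComplete).
The DeclineCheckpoint message
Sent from a TaskManager to the JobManager to tell the CheckpointCoordinator that a checkpoint request could not be processed. This typically happens when a task is already in the RUNNING state but is not yet ready internally to perform a checkpoint.
Sending the message
Sent from the triggerCheckpointBarrier method of the Task class.
org.apache.flink.runtime.taskmanager.Task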
try {
boolean success = statefulTask.triggerCheckpoint(checkpointMetaData, checkpointOptions);
if (!success) {
// send the message through the CheckpointResponder, just like an AcknowledgeCheckpoint message
checkpointResponder.declineCheckpoint(
getJobID(), getExecutionId(), checkpointID,
new CheckpointDeclineTaskNotReadyException(taskName));
}
}
Receiving the message
Handled by the receiveDeclineMessage method of CheckpointCoordinator.
public void receiveDeclineMessage(DeclineCheckpoint message) {
if (shutdown || message == null) {
return;
}
if (!job.equals(message.getJob())) {
throw new IllegalArgumentException("Received DeclineCheckpoint message for job " +
message.getJob() + " while this coordinator handles job " + job);
}

final long checkpointId = message.getCheckpointId();
final String reason = (message.getReason() != null ? message.getReason().getMessage() : "");

PendingCheckpoint checkpoint;

synchronized (lock) {
// we need to check inside the lock for being shutdown as well, otherwise we
// get races and invalid error log messages
if (shutdown) {
return;
}

checkpoint = pendingCheckpoints.get(checkpointId);

if (checkpoint != null && !checkpoint.isDiscarded()) {
// the checkpoint is still pending and has not been discarded
LOG.info("Discarding checkpoint {} because of checkpoint decline from task {} : {}",
checkpointId, message.getTaskExecutionId(), reason);

pendingCheckpoints.remove(checkpointId); // remove the checkpoint from pendingCheckpoints
checkpoint.abortDeclined();
rememberRecentCheckpointId(checkpointId);

// we don't have to schedule another "dissolving" checkpoint any more because the
// cancellation barriers take care of breaking downstream alignments
// we only need to make sure that suspended queued requests are resumed

// check whether a more recent checkpoint is still pending
boolean haveMoreRecentPending = false;
for (PendingCheckpoint p : pendingCheckpoints.values()) {
if (!p.isDiscarded() && p.getCheckpointId() >= checkpoint.getCheckpointId()) {
haveMoreRecentPending = true;
break;
}
}
// resume queued trigger requests only if no more recent checkpoint is still pending
if (!haveMoreRecentPending) {
triggerQueuedRequests();
}
}
else if (checkpoint != null) {
// this should not happen
throw new IllegalStateException(
"Received message for discarded but non-removed checkpoint " + checkpointId);
}
else if (LOG.isDebugEnabled()) {
if (recentPendingCheckpoints.contains(checkpointId)) {
// message is for an unknown checkpoint, or comes too late (checkpoint disposed)
LOG.debug("Received another decline message for now expired checkpoint attempt {} : {}",
checkpointId, reason);
} else {
// message is for an unknown checkpoint. might be so old that we don't even remember it any more
LOG.debug("Received decline message for unknown (too old?) checkpoint attempt {} : {}",
checkpointId, reason);
}
}
}
}
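Both receiveAcknowledgeMessage and receiveDeclineMessage consult recentPendingCheckpoints to tell a late message for an expired checkpoint from a message for a checkpoint that was never known. The listings above do not show how rememberRecentCheckpointId maintains that collection. One plausible approach, assumed here rather than taken from the source, is a bounded first-in-first-out queue of checkpoint IDs; RecentCheckpointIdsSketch and its capacity parameter are hypothetical.

```java
import java.util.ArrayDeque;

/** Hypothetical bounded memory of recently removed checkpoint IDs. */
class RecentCheckpointIdsSketch {
    private final ArrayDeque<Long> recent = new ArrayDeque<>();
    private final int capacity;

    RecentCheckpointIdsSketch(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Remembers an ID, evicting the oldest one once the capacity is reached. */
    void remember(long checkpointId) {
        if (recent.size() >= capacity) {
            recent.removeFirst();
        }
        recent.addLast(checkpointId);
    }

    boolean contains(long checkpointId) {
        return recent.contains(checkpointId);
    }

    public static void main(String[] args) {
        RecentCheckpointIdsSketch ids = new RecentCheckpointIdsSketch(2);
        ids.remember(1);
        ids.remember(2);
        ids.remember(3); // evicts 1
        System.out.println(ids.contains(1)); // false
        System.out.println(ids.contains(3)); // true
    }
}
```

Bounding the queue keeps memory use constant; the price is that a message arriving very late is reported as belonging to an unknown checkpoint, which matches the "too old?" wording in the debug log message.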