Initializing the state classes

// org.apache.flink.streaming.runtime.tasks.StreamTask#initializeState
// (called as initializeState() from StreamTask#invoke() during task startup)
private void initializeState() throws Exception {

    StreamOperator<?>[] allOperators = operatorChain.getAllOperators();

    for (StreamOperator<?> operator : allOperators) {
        if (null != operator) {
            operator.initializeState();
        }
    }
}

operator.initializeState() resolves to org.apache.flink.streaming.api.operators.AbstractStreamOperator#initializeState(). All stream operator classes inherit from this class, and none of them overrides this method (it is declared final):

public final void initializeState() throws Exception {

    // This call goes through the state backend -- the important part.
    final StreamOperatorStateContext context =
        streamTaskStateManager.streamOperatorStateContext(
            getOperatorID(),
            getClass().getSimpleName(),
            this,
            keySerializer,
            streamTaskCloseableRegistry,
            metrics);

    ......
}

streamTaskStateManager.streamOperatorStateContext(......) resolves to org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl#streamOperatorStateContext:

......

// -------------- Keyed State Backend: the key part for checkpointing --------------
keyedStatedBackend = keyedStatedBackend(
    keySerializer,
    operatorIdentifierText,
    prioritizedOperatorSubtaskStates,
    streamTaskCloseableRegistry,
    metricGroup);

// -------------- Operator State Backend: the key part for checkpointing --------------
operatorStateBackend = operatorStateBackend(
    operatorIdentifierText,
    prioritizedOperatorSubtaskStates,
    streamTaskCloseableRegistry);

......

At the bottom of keyedStatedBackend(), the call ends up in org.apache.flink.streaming.api.operators.BackendRestorerProcedure#attemptCreateAndRestore:

private T attemptCreateAndRestore(Collection<S> restoreState) throws Exception {

    ......

    // create a new, empty backend.
    final T backendInstance = instanceSupplier.get();

    // attempt to restore from snapshot (or null if no state was checkpointed).
    backendInstance.restore(restoreState);

    ......
}
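For context, the public entry point createAndRestore explains the "attempt" in the name: it walks the prioritized restore options (e.g. local recovery state before remote state) and tries each one until a backend restores successfully. A paraphrased sketch of the Flink 1.7 source, with the exception bookkeeping and logging elided:

// Paraphrased from BackendRestorerProcedure#createAndRestore (Flink 1.7).
public T createAndRestore(List<? extends Collection<S>> restoreOptions) throws Exception {
    if (restoreOptions.isEmpty()) {
        restoreOptions = Collections.singletonList(Collections.emptyList());
    }
    Exception collected = null;
    for (Collection<S> restoreState : restoreOptions) {
        try {
            return attemptCreateAndRestore(restoreState);
        } catch (Exception ex) {
            // remember the failure and fall through to the next,
            // lower-priority restore alternative
            collected = ex;
        }
    }
    throw new FlinkException("Could not restore state backend from any alternative.", collected);
}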

backendInstance.restore(restoreState) resolves to org.apache.flink.runtime.state.DefaultOperatorStateBackend#restore:

// registeredOperatorStates is the core object here

...

PartitionableListState<?> listState = registeredOperatorStates.get(restoredSnapshot.getName());

if (null == listState) {
    listState = new PartitionableListState<>(restoredMetaInfo);

    // Key point: this is where the restored snapshot state object is stored.
    //********************************************************************
    registeredOperatorStates.put(listState.getStateMetaInfo().getName(), listState);
    //********************************************************************
} else {
    // TODO with eager state registration in place, check here for serializer migration strategies
}

...
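To make the role of registeredOperatorStates concrete, here is a heavily simplified, hypothetical sketch (these are not the actual Flink classes): the backend keeps named, list-backed state holders in one map, so restore, getListState, and the snapshot strategy all reach the same objects.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplification -- the real PartitionableListState also carries
// serializers, an assignment mode, and deep-copy support.
class SimplifiedListState<S> {
    final String name;
    final List<S> internalList = new ArrayList<>();
    SimplifiedListState(String name) { this.name = name; }
}

class SimplifiedOperatorStateBackend {
    // mirrors the role of DefaultOperatorStateBackend#registeredOperatorStates
    private final Map<String, SimplifiedListState<?>> registeredOperatorStates = new HashMap<>();

    @SuppressWarnings("unchecked")
    <S> SimplifiedListState<S> getListState(String name) {
        // restore() and getListState() both register into the same map,
        // which is what the snapshot phase later deep-copies and writes out
        return (SimplifiedListState<S>) registeredOperatorStates
                .computeIfAbsent(name, SimplifiedListState::new);
    }
}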

triggerCheckpoint is then invoked on a timer to perform checkpoints periodically; everything above was the one-time initialization logic.
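On the user side, this periodic trigger only runs when checkpointing is enabled on the execution environment. A minimal sketch (the class name, interval, and mode are illustrative values):

import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 5 seconds; this interval is what drives the periodic
        // CheckpointCoordinator#triggerCheckpoint calls on the JobManager side.
        env.enableCheckpointing(5000);
        env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
    }
}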

Periodically snapshotting the state classes

org.apache.flink.runtime.checkpoint.CheckpointCoordinator#triggerCheckpoint(long, boolean) 

......

// send the messages to the tasks that trigger their checkpoint
// (presumably this is where the trigger is dispatched remotely to the tasks)
for (Execution execution : executions) {
    execution.triggerCheckpoint(checkpointID, timestamp, checkpointOptions);
}

......

execution.triggerCheckpoint resolves to org.apache.flink.runtime.executiongraph.Execution#triggerCheckpoint:

/**
 * Trigger a new checkpoint on the task of this execution.
 *
 * @param checkpointId of the checkpoint to trigger
 * @param timestamp of the checkpoint to trigger
 * @param checkpointOptions of the checkpoint to trigger
 */
public void triggerCheckpoint(long checkpointId, long timestamp, CheckpointOptions checkpointOptions) {

    ......

    final LogicalSlot slot = assignedResource;

    if (slot != null) {
        final TaskManagerGateway taskManagerGateway = slot.getTaskManagerGateway();
        taskManagerGateway.triggerCheckpoint(attemptId, getVertex().getJobId(), checkpointId, timestamp, checkpointOptions);
    }

    ......
}

taskManagerGateway.triggerCheckpoint(......) eventually resolves to org.apache.flink.runtime.taskexecutor.TaskExecutor#triggerCheckpoint:

@Override
public CompletableFuture<Acknowledge> triggerCheckpoint(
        ExecutionAttemptID executionAttemptID,
        long checkpointId,
        long checkpointTimestamp,
        CheckpointOptions checkpointOptions) {

    ......

    final Task task = taskSlotTable.getTask(executionAttemptID);

    if (task != null) {
        task.triggerCheckpointBarrier(checkpointId, checkpointTimestamp, checkpointOptions);

        return CompletableFuture.completedFuture(Acknowledge.get());
    }

    ......
}

task.triggerCheckpointBarrier(......) resolves to org.apache.flink.runtime.taskmanager.Task#triggerCheckpointBarrier:

/**
 * Calls the invokable to trigger a checkpoint.
 *
 * This is where checkpoint execution starts on the task side -- effectively the
 * entry point. It ends up calling
 * org.apache.flink.streaming.runtime.tasks.StreamTask#triggerCheckpoint, and the
 * AsyncCheckpointRunnable is executed from inside that call.
 *
 * @param checkpointID The ID identifying the checkpoint.
 * @param checkpointTimestamp The timestamp associated with the checkpoint.
 * @param checkpointOptions Options for performing this checkpoint.
 */
public void triggerCheckpointBarrier(
        final long checkpointID,
        long checkpointTimestamp,
        final CheckpointOptions checkpointOptions) {

    final AbstractInvokable invokable = this.invokable;
    final CheckpointMetaData checkpointMetaData = new CheckpointMetaData(checkpointID, checkpointTimestamp);

    if (executionState == ExecutionState.RUNNING && invokable != null) {

        // build a local closure
        final String taskName = taskNameWithSubtask;
        final SafetyNetCloseableRegistry safetyNetCloseableRegistry =
            FileSystemSafetyNet.getSafetyNetCloseableRegistryForThread();

        Runnable runnable = new Runnable() {
            @Override
            public void run() {
                // set safety net from the task's context for checkpointing thread
                LOG.debug("Creating FileSystem stream leak safety net for {}", Thread.currentThread().getName());
                FileSystemSafetyNet.setSafetyNetCloseableRegistryForThread(safetyNetCloseableRegistry);

                try {
                    boolean success = invokable.triggerCheckpoint(checkpointMetaData, checkpointOptions);
                    ......
                }
                ......
            }
        };

        // create a single-thread executor and submit the runnable to it
        executeAsyncCallRunnable(runnable, String.format("Checkpoint Trigger for %s (%s).", taskNameWithSubtask, executionId));
    }
}

The call chain inside invokable.triggerCheckpoint(.....) is:

org.apache.flink.streaming.runtime.tasks.StreamTask#triggerCheckpoint
org.apache.flink.streaming.runtime.tasks.StreamTask#performCheckpoint

// we can do a checkpoint

// All of the following steps happen as an atomic step from the perspective of barriers and
// records/watermarks/timers/callbacks.
// We generally try to emit the checkpoint barrier as soon as possible to not affect downstream
// checkpoint alignments

// Step (1): Prepare the checkpoint, allow operators to do some pre-barrier work.
//           The pre-barrier work should be nothing or minimal in the common case.
operatorChain.prepareSnapshotPreBarrier(checkpointMetaData.getCheckpointId());

// Step (2): Send the checkpoint barrier downstream
operatorChain.broadcastCheckpointBarrier(
    checkpointMetaData.getCheckpointId(),
    checkpointMetaData.getTimestamp(),
    checkpointOptions);

// Step (3): Take the state snapshot. This should be largely asynchronous, to not
//           impact progress of the streaming topology
checkpointState(checkpointMetaData, checkpointOptions, checkpointMetrics);

checkpointState(......) eventually calls org.apache.flink.streaming.runtime.tasks.StreamTask.CheckpointingOperation#executeCheckpointing(). This is the critical part:

......

// invoke each operator's snapshot method (which in turn calls the user's snapshot logic)
for (StreamOperator<?> op : allOperators) {
    // different operators are instances of different subclasses
    checkpointStreamOperator(op);
}

// The state data files are produced further down the call chain; this async
// runnable appears to produce only the metadata.

// we are transferring ownership over snapshotInProgressList for cleanup to the thread, active on submit
AsyncCheckpointRunnable asyncCheckpointRunnable = new AsyncCheckpointRunnable(
    owner,
    operatorSnapshotsInProgress,
    checkpointMetaData,
    checkpointMetrics,
    startAsyncPartNano);

owner.cancelables.registerCloseable(asyncCheckpointRunnable);
owner.asyncOperationsThreadPool.submit(asyncCheckpointRunnable);

......
Now into checkpointStreamOperator(op):

private void checkpointStreamOperator(StreamOperator<?> op) throws Exception {
    if (null != op) {
        // this call is the core of the snapshot
        OperatorSnapshotFutures snapshotInProgress = op.snapshotState(
            checkpointMetaData.getCheckpointId(),
            checkpointMetaData.getTimestamp(),
            checkpointOptions,
            storageLocation);

        operatorSnapshotsInProgress.put(op.getOperatorID(), snapshotInProgress);
    }
}

op.snapshotState() is the core. It calls org.apache.flink.streaming.api.operators.AbstractStreamOperator#snapshotState(long, long, org.apache.flink.runtime.checkpoint.CheckpointOptions, org.apache.flink.runtime.state.CheckpointStreamFactory).

Note that op is always a subclass instance: some operator classes extend AbstractStreamOperator directly, others extend AbstractUdfStreamOperator. So when snapshotState(snapshotContext) is invoked inside the method below, which implementation runs depends on the subclass: either org.apache.flink.streaming.api.operators.AbstractStreamOperator#snapshotState(org.apache.flink.runtime.state.StateSnapshotContext) or org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator#snapshotState.

AbstractStreamOperator has 94 implementation classes, AbstractUdfStreamOperator has 42, and AbstractUdfStreamOperator extends AbstractStreamOperator; its override is sketched right below.
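For reference, the AbstractUdfStreamOperator override is short. Paraphrased from the Flink 1.7 source, it first runs the base-class logic and then forwards to the wrapped user function, which is how a user-implemented CheckpointedFunction#snapshotState gets invoked:

// org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator (Flink 1.7), paraphrased
@Override
public void snapshotState(StateSnapshotContext context) throws Exception {
    super.snapshotState(context);
    // bridges into the user function: if userFunction implements
    // CheckpointedFunction, its snapshotState(...) is called from here
    StreamingFunctionUtils.snapshotFunctionState(context, getOperatorStateBackend(), userFunction);
}

The final snapshotState(long, long, ...) entry point that leads to this dispatch is shown next: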

@Override
public final OperatorSnapshotFutures snapshotState(long checkpointId, long timestamp, CheckpointOptions checkpointOptions,
        CheckpointStreamFactory factory) throws Exception {

    OperatorSnapshotFutures snapshotInProgress = new OperatorSnapshotFutures();

    try (StateSnapshotContextSynchronousImpl snapshotContext = new StateSnapshotContextSynchronousImpl(
            checkpointId,
            timestamp,
            factory,
            keyGroupRange,
            getContainingTask().getCancelables())) {

        // Operators extending AbstractUdfStreamOperator run the user's snapshot method here;
        // operators extending AbstractStreamOperator directly run the base implementation,
        // which does very little.
        snapshotState(snapshotContext);

        // At this point the user's snapshot method has run, so the state objects hold
        // their final data for this checkpoint. What follows is about reaching those
        // state objects and writing their contents to disk.
        snapshotInProgress.setKeyedStateRawFuture(snapshotContext.getKeyedStateStreamFuture());
        snapshotInProgress.setOperatorStateRawFuture(snapshotContext.getOperatorStateStreamFuture());

        // this is where the operator state data files get produced
        if (null != operatorStateBackend) {
            // debug output added by the author while tracing the code
            System.out.println(Thread.currentThread().getName() + " :: state data is written to files from here");
            snapshotInProgress.setOperatorStateManagedFuture(
                operatorStateBackend.snapshot(checkpointId, timestamp, factory, checkpointOptions));
        }

        // and this is where the keyed state data files get produced
        if (null != keyedStateBackend) {
            snapshotInProgress.setKeyedStateManagedFuture(
                keyedStateBackend.snapshot(checkpointId, timestamp, factory, checkpointOptions));
        }
    }

    return snapshotInProgress;
}

operatorStateBackend.snapshot(checkpointId, timestamp, factory, checkpointOptions) resolves to org.apache.flink.runtime.state.DefaultOperatorStateBackend#snapshot. The answer lies right below:

public RunnableFuture<SnapshotResult<OperatorStateHandle>> snapshot(
        long checkpointId,
        long timestamp,
        @Nonnull CheckpointStreamFactory streamFactory,
        @Nonnull CheckpointOptions checkpointOptions) throws Exception {

    long syncStartTime = System.currentTimeMillis();

    // This is the crucial spot: if you want to know how the state objects inside
    // user functions are reached, it happens here.
    RunnableFuture<SnapshotResult<OperatorStateHandle>> snapshotRunner =
        snapshotStrategy.snapshot(checkpointId, timestamp, streamFactory, checkpointOptions);

    snapshotStrategy.logSyncCompleted(streamFactory, syncStartTime);
    return snapshotRunner;
}

Which implementation snapshotStrategy.snapshot(checkpointId, timestamp, streamFactory, checkpointOptions) dispatches to depends on the state backend the user configured. By default the call path is org.apache.flink.runtime.state.DefaultOperatorStateBackend.DefaultOperatorStateBackendSnapshotStrategy#snapshot. DefaultOperatorStateBackendSnapshotStrategy is an inner class of DefaultOperatorStateBackend.
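For context, the state backend is chosen on the execution environment. A minimal sketch (the class name and checkpoint URI are placeholders); the asynchronousSnapshots flag checked at the bottom of the method below corresponds to the second constructor argument here:

import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StateBackendSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // With FsStateBackend, snapshot data is written as files under the given
        // URI ("hdfs:///flink/checkpoints" is just a placeholder), and the second
        // argument requests asynchronous snapshots.
        env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints", true));
    }
}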

public RunnableFuture<SnapshotResult<OperatorStateHandle>> snapshot(......) throws IOException {

    // The state data apparently lives in the registeredOperatorStates object. The
    // steps below need no deep study -- they just write the state data to files;
    // the key question is how registeredOperatorStates was populated.

    //************ key object: registeredOperatorStates ************
    final Map<String, PartitionableListState<?>> registeredOperatorStatesDeepCopies =
        new HashMap<>(registeredOperatorStates.size());
    final Map<String, BackendWritableBroadcastState<?, ?>> registeredBroadcastStatesDeepCopies =
        new HashMap<>(registeredBroadcastStates.size());

    ClassLoader snapshotClassLoader = Thread.currentThread().getContextClassLoader();
    try {
        // eagerly create deep copies of the list and the broadcast states (if any)
        // in the synchronous phase, so that we can use them in the async writing.

        // entry.getValue() is the state object itself; its deep copy is stored in
        // the newly created map
        if (!registeredOperatorStates.isEmpty()) {
            for (Map.Entry<String, PartitionableListState<?>> entry : registeredOperatorStates.entrySet()) {
                PartitionableListState<?> listState = entry.getValue();
                if (null != listState) {
                    listState = listState.deepCopy();
                }
                registeredOperatorStatesDeepCopies.put(entry.getKey(), listState);
            }
        }

        // broadcast state
        if (!registeredBroadcastStates.isEmpty()) {
            for (Map.Entry<String, BackendWritableBroadcastState<?, ?>> entry : registeredBroadcastStates.entrySet()) {
                BackendWritableBroadcastState<?, ?> broadcastState = entry.getValue();
                if (null != broadcastState) {
                    broadcastState = broadcastState.deepCopy();
                }
                registeredBroadcastStatesDeepCopies.put(entry.getKey(), broadcastState);
            }
        }
    } finally {
        // restore the context class loader (present in the original source)
        Thread.currentThread().setContextClassLoader(snapshotClassLoader);
    }

    // the callable below is where the state data files are produced
    AsyncSnapshotCallable<SnapshotResult<OperatorStateHandle>> snapshotCallable =
        new AsyncSnapshotCallable<SnapshotResult<OperatorStateHandle>>() {

            @Override
            protected SnapshotResult<OperatorStateHandle> callInternal() throws Exception {

                ......

                // get the registered operator state infos ...
                List<StateMetaInfoSnapshot> operatorMetaInfoSnapshots =
                    new ArrayList<>(registeredOperatorStatesDeepCopies.size());

                for (Map.Entry<String, PartitionableListState<?>> entry :
                        registeredOperatorStatesDeepCopies.entrySet()) {
                    operatorMetaInfoSnapshots.add(entry.getValue().getStateMetaInfo().snapshot());
                }

                // ... get the registered broadcast operator state infos ...
                List<StateMetaInfoSnapshot> broadcastMetaInfoSnapshots =
                    new ArrayList<>(registeredBroadcastStatesDeepCopies.size());

                for (Map.Entry<String, BackendWritableBroadcastState<?, ?>> entry :
                        registeredBroadcastStatesDeepCopies.entrySet()) {
                    broadcastMetaInfoSnapshots.add(entry.getValue().getStateMetaInfo().snapshot());
                }

                // ... write them all in the checkpoint stream ...
                DataOutputView dov = new DataOutputViewStreamWrapper(localOut);

                OperatorBackendSerializationProxy backendSerializationProxy =
                    new OperatorBackendSerializationProxy(operatorMetaInfoSnapshots, broadcastMetaInfoSnapshots);

                backendSerializationProxy.write(dov);

                // ... and then go for the states ...

                ......
            }
        };

    final FutureTask<SnapshotResult<OperatorStateHandle>> task =
        snapshotCallable.toAsyncSnapshotFutureTask(closeStreamOnCancelRegistry);

    if (!asynchronousSnapshots) {
        task.run();
    }

    return task;
}

As we can see from the above, the state objects all end up in the registeredOperatorStatesDeepCopies map (deep copies of registeredOperatorStates).

The reason users can update the data inside a state object is that they hold a reference to that very same state object, obtained like this:

public void initializeState(FunctionInitializationContext context) throws Exception {

    ......

    checkpointedState = context.getOperatorStateStore().getListState(descriptor);

    ......
}
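How does this user method get called in the first place? For operators that wrap a user function, AbstractUdfStreamOperator forwards state initialization to the function. Paraphrased from the Flink 1.7 source:

// org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator (Flink 1.7), paraphrased
@Override
public void initializeState(StateInitializationContext context) throws Exception {
    super.initializeState(context);
    // if userFunction implements CheckpointedFunction, this invokes its
    // initializeState(FunctionInitializationContext), shown above
    StreamingFunctionUtils.restoreFunctionState(context, userFunction);
}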

context.getOperatorStateStore().getListState(descriptor) calls org.apache.flink.runtime.state.DefaultOperatorStateBackend#getListState(org.apache.flink.api.common.state.ListStateDescriptor):

/**
 * @Description: When the state object is handed back to the user, it is also put
 *               into the map so that it can later be written out to files.
 * @Author: intsmaze
 * @Date: 2019/1/18
 */
private <S> ListState<S> getListState(
        ListStateDescriptor<S> stateDescriptor,
        OperatorStateHandle.Mode mode) throws StateMigrationException {

    @SuppressWarnings("unchecked")
    PartitionableListState<S> previous = (PartitionableListState<S>) accessedStatesByName.get(name);

    if (previous != null) {
        checkStateNameAndMode(
            previous.getStateMetaInfo().getName(),
            name,
            previous.getStateMetaInfo().getAssignmentMode(),
            mode);
        return previous;
    }

    ......

    PartitionableListState<S> partitionableListState = (PartitionableListState<S>) registeredOperatorStates.get(name);

    if (null == partitionableListState) {
        // no restored state for the state name; simply create new state holder
        partitionableListState = new PartitionableListState<>(
            new RegisteredOperatorStateBackendMetaInfo<>(
                name,
                partitionStateSerializer,
                mode));

        // The state object is also registered here: this registeredOperatorStates
        // is the very same object that the snapshot method of
        // DefaultOperatorStateBackendSnapshotStrategy reads from.
        //************************************************************
        registeredOperatorStates.put(name, partitionableListState);
    }

    ......

    accessedStatesByName.put(name, partitionableListState);
    return partitionableListState;
}
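To close the loop, here is a minimal, hypothetical CheckpointedFunction (modeled on the standard buffering-sink example from the Flink documentation; the class and state names are illustrative) showing both halves: initializeState obtains the ListState via getListState, and snapshotState repopulates it right before the backend deep-copies registeredOperatorStates and writes it out.

import java.util.ArrayList;
import java.util.List;

import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

// Hypothetical example: buffers elements and checkpoints them as operator state.
public class BufferingSink implements SinkFunction<String>, CheckpointedFunction {

    private transient ListState<String> checkpointedState;
    private final List<String> bufferedElements = new ArrayList<>();

    @Override
    public void invoke(String value) {
        bufferedElements.add(value);
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        // called from AbstractUdfStreamOperator#snapshotState before the backend
        // deep-copies and persists registeredOperatorStates
        checkpointedState.clear();
        for (String element : bufferedElements) {
            checkpointedState.add(element);
        }
    }

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        ListStateDescriptor<String> descriptor =
            new ListStateDescriptor<>("buffered-elements", Types.STRING);
        // ends up in DefaultOperatorStateBackend#getListState, which registers
        // the state object in registeredOperatorStates
        checkpointedState = context.getOperatorStateStore().getListState(descriptor);

        if (context.isRestored()) {
            for (String element : checkpointedState.get()) {
                bufferedElements.add(element);
            }
        }
    }
}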
