Flink中的状态与容错

1.概述

Flink支持有状态计算，根据支持得不同状态类型，分别有Keyed State和Operator State。针对状态数据得持久化，Flink提供了Checkpoint机制处理；针对状态数据，Flink提供了不同的状态管理器来管理状态数据，如MemoryStateBackend。

上面Flink的文章中，有引用word count的例子，但是都没有包含状态管理。也就是说，如果一个task在处理过程中挂掉了，那么它在内存中的状态都会丢失，所有的数据都需要重新计算。

从容错和消息处理的语义上(at least once, exactly once)，Flink引入了state和checkpoint。

首先区分一下两个概念，state一般指一个具体的task/operator的状态。而checkpoint则表示了一个Flink Job，在一个特定时刻的一份全局状态快照，即包含了所有task/operator的状态。

所谓checkpoint，就是在某一时刻，将所有task的状态做一个快照(snapshot)，然后存储到memory/file system/rocksdb等。Flink通过定期地做checkpoint来实现容错和恢复。

2.checkpoint的实现CheckpointedFunction

CheckpointedFunction的描述

/**

 * This is the core interface for <i>stateful transformation functions</i>, meaning functions

 * that maintain state across individual stream records.

 * While more lightweight interfaces exist as shortcuts for various types of state, this interface offer the

 * greatest flexibility in managing both <i>keyed state</i> and <i>operator state</i>.

 *

 * <p>The section <a href="#shortcuts">Shortcuts</a> illustrates the common lightweight

 * ways to setup stateful functions typically used instead of the full fledged

 * abstraction represented by this interface.

 *

 * <h1>Initialization</h1>

 * The {@link CheckpointedFunction#initializeState(FunctionInitializationContext)} is called when

 * the parallel instance of the transformation function is created during distributed execution.

 * The method gives access to the {@link FunctionInitializationContext} which in turn gives access

 * to the to the {@link OperatorStateStore} and {@link KeyedStateStore}.

 *

 * <p>The {@code OperatorStateStore} and {@code KeyedStateStore} give access to the data structures

 * in which state should be stored for Flink to transparently manage and checkpoint it, such as

 * {@link org.apache.flink.api.common.state.ValueState} or

 * {@link org.apache.flink.api.common.state.ListState}.

 *

 * <p><b>Note:</b> The {@code KeyedStateStore} can only be used when the transformation supports

 * <i>keyed state</i>, i.e., when it is applied on a keyed stream (after a {@code keyBy(...)}).

 *

 * <h1>Snapshot</h1>

 * The {@link CheckpointedFunction#snapshotState(FunctionSnapshotContext)} is called whenever a

 * checkpoint takes a state snapshot of the transformation function. Inside this method, functions typically

 * make sure that the checkpointed data structures (obtained in the initialization phase) are up

 * to date for a snapshot to be taken. The given snapshot context gives access to the metadata

 * of the checkpoint.

 *

 * <p>In addition, functions can use this method as a hook to flush/commit/synchronize with

 * external systems.

 *

 * <h1>Example</h1>

 * The code example below illustrates how to use this interface for a function that keeps counts

 * of events per key and per parallel partition (parallel instance of the transformation function

 * during distributed execution).

 * The example also changes of parallelism, which affect the count-per-parallel-partition by

 * adding up the counters of partitions that get merged on scale-down. Note that this is a

 * toy example, but should illustrate the basic skeleton for a stateful function.

 *

 * <p><pre>{@code

 * public class MyFunction<T> implements MapFunction<T, T>, CheckpointedFunction {

 *

 *     private ReducingState<Long> countPerKey;

 *     private ListState<Long> countPerPartition;

 *

 *     private long localCount;

 *

 *     public void initializeState(FunctionInitializationContext context) throws Exception {

 *         // get the state data structure for the per-key state

 *         countPerKey = context.getKeyedStateStore().getReducingState(

 *                 new ReducingStateDescriptor<>("perKeyCount", new AddFunction<>(), Long.class));

 *

 *         // get the state data structure for the per-partition state

 *         countPerPartition = context.getOperatorStateStore().getOperatorState(

 *                 new ListStateDescriptor<>("perPartitionCount", Long.class));

 *

 *         // initialize the "local count variable" based on the operator state

 *         for (Long l : countPerPartition.get()) {

 *             localCount += l;

 *         }

 *     }

 *

 *     public void snapshotState(FunctionSnapshotContext context) throws Exception {

 *         // the keyed state is always up to date anyways

 *         // just bring the per-partition state in shape

 *         countPerPartition.clear();

 *         countPerPartition.add(localCount);

 *     }

 *

 *     public T map(T value) throws Exception {

 *         // update the states

 *         countPerKey.add(1L);

 *         localCount++;

 *

 *         return value;

 *     }

 * }

 * }</pre>

 *

 * <hr>

 *

 * <h1><a name="shortcuts">Shortcuts</a></h1>

 * There are various ways that transformation functions can use state without implementing the

 * full-fledged {@code CheckpointedFunction} interface:

 *

 * <h4>Operator State</h4>

 * Checkpointing some state that is part of the function object itself is possible in a simpler way

 * by directly implementing the {@link ListCheckpointed} interface.

 * That mechanism is similar to the previously used {@link Checkpointed} interface.

 *

 * <h4>Keyed State</h4>

 * Access to keyed state is possible via the {@link RuntimeContext}'s methods:

 * <pre>{@code

 * public class CountPerKeyFunction<T> extends RichMapFunction<T, T> {

 *

 *     private ValueState<Long> count;

 *

 *     public void open(Configuration cfg) throws Exception {

 *         count = getRuntimeContext().getState(new ValueStateDescriptor<>("myCount", Long.class));

 *     }

 *

 *     public T map(T value) throws Exception {

 *         Long current = count.get();

 *         count.update(current == null ? 1L : current + 1);

 *

 *         return value;

 *     }

 * }

 * }</pre>

 *

 * @see ListCheckpointed

 * @see RuntimeContext

 */

2.1. 它的snapshotState调用过程如下：

核心类StreamTask

/**

 * Base class for all streaming tasks. A task is the unit of local processing that is deployed

 * and executed by the TaskManagers. Each task runs one or more {@link StreamOperator}s which form

 * the Task's operator chain. Operators that are chained together execute synchronously in the

 * same thread and hence on the same stream partition. A common case for these chains

 * are successive map/flatmap/filter tasks.

 *

 * <p>The task chain contains one "head" operator and multiple chained operators.

 * The StreamTask is specialized for the type of the head operator: one-input and two-input tasks,

 * as well as for sources, iteration heads and iteration tails.

 *

 * <p>The Task class deals with the setup of the streams read by the head operator, and the streams

 * produced by the operators at the ends of the operator chain. Note that the chain may fork and

 * thus have multiple ends.

 *

 * <p>The life cycle of the task is set up as follows:

 * <pre>{@code

 *  -- setInitialState -> provides state of all operators in the chain

 *

 *  -- invoke()

 *        |

 *        +----> Create basic utils (config, etc) and load the chain of operators

 *        +----> operators.setup()

 *        +----> task specific init()

 *        +----> initialize-operator-states()

 *        +----> open-operators()

 *        +----> run()

 *        +----> close-operators()

 *        +----> dispose-operators()

 *        +----> common cleanup

 *        +----> task specific cleanup()

 * }</pre>

 *

 * <p>The {@code StreamTask} has a lock object called {@code lock}. All calls to methods on a

 * {@code StreamOperator} must be synchronized on this lock object to ensure that no methods

 * are called concurrently.

 *

 * @param <OUT>

 * @param <OP>

 */

2.2.它的initializeState调用过程如下：

3.checkpoint的状态管理器StateBackend

StateBackend

/**

 * A <b>State Backend</b> defines how the state of a streaming application is stored and

 * checkpointed. Different State Backends store their state in different fashions, and use

 * different data structures to hold the state of a running application.

 *

 * <p>For example, the {@link org.apache.flink.runtime.state.memory.MemoryStateBackend memory state backend}

 * keeps working state in the memory of the TaskManager and stores checkpoints in the memory of the

 * JobManager. The backend is lightweight and without additional dependencies, but not highly available

 * and supports only small state.

 *

 * <p>The {@link org.apache.flink.runtime.state.filesystem.FsStateBackend file system state backend}

 * keeps working state in the memory of the TaskManager and stores state checkpoints in a filesystem

 * (typically a replicated highly-available filesystem, like <a href="https://hadoop.apache.org/">HDFS</a>,

 * <a href="https://ceph.com/">Ceph</a>, <a href="https://aws.amazon.com/documentation/s3/">S3</a>,

 * <a href="https://cloud.google.com/storage/">GCS</a>, etc).

 *

 * <p>The {@code RocksDBStateBackend} stores working state in <a href="http://rocksdb.org/">RocksDB</a>,

 * and checkpoints the state by default to a filesystem (similar to the {@code FsStateBackend}).

 *

 * <h2>Raw Bytes Storage and Backends</h2>

 *

 * The {@code StateBackend} creates services for <i>raw bytes storage</i> and for <i>keyed state</i>

 * and <i>operator state</i>.

 *

 * <p>The <i>raw bytes storage</i> (through the {@link CheckpointStreamFactory}) is the fundamental

 * service that simply stores bytes in a fault tolerant fashion. This service is used by the JobManager

 * to store checkpoint and recovery metadata and is typically also used by the keyed- and operator state

 * backends to store checkpointed state.

 *

 * <p>The {@link AbstractKeyedStateBackend} and {@link OperatorStateBackend} created by this state

 * backend define how to hold the working state for keys and operators. They also define how to checkpoint

 * that state, frequently using the raw bytes storage (via the {@code CheckpointStreamFactory}).

 * However, it is also possible that for example a keyed state backend simply implements the bridge to

 * a key/value store, and that it does not need to store anything in the raw byte storage upon a

 * checkpoint.

 *

 * <h2>Serializability</h2>

 *

 * State Backends need to be {@link java.io.Serializable serializable}, because they distributed

 * across parallel processes (for distributed execution) together with the streaming application code.

 *

 * <p>Because of that, {@code StateBackend} implementations (typically subclasses

 * of {@link AbstractStateBackend}) are meant to be like <i>factories</i> that create the proper

 * states stores that provide access to the persistent storage and hold the keyed- and operator

 * state data structures. That way, the State Backend can be very lightweight (contain only

 * configurations) which makes it easier to be serializable.

 *

 * <h2>Thread Safety</h2>

 *

 * State backend implementations have to be thread-safe. Multiple threads may be creating

 * streams and keyed-/operator state backends concurrently.

 */

4.Savepoint

Savepoint是Checkpoint的一种特殊实现，底层也是使用Checkpoint的机制。Savepoint是用户以手工命令的方式触发Checkpoint并将结果持久化到指定的存储里，其主要目的是帮助用户在升级和维护集群过程中保存系统的状态数据，避免因停机或者升级邓正常终止应用的操作而导致系统无法恢复到原有的计算状态，而无法实现Exactly-Once的语义保证。

/**

 * Savepoints are manually-triggered snapshots from which a program can be

 * resumed on submission.

 *

 * <p>In order to allow changes to the savepoint format between Flink versions,

 * we allow different savepoint implementations (see subclasses of this

 * interface).

 *

 * <p>Savepoints are serialized via a {@link SavepointSerializer}.

 */

5.Querable State

Queryable State，顾名思义，就是可查询的状态，表示这个状态，在流计算的过程中就可以被查询，而不像其他流计算框架，需要存储到外部系统中才能被查询。目前可查询的state主要针对partitionable state，如keyed state等。

简单来说，当用户在job中定义了queryable state之后，就可以在外部，通过QueryableStateClient，通过job id, state name, key来查询所对应的状态的实时的值。

5.1 QueryableStateClient

QueryableStateClient

/**

 * Client for querying Flink's managed state.

 *

 * <p>You can mark state as queryable via {@link StateDescriptor#setQueryable(String)}.

 * The state instance created from this descriptor will be published for queries when it's

 * created on the Task Managers and the location will be reported to the Job Manager.

 *

 * <p>The client connects to a {@code Client Proxy} running on a given Task Manager. The

 * proxy is the entry point of the client to the Flink cluster. It forwards the requests

 * of the client to the Job Manager and the required Task Manager, and forwards the final

 * response back the client.

 *

 * <p>The proxy, initially resolves the location of the requested KvState via the JobManager. Resolved

 * locations are cached. When the server address of the requested KvState instance is determined, the

 * client sends out a request to the server. The returned final answer is then forwarded to the Client.

 */

其查询的实现

    /**

     * Returns a future holding the serialized request result.

     *

     * @param jobId                     JobID of the job the queryable state

     *                                  belongs to

     * @param queryableStateName        Name under which the state is queryable

     * @param keyHashCode               Integer hash code of the key (result of

     *                                  a call to {@link Object#hashCode()}

     * @param serializedKeyAndNamespace Serialized key and namespace to query

     *                                  KvState instance with

     * @return Future holding the serialized result

     */

    private CompletableFuture<KvStateResponse> getKvState(

            final JobID jobId,

            final String queryableStateName,

            final int keyHashCode,

            final byte[] serializedKeyAndNamespace) {

        LOG.debug("Sending State Request to {}.", remoteAddress);

        try {

            KvStateRequest request = new KvStateRequest(jobId, queryableStateName, keyHashCode, serializedKeyAndNamespace);

            return client.sendRequest(remoteAddress, request);

        } catch (Exception e) {

            LOG.error("Unable to send KVStateRequest: ", e);

            return FutureUtils.getFailedFuture(e);

        }

    }

通过组装request，然后使用client发送请求

5.2 KvStateServer

KvStateServer

/**

 * An interface for the Queryable State Server running on each Task Manager in the cluster.

 * This server is responsible for serving requests coming from the {@link KvStateClientProxy

 * Queryable State Proxy} and requesting <b>locally</b> stored state.

 */

6. 总结

为什么要使用状态?

数据之间有关联，需要通过状态满足业务逻辑

为什么要管理状态?

实时计算作业需要7*24运行，需要应对不可靠因素带来的影响

如何选择状态的类型和存储方式?

分析自己的业务场景，比对各方案的利弊，选择合适的，够用即可

参考资料：

【1】https://yq.aliyun.com/articles/225623#

【2】https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/state/

【3】https://blog.csdn.net/alexdamiao/article/details/94043468

Flink中的状态与容错的更多相关文章

关于 Flink 状态与容错机制
Flink 作为新一代基于事件流的.真正意义上的流批一体的大数据处理引擎,正在逐渐得到广大开发者们的青睐.就从我自身的视角看,最近也是在数据团队把一些原本由 Flume.SparkStreaming. ...
9、flink的状态与容错
1.理解State(状态) 1.1.State 对象的状态 Flink中的状态:一般指一个具体的task/operator某时刻在内存中的状态(例如某属性的值) 注意:State和Checkpoint ...
第09讲：Flink 状态与容错
Flink系列文章第01讲:Flink 的应用场景和架构模型第02讲:Flink 入门程序 WordCount 和 SQL 实现第03讲:Flink 的编程模型与其他框架比较第04讲:Flin ...
Apache Flink中的广播状态实用指南
感谢英文原文作者:https://data-artisans.com/blog/a-practical-guide-to-broadcast-state-in-apache-flink 不过,原文最近 ...
Flink的状态与容错
本文主要运行到Flink以下内容检查点机制(CheckPoint) 状态管理器(StateBackend) 状态周期(StateTtlConfig) 关系首先要将state和checkpoint概 ...
Flink 从0到1学习 —— Flink 中如何管理配置？
前言如果你了解 Apache Flink 的话,那么你应该熟悉该如何像 Flink 发送数据或者如何从 Flink 获取数据.但是在某些情况下,我们需要将配置数据发送到 Flink 集群并从中接收一 ...
Flink原理（五）——容错机制
本文是博主阅读Flink官方文档以及<Flink基础教程>后结合自己理解所写,若有表达有误的地方欢迎大伙留言指出. 1. 前言流式计算分为有状态和无状态两种情况,所谓状态就是计算过程中 ...
Flink中案例学习--State与CheckPoint理解
1.State概念理解在Flink中,按照基本类型,对State做了以下两类的划分:Keyed State, Operator State. Keyed State:和Key有关的状态类型,它只能被 ...
Flink中接收端反压以及Credit机制 (源码分析)
先上一张图整体了解Flink中的反压可以看到每个task都会有自己对应的IG(inputgate)对接上游发送过来的数据和RS(resultPatation)对接往下游发送数据, 整个反压机制通 ...

随机推荐

linuxC动态内存泄漏追踪方法
C里面没有垃圾回收机制,有时候你申请了动态内存却忘记释放,这就尴尬了(你的程序扮演了强盗角色,有借有还才是好孩子).当你想找出内存泄露的地方时,有的投入海量的代码中,头痛不已.还好GNU C库提供了些 ...
Visual C++ 编译器自动假定带 .C 扩展名的文件是 C 文件而不是 C++ 文件，并且拒绝 C++ 语法和关键字（c语言只能在大括号最前面申明变量）
今天在编译OpenGL红宝书附带源码中的light.c文件时遇到一个诡异的问题: 如图light .c,在不做任何修改的情况编译OK.然而只要在某些地方写了可执行代码,则会无法通过编译器编译! (这几 ...
WPF中的资源（二） - 二进制资源
原文:WPF中的资源(二) - 二进制资源 WPF中的二进制资源,就是类似于MFC中在对话框程序中添加的图片.字符串等资源,程序在运行时将其转换成二进制,以供程序使用. 下面以将字符串转换成二进制为例 ...
【全面解禁!真正的Expression Blend实战开发技巧】第四章从最常用ButtonStyle开始 - PathButton
原文:[全面解禁!真正的Expression Blend实战开发技巧]第四章从最常用ButtonStyle开始 - PathButton 上一篇我们介绍了TextButton,但为了追求界面的张力, ...
UWP入门（十）--创建、写入和读取文件
原文:UWP入门(十)--创建.写入和读取文件核心的 API github代码 StorageFolder 类 StorageFile 类 FileIO 类使用 StorageFile 对象读取和 ...
IDHTTP-HTTPS通讯方法（Indy10.2.3之前仅支持OpenSSL的0.9.8以前的版本）
加载https的站点页面内容的时候记得在窗体上加上“IdSSLIOHandlerSocketOpenSSL1” 以及idhttp里面IOHandler为“IdSSLIOHandlerSocketOpe ...
MongoDB数据查询
启动MongoDB:sudo service mongodb start,mongo 经测试,键可加引号也可不加,但是值一般要加引号,数值类型除外 MongoDB区分大小写,命名通常采用驼峰式命名法 ...
MinGW 编译 libsndfile-1.0.25（只要有 MSYS，./configure make make install 就行了）
最近做的一个项目需要读写 wav 文件.在网上找到 libsndfile 刚好满足我的需要.但是编译的时候遇到了点小麻烦,这里记录一下编译的过程,免得下次再编译时忘记了. 因为是在编译完成若干天后写的 ...
基于 libevent 开发的 C++ 11 高性能网络服务器 evpp（360的作品）
evpp是一个基于libevent开发的现代化C++11高性能网络服务器,自带TCP/UDP/HTTP等协议的异步非阻塞式的服务器和客户端库. 特性: 现代版的C++11接口非阻塞异步接口都是C++ ...
JavaScript API for Office Outlook Add-in - “一页纸文档“
上一篇文章 Office Add-in Model 为 Outlook Mail Add-in 提供的 JavaScript API 介绍 ,简单地在表格中列出了所有的 Object 定义,但是个人感 ...

Flink中的状态与容错

Flink中的状态与容错的更多相关文章

随机推荐

热门专题