Flink - state
public class StreamTaskState implements Serializable, Closeable {
private static final long serialVersionUID = 1L;
private StateHandle<?> operatorState;
private StateHandle<Serializable> functionState;
private HashMap<String, KvStateSnapshot<?, ?, ?, ?, ?>> kvStates;
Flink divides state into three kinds.
As the fields above show, StreamTaskState wraps all three: operator state, function state, and key/value state.
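1. KvState
This is the most basic kind of state.
The abstraction is a pair: KvState and KvStateSnapshot.
The two interfaces convert into each other: KvState.snapshot() produces a KvStateSnapshot, and KvStateSnapshot.restoreState() produces a KvState.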
/**
* Key/Value state implementation for user-defined state. The state is backed by a state
* backend, which typically follows one of the following patterns: Either the state is stored
* in the key/value state object directly (meaning in the executing JVM) and snapshotted by the
* state backend into some store (during checkpoints), or the key/value state is in fact backed
* by an external key/value store as the state backend, and checkpoints merely record the
* metadata of what is considered part of the checkpoint.
*
* @param <K> The type of the key.
* @param <N> The type of the namespace.
* @param <S> The type of {@link State} this {@code KvState} holds.
* @param <SD> The type of the {@link StateDescriptor} for state {@code S}.
* @param <Backend> The type of {@link AbstractStateBackend} that manages this {@code KvState}.
*/
public interface KvState<K, N, S extends State, SD extends StateDescriptor<S, ?>, Backend extends AbstractStateBackend> {

/**
* Sets the current key, which will be used when using the state access methods.
*
* @param key The key.
*/
void setCurrentKey(K key);

/**
* Sets the current namespace, which will be used when using the state access methods.
*
* @param namespace The namespace.
*/
void setCurrentNamespace(N namespace);

/**
* Creates a snapshot of this state.
*
* @param checkpointId The ID of the checkpoint for which the snapshot should be created.
* @param timestamp The timestamp of the checkpoint.
* @return A snapshot handle for this key/value state.
*
* @throws Exception Exceptions during snapshotting the state should be forwarded, so the system
* can react to failed snapshots.
*/
KvStateSnapshot<K, N, S, SD, Backend> snapshot(long checkpointId, long timestamp) throws Exception;

/**
* Disposes the key/value state, releasing all occupied resources.
*/
void dispose();
}
The definition is simple. The key method is snapshot(), which produces a KvStateSnapshot.
public interface KvStateSnapshot<K, N, S extends State, SD extends StateDescriptor<S, ?>, Backend extends AbstractStateBackend>
extends StateObject {

/**
* Loads the key/value state back from this snapshot.
*
* @param stateBackend The state backend that created this snapshot and can restore the key/value state
* from this snapshot.
* @param keySerializer The serializer for the keys.
* @param classLoader The class loader for user-defined types.
*
* @return An instance of the key/value state loaded from this snapshot.
*
* @throws Exception Exceptions can occur during the state loading and are forwarded.
*/
KvState<K, N, S, SD, Backend> restoreState(
Backend stateBackend,
TypeSerializer<K> keySerializer,
ClassLoader classLoader) throws Exception;
}
KvStateSnapshot is the counterpart of KvState; its key method is restoreState().
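The round trip between the two interfaces can be illustrated with a much-reduced sketch. The names SimpleKvState and SimpleKvSnapshot are hypothetical stand-ins, not Flink classes, and the snapshot here simply copies the map in memory instead of going through a state backend.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified stand-ins for KvState / KvStateSnapshot (not Flink classes). */
final class KvRoundTripSketch {

    /** Copy of the state at one checkpoint; can rebuild a live state object. */
    static final class SimpleKvSnapshot<K, V> {
        private final long checkpointId;
        private final Map<K, V> copy;

        SimpleKvSnapshot(long checkpointId, Map<K, V> source) {
            this.checkpointId = checkpointId;
            this.copy = new HashMap<>(source); // defensive copy: later updates must not leak in
        }

        long checkpointId() { return checkpointId; }

        /** Counterpart of restoreState(): snapshot -> live state. */
        SimpleKvState<K, V> restoreState() {
            SimpleKvState<K, V> restored = new SimpleKvState<>();
            restored.values.putAll(copy);
            return restored;
        }
    }

    /** Live, mutable state; counterpart of KvState. */
    static final class SimpleKvState<K, V> {
        private final Map<K, V> values = new HashMap<>();
        private K currentKey;

        void setCurrentKey(K key) { this.currentKey = key; }
        V value() { return values.get(currentKey); }
        void update(V value) { values.put(currentKey, value); }

        /** Counterpart of snapshot(): live state -> snapshot. */
        SimpleKvSnapshot<K, V> snapshot(long checkpointId) {
            return new SimpleKvSnapshot<>(checkpointId, values);
        }
    }

    public static void main(String[] args) {
        SimpleKvState<String, Integer> state = new SimpleKvState<>();
        state.setCurrentKey("a");
        state.update(1);

        SimpleKvSnapshot<String, Integer> snap = state.snapshot(7L);
        state.update(2); // happens after the checkpoint, so it is not part of the snapshot

        SimpleKvState<String, Integer> restored = snap.restoreState();
        restored.setCurrentKey("a");
        System.out.println(restored.value());
    }
}
```

The restored object sees the value as of the checkpoint, not the later update, which is the property the real snapshot/restoreState pair has to guarantee.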
Take the concrete FsState as an example:
public abstract class AbstractFsState<K, N, SV, S extends State, SD extends StateDescriptor<S, ?>>
extends AbstractHeapState<K, N, SV, S, SD, FsStateBackend> {
AbstractFsState extends AbstractHeapState, because FsState also keeps its working state cached on the heap; it only writes to a file when a snapshot is taken.
So let's look at AbstractHeapState first:
/**
* Base class for partitioned {@link ListState} implementations that are backed by a regular
* heap hash map. The concrete implementations define how the state is checkpointed.
*
* @param <K> The type of the key.
* @param <N> The type of the namespace.
* @param <SV> The type of the values in the state.
* @param <S> The type of State
* @param <SD> The type of StateDescriptor for the State S
* @param <Backend> The type of the backend that snapshots this key/value state.
*/
public abstract class AbstractHeapState<K, N, SV, S extends State, SD extends StateDescriptor<S, ?>, Backend extends AbstractStateBackend>
implements KvState<K, N, S, SD, Backend>, State {

/** Map containing the actual key/value pairs */
// Note the extra namespace level here, which makes key collisions less likely
protected final HashMap<N, Map<K, SV>> state;

/** Serializer for the state value. The state value could be a List<V>, for example. */
protected final TypeSerializer<SV> stateSerializer;

/** The serializer for the keys */
protected final TypeSerializer<K> keySerializer;

/** The serializer for the namespace */
protected final TypeSerializer<N> namespaceSerializer;

/** This holds the name of the state and can create an initial default value for the state. */
// The StateDescriptor carries metadata about the state, such as its default value
protected final SD stateDesc;

/** The current key, which the next value methods will refer to */
protected K currentKey;

/** The current namespace, which the access methods will refer to. */
protected N currentNamespace = null;

/** Cache the state map for the current key. */
protected Map<K, SV> currentNSState;

/**
* Creates a new empty key/value state.
*
* @param keySerializer The serializer for the keys.
* @param namespaceSerializer The serializer for the namespace.
* @param stateDesc The state identifier for the state. This contains name
* and can create a default state value.
*/
protected AbstractHeapState(TypeSerializer<K> keySerializer,
TypeSerializer<N> namespaceSerializer,
TypeSerializer<SV> stateSerializer,
SD stateDesc) {
this(keySerializer, namespaceSerializer, stateSerializer, stateDesc, new HashMap<N, Map<K, SV>>());
}
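The two-level layout (namespace first, then key) can be sketched as follows. NamespacedMapSketch is a hypothetical class for illustration, not part of Flink; it only shows why the same key can hold different values in different namespaces.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the namespace -> key -> value layout (not a Flink class). */
final class NamespacedMapSketch<N, K, V> {
    private final Map<N, Map<K, V>> state = new HashMap<>();
    private N currentNamespace;
    private K currentKey;

    void setCurrentNamespace(N namespace) { this.currentNamespace = namespace; }
    void setCurrentKey(K key) { this.currentKey = key; }

    /** Stores a value under (currentNamespace, currentKey), creating the inner map on demand. */
    void put(V value) {
        state.computeIfAbsent(currentNamespace, ns -> new HashMap<>()).put(currentKey, value);
    }

    /** Returns the value under (currentNamespace, currentKey), or null if there is none. */
    V get() {
        Map<K, V> perNamespace = state.get(currentNamespace);
        return perNamespace == null ? null : perNamespace.get(currentKey);
    }

    public static void main(String[] args) {
        NamespacedMapSketch<String, String, Integer> s = new NamespacedMapSketch<>();
        s.setCurrentKey("user-1");
        s.setCurrentNamespace("window-A");
        s.put(3);
        s.setCurrentNamespace("window-B");
        s.put(5);
        s.setCurrentNamespace("window-A");
        System.out.println(s.get()); // the value stored under window-A, untouched by window-B
    }
}
```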
AbstractFsState
public abstract class AbstractFsState<K, N, SV, S extends State, SD extends StateDescriptor<S, ?>>
extends AbstractHeapState<K, N, SV, S, SD, FsStateBackend> {

/** The file system state backend backing snapshots of this state */
private final FsStateBackend backend;

public abstract KvStateSnapshot<K, N, S, SD, FsStateBackend> createHeapSnapshot(Path filePath);

@Override
public KvStateSnapshot<K, N, S, SD, FsStateBackend> snapshot(long checkpointId, long timestamp) throws Exception {

try (FsStateBackend.FsCheckpointStateOutputStream out = backend.createCheckpointStateOutputStream(checkpointId, timestamp)) {

// serialize the state to the output stream
DataOutputViewStreamWrapper outView = new DataOutputViewStreamWrapper(new DataOutputStream(out));
outView.writeInt(state.size());
for (Map.Entry<N, Map<K, SV>> namespaceState: state.entrySet()) {
N namespace = namespaceState.getKey();
namespaceSerializer.serialize(namespace, outView);
outView.writeInt(namespaceState.getValue().size());
for (Map.Entry<K, SV> entry: namespaceState.getValue().entrySet()) {
keySerializer.serialize(entry.getKey(), outView);
stateSerializer.serialize(entry.getValue(), outView);
}
}
outView.flush(); // the actual content is flushed to the file

// create a handle to the state
return createHeapSnapshot(out.closeAndGetPath()); // the snapshot itself only needs the path
}
}
}
Key/value state itself comes in several flavors: ValueState, ListState, ReducingState, and FoldingState.
For simplicity, start with ValueState:
public class FsValueState<K, N, V>
extends AbstractFsState<K, N, V, ValueState<V>, ValueStateDescriptor<V>>
implements ValueState<V> {

@Override
public V value() {
if (currentNSState == null) {
currentNSState = state.get(currentNamespace); // first look up the key/value map of the current namespace
}
if (currentNSState != null) {
V value = currentNSState.get(currentKey);
return value != null ? value : stateDesc.getDefaultValue(); // fetch the value; if it is null, fall back to the default from stateDesc
}
return stateDesc.getDefaultValue();
}

@Override
public void update(V value) {
if (currentKey == null) {
throw new RuntimeException("No key available.");
}

if (value == null) {
clear();
return;
}

if (currentNSState == null) {
currentNSState = new HashMap<>();
state.put(currentNamespace, currentNSState);
}

currentNSState.put(currentKey, value); // update the value
}

@Override
public KvStateSnapshot<K, N, ValueState<V>, ValueStateDescriptor<V>, FsStateBackend> createHeapSnapshot(Path filePath) {
return new Snapshot<>(getKeySerializer(), getNamespaceSerializer(), stateSerializer, stateDesc, filePath); // create the snapshot from the file path
}
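The read and write semantics of value() and update() boil down to two rules: a missing value falls back to the descriptor's default, and updating with null clears the entry. A minimal sketch, assuming a single namespace and a hypothetical class DefaultingValueStateSketch (not a Flink class), where the constructor argument plays the role of stateDesc.getDefaultValue():

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical single-namespace value state (not a Flink class). */
final class DefaultingValueStateSketch<K, V> {
    private final Map<K, V> values = new HashMap<>();
    private final V defaultValue; // stands in for stateDesc.getDefaultValue()
    private K currentKey;

    DefaultingValueStateSketch(V defaultValue) { this.defaultValue = defaultValue; }

    void setCurrentKey(K key) { this.currentKey = key; }

    /** Returns the stored value for the current key, or the default if none is stored. */
    V value() {
        V v = values.get(currentKey);
        return v != null ? v : defaultValue;
    }

    /** Stores the value for the current key; a null value removes the entry instead. */
    void update(V value) {
        if (currentKey == null) {
            throw new IllegalStateException("No key available.");
        }
        if (value == null) {
            values.remove(currentKey);
            return;
        }
        values.put(currentKey, value);
    }

    int size() { return values.size(); }

    public static void main(String[] args) {
        DefaultingValueStateSketch<String, Integer> count = new DefaultingValueStateSketch<>(0);
        count.setCurrentKey("user-1");
        count.update(count.value() + 1); // starts from the default
        System.out.println(count.value());
        count.update(null);              // clears the entry, so the default shows again
        System.out.println(count.value());
    }
}
```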
Next, the snapshot side, AbstractFsStateSnapshot:
public abstract class AbstractFsStateSnapshot<K, N, SV, S extends State, SD extends StateDescriptor<S, ?>>
extends AbstractFileStateHandle implements KvStateSnapshot<K, N, S, SD, FsStateBackend> {

public abstract KvState<K, N, S, SD, FsStateBackend> createFsState(FsStateBackend backend, HashMap<N, Map<K, SV>> stateMap);

@Override
public KvState<K, N, S, SD, FsStateBackend> restoreState(
FsStateBackend stateBackend,
final TypeSerializer<K> keySerializer,
ClassLoader classLoader) throws Exception {

// state restore
ensureNotClosed();

try (FSDataInputStream inStream = stateBackend.getFileSystem().open(getFilePath())) {
// make sure the in-progress restore from the handle can be closed
registerCloseable(inStream);

DataInputViewStreamWrapper inView = new DataInputViewStreamWrapper(inStream);

final int numKeys = inView.readInt();
HashMap<N, Map<K, SV>> stateMap = new HashMap<>(numKeys);

for (int i = 0; i < numKeys; i++) {
N namespace = namespaceSerializer.deserialize(inView);
final int numValues = inView.readInt();
Map<K, SV> namespaceMap = new HashMap<>(numValues);
stateMap.put(namespace, namespaceMap);
for (int j = 0; j < numValues; j++) {
K key = keySerializer.deserialize(inView);
SV value = stateSerializer.deserialize(inView);
namespaceMap.put(key, value);
}
}

return createFsState(stateBackend, stateMap);
}
catch (Exception e) {
throw new Exception("Failed to restore state from file system", e);
}
}
}
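Snapshot and restore share one length-prefixed layout: the number of namespaces, then for each namespace its value, its entry count, and the key/value pairs. The sketch below reproduces that layout with plain java.io streams and a temp file. It is an illustration under assumptions, not Flink code: FileSnapshotSketch is a hypothetical class, the types are fixed to String and Long, and writeUTF/writeLong stand in for the TypeSerializer instances.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the length-prefixed snapshot layout (not Flink code). */
final class FileSnapshotSketch {

    /** Writes the nested map to a temp file and returns only its path, like a snapshot handle. */
    static Path snapshot(Map<String, Map<String, Long>> state) {
        try {
            Path file = Files.createTempFile("kv-snapshot-", ".bin");
            try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
                out.writeInt(state.size());                 // number of namespaces
                for (Map.Entry<String, Map<String, Long>> ns : state.entrySet()) {
                    out.writeUTF(ns.getKey());              // namespace
                    out.writeInt(ns.getValue().size());     // entries in this namespace
                    for (Map.Entry<String, Long> e : ns.getValue().entrySet()) {
                        out.writeUTF(e.getKey());           // key
                        out.writeLong(e.getValue());        // value
                    }
                }
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the file back in exactly the order it was written. */
    static Map<String, Map<String, Long>> restore(Path file) {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            int numNamespaces = in.readInt();
            Map<String, Map<String, Long>> state = new HashMap<>(numNamespaces);
            for (int i = 0; i < numNamespaces; i++) {
                String namespace = in.readUTF();
                int numEntries = in.readInt();
                Map<String, Long> perNamespace = new HashMap<>(numEntries);
                state.put(namespace, perNamespace);
                for (int j = 0; j < numEntries; j++) {
                    String key = in.readUTF();
                    long value = in.readLong();
                    perNamespace.put(key, value);
                }
            }
            return state;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Map<String, Long>> state = new HashMap<>();
        state.computeIfAbsent("window-A", ns -> new HashMap<>()).put("user-1", 3L);
        Path handle = snapshot(state);
        System.out.println(restore(handle).equals(state));
        handle.toFile().delete();
    }
}
```

Because the counts are written before the data, the reader never needs an end-of-stream marker; it just loops the stated number of times.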
The Snapshot class implemented inside FsValueState:
public static class Snapshot<K, N, V> extends AbstractFsStateSnapshot<K, N, V, ValueState<V>, ValueStateDescriptor<V>> {
private static final long serialVersionUID = 1L;
public Snapshot(TypeSerializer<K> keySerializer,
TypeSerializer<N> namespaceSerializer,
TypeSerializer<V> stateSerializer,
ValueStateDescriptor<V> stateDescs,
Path filePath) {
super(keySerializer, namespaceSerializer, stateSerializer, stateDescs, filePath);
}
@Override
public KvState<K, N, ValueState<V>, ValueStateDescriptor<V>, FsStateBackend> createFsState(FsStateBackend backend, HashMap<N, Map<K, V>> stateMap) {
return new FsValueState<>(backend, keySerializer, namespaceSerializer, stateDesc, stateMap);
}
}
2. FunctionState
Compared with KvState, StateHandle is a more general abstraction.
/**
* StateHandle is a general handle interface meant to abstract operator state fetching.
* A StateHandle implementation can for example include the state itself in cases where the state
* is lightweight or fetching it lazily from some external storage when the state is too large.
*/
public interface StateHandle<T> extends StateObject {

/**
* This retrieves and returns the state represented by the handle.
*
* @param userCodeClassLoader Class loader for deserializing user code specific classes
*
* @return The state represented by the handle.
* @throws java.lang.Exception Thrown, if the state cannot be fetched.
*/
T getState(ClassLoader userCodeClassLoader) throws Exception;
}
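The Javadoc mentions two styles of handle: one that carries a lightweight state inline, and one that fetches a large state lazily. A reduced sketch of both, using a hypothetical SimpleHandle interface that drops the class loader argument and the checked exception of the real StateHandle:

```java
import java.util.function.Supplier;

/** Hypothetical sketch of inline vs. lazy state handles (not Flink classes). */
final class StateHandleSketch {

    /** Reduced form of StateHandle<T>: no class loader, no checked exception. */
    interface SimpleHandle<T> {
        T getState();
    }

    /** Carries the state inside the handle itself; suitable when the state is small. */
    static final class InlineHandle<T> implements SimpleHandle<T> {
        private final T state;
        InlineHandle(T state) { this.state = state; }
        @Override public T getState() { return state; }
    }

    /** Holds only a way to load the state; the load runs on each getState() call. */
    static final class LazyHandle<T> implements SimpleHandle<T> {
        private final Supplier<T> loader;
        LazyHandle(Supplier<T> loader) { this.loader = loader; }
        @Override public T getState() { return loader.get(); }
    }

    public static void main(String[] args) {
        SimpleHandle<String> small = new InlineHandle<>("counter=3");
        // The supplier stands in for reading from external storage.
        SimpleHandle<String> large = new LazyHandle<String>(() -> "loaded from storage");
        System.out.println(small.getState());
        System.out.println(large.getState());
    }
}
```

Callers only see getState(), so the choice between the two styles stays an implementation detail of the backend that created the handle.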
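3. OperatorState, with the state of WindowOperator as the typical example
OperatorState also uses StateHandle as its snapshot abstraction.
Now let's see how these three kinds of state are snapshotted.
AbstractStreamOperator: looking at its checkpoint-related methods, it only snapshots KvState.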
@Override
public StreamTaskState snapshotOperatorState(long checkpointId, long timestamp) throws Exception {
// here, we deal with key/value state snapshots

StreamTaskState state = new StreamTaskState();

if (stateBackend != null) {
HashMap<String, KvStateSnapshot<?, ?, ?, ?, ?>> partitionedSnapshots =
stateBackend.snapshotPartitionedState(checkpointId, timestamp);
if (partitionedSnapshots != null) {
state.setKvStates(partitionedSnapshots);
}
}

return state;
}

@Override
@SuppressWarnings("rawtypes,unchecked")
public void restoreState(StreamTaskState state) throws Exception {
// restore the key/value state. the actual restore happens lazily, when the function requests
// the state again, because the restore method needs information provided by the user function
if (stateBackend != null) {
stateBackend.injectKeyValueStateSnapshots((HashMap)state.getKvStates());
}
}

@Override
public void notifyOfCompletedCheckpoint(long checkpointId) throws Exception {
if (stateBackend != null) {
stateBackend.notifyOfCompletedCheckpoint(checkpointId);
}
}
AbstractUdfStreamOperator
public abstract class AbstractUdfStreamOperator<OUT, F extends Function> extends AbstractStreamOperator<OUT> implements OutputTypeConfigurable<OUT>
This class extends AbstractStreamOperator. Its checkpoint-related methods:
@Override
public StreamTaskState snapshotOperatorState(long checkpointId, long timestamp) throws Exception {
StreamTaskState state = super.snapshotOperatorState(checkpointId, timestamp); // run the superclass snapshot first, i.e. the KV state snapshot

if (userFunction instanceof Checkpointed) {
@SuppressWarnings("unchecked")
Checkpointed<Serializable> chkFunction = (Checkpointed<Serializable>) userFunction;

Serializable udfState;
try {
udfState = chkFunction.snapshotState(checkpointId, timestamp); // snapshot the state of the user function
}
catch (Exception e) {
throw new Exception("Failed to draw state snapshot from function: " + e.getMessage(), e);
}

if (udfState != null) {
try {
AbstractStateBackend stateBackend = getStateBackend();
StateHandle<Serializable> handle =
stateBackend.checkpointStateSerializable(udfState, checkpointId, timestamp); // have the state backend store the state and return a handle to it
state.setFunctionState(handle);
}
catch (Exception e) {
throw new Exception("Failed to add the state snapshot of the function to the checkpoint: "
+ e.getMessage(), e);
}
}
}

return state;
}

@Override
public void restoreState(StreamTaskState state) throws Exception {
super.restoreState(state);

StateHandle<Serializable> stateHandle = state.getFunctionState();

if (userFunction instanceof Checkpointed && stateHandle != null) {
@SuppressWarnings("unchecked")
Checkpointed<Serializable> chkFunction = (Checkpointed<Serializable>) userFunction;

Serializable functionState = stateHandle.getState(getUserCodeClassloader());
if (functionState != null) {
try {
chkFunction.restoreState(functionState);
}
catch (Exception e) {
throw new Exception("Failed to restore state to function: " + e.getMessage(), e);
}
}
}
}

@Override
public void notifyOfCompletedCheckpoint(long checkpointId) throws Exception {
super.notifyOfCompletedCheckpoint(checkpointId);

if (userFunction instanceof CheckpointListener) {
((CheckpointListener) userFunction).notifyCheckpointComplete(checkpointId);
}
}
As shown, this operator snapshots both the KV state and the state of the user function in the UDF.
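The layering is plain inheritance: each level calls super first and then adds its own piece to the same state container. A reduced sketch of that pattern follows; every name in it (TaskState, CheckpointedFn, BaseOperator, UdfOperator, CountingFn) is hypothetical, and the state is kept in memory rather than handed to a state backend.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of layered operator snapshots (not Flink classes). */
final class LayeredSnapshotSketch {

    /** Container for the pieces of one snapshot, loosely mirroring StreamTaskState. */
    static final class TaskState {
        Map<String, Integer> kvState;
        Object functionState;
    }

    /** Opt-in interface for user functions, loosely mirroring Checkpointed. */
    interface CheckpointedFn {
        Object snapshotState(long checkpointId);
        void restoreState(Object state);
    }

    /** Base operator: only knows about key/value state. */
    static class BaseOperator {
        protected final Map<String, Integer> kv = new HashMap<>();

        TaskState snapshot(long checkpointId) {
            TaskState s = new TaskState();
            s.kvState = new HashMap<>(kv);
            return s;
        }

        void restore(TaskState s) {
            kv.clear();
            if (s.kvState != null) {
                kv.putAll(s.kvState);
            }
        }
    }

    /** UDF operator: runs the base snapshot, then adds the function state if the function opts in. */
    static class UdfOperator extends BaseOperator {
        private final Object userFunction;

        UdfOperator(Object userFunction) { this.userFunction = userFunction; }

        @Override
        TaskState snapshot(long checkpointId) {
            TaskState s = super.snapshot(checkpointId);
            if (userFunction instanceof CheckpointedFn) {
                s.functionState = ((CheckpointedFn) userFunction).snapshotState(checkpointId);
            }
            return s;
        }

        @Override
        void restore(TaskState s) {
            super.restore(s);
            if (userFunction instanceof CheckpointedFn && s.functionState != null) {
                ((CheckpointedFn) userFunction).restoreState(s.functionState);
            }
        }
    }

    /** Example user function whose only state is a counter. */
    static final class CountingFn implements CheckpointedFn {
        int count;
        @Override public Object snapshotState(long checkpointId) { return count; }
        @Override public void restoreState(Object state) { count = (Integer) state; }
    }

    public static void main(String[] args) {
        CountingFn fn = new CountingFn();
        fn.count = 3;
        UdfOperator op = new UdfOperator(fn);
        op.kv.put("a", 1);

        TaskState snap = op.snapshot(1L);
        fn.count = 99;       // changes after the checkpoint
        op.kv.put("a", 2);

        op.restore(snap);    // both layers roll back together
        System.out.println(fn.count + " " + op.kv.get("a"));
    }
}
```

A third level, such as an operator with its own timers, would follow the same shape: call super, then fill in one more field of the container.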
WindowOperator, a typical case of operator state:
public class WindowOperator<K, IN, ACC, OUT, W extends Window>
extends AbstractUdfStreamOperator<OUT, InternalWindowFunction<ACC, OUT, K, W>>
implements OneInputStreamOperator<IN, OUT>, Triggerable, InputTypeConfigurable
public StreamTaskState snapshotOperatorState(long checkpointId, long timestamp) throws Exception {
if (mergingWindowsByKey != null) {
TupleSerializer<Tuple2<W, W>> tupleSerializer = new TupleSerializer<>((Class) Tuple2.class, new TypeSerializer[] {windowSerializer, windowSerializer} );
ListStateDescriptor<Tuple2<W, W>> mergeStateDescriptor = new ListStateDescriptor<>("merging-window-set", tupleSerializer);
for (Map.Entry<K, MergingWindowSet<W>> key: mergingWindowsByKey.entrySet()) {
setKeyContext(key.getKey());
ListState<Tuple2<W, W>> mergeState = getStateBackend().getPartitionedState(null, VoidSerializer.INSTANCE, mergeStateDescriptor);
mergeState.clear();
key.getValue().persist(mergeState);
}
}
StreamTaskState taskState = super.snapshotOperatorState(checkpointId, timestamp);
AbstractStateBackend.CheckpointStateOutputView out =
getStateBackend().createCheckpointStateOutputView(checkpointId, timestamp);
snapshotTimers(out);
taskState.setOperatorState(out.closeAndGetHandle());
return taskState;
}
@Override
public void restoreState(StreamTaskState taskState) throws Exception {
super.restoreState(taskState);
final ClassLoader userClassloader = getUserCodeClassloader();
@SuppressWarnings("unchecked")
StateHandle<DataInputView> inputState = (StateHandle<DataInputView>) taskState.getOperatorState();
DataInputView in = inputState.getState(userClassloader);
restoreTimers(in);
}