聊聊flink的Async I/O
// This example implements the asynchronous request and callback with Futures that have the
// interface of Java 8's futures (which is the same one followed by Flink's Future)
/**
* An implementation of the 'AsyncFunction' that sends requests and sets the callback.
*/
class AsyncDatabaseRequest extends RichAsyncFunction<String, Tuple2<String, String>> {
/** The database specific client that can issue concurrent requests with callbacks */
private transient DatabaseClient client;
@Override
public void open(Configuration parameters) throws Exception {
client = new DatabaseClient(host, post, credentials);
}
@Override
public void close() throws Exception {
client.close();
}
@Override
public void asyncInvoke(String key, final ResultFuture<Tuple2<String, String>> resultFuture) throws Exception {
// issue the asynchronous request, receive a future for result
final Future<String> result = client.query(key);
// set the callback to be executed once the request by the client is complete
// the callback simply forwards the result to the result future
CompletableFuture.supplyAsync(new Supplier<String>() {
@Override
public String get() {
try {
return result.get();
} catch (InterruptedException | ExecutionException e) {
// Normally handled explicitly.
return null;
}
}
}).thenAccept( (String dbResult) -> {
resultFuture.complete(Collections.singleton(new Tuple2<>(key, dbResult)));
});
}
}
// create the original stream
DataStream<String> stream = ...;
// apply the async I/O transformation
DataStream<Tuple2<String, String>> resultStream =
AsyncDataStream.unorderedWait(stream, new AsyncDatabaseRequest(), 1000, TimeUnit.MILLISECONDS, 100);
本实例展示了flink Async I/O的基本用法,首先是实现AsyncFunction接口,用于编写异步请求逻辑及将结果或异常设置到resultFuture,然后就是使用AsyncDataStream的unorderedWait或orderedWait方法将AsyncFunction作用到DataStream作为transformation;AsyncDataStream的unorderedWait或orderedWait有两个关于async operation的参数,一个是timeout参数用于设置async的超时时间,一个是capacity参数用于指定同一时刻最大允许多少个(并发)async request在执行
AsyncFunction
flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/functions/async/AsyncFunction.java
/**
* A function to trigger Async I/O operation.
*
* <p>For each #asyncInvoke, an async io operation can be triggered, and once it has been done,
* the result can be collected by calling {@link ResultFuture#complete}. For each async
* operation, its context is stored in the operator immediately after invoking
* #asyncInvoke, avoiding blocking for each stream input as long as the internal buffer is not full.
*
* <p>{@link ResultFuture} can be passed into callbacks or futures to collect the result data.
* An error can also be propagate to the async IO operator by
* {@link ResultFuture#completeExceptionally(Throwable)}.
*
* <p>Callback www.michenggw.com example usage:
*
* <pre>{@code
* public class HBaseAsyncFunc implements AsyncFunction<String, String> {
*
* public void asyncInvoke(String row, ResultFuture<String> result) throws Exception {
* HBaseCallback cb = new HBaseCallback(result);
* Get get = new Get(Bytes.toBytes(row));
* hbase.asyncGet(get, cb);
* }
* }
* }</pre>
*
* <p>Future example www.mcyllpt.com usage:
*
* <pre>{@code
* public class HBaseAsyncFunc implements AsyncFunction<String, String> {
*
* public void asyncInvoke(String row, final ResultFuture<String> result) throws Exception {
* Get get = new Get(Bytes.toBytes(row));
* ListenableFuture<Result>www.gcyl158.com future = hbase.asyncGet(get);
* Futures.addCallback(future, new FutureCallback<Result>() {
* public void onSuccess(Result result) {
* List<String> ret = process(result);
* result.complete(ret);
* }
* public void onFailure(Throwable thrown) {
* result.completeExceptionally(thrown);
* }
* });
* }
* }
* }</pre>
*
* @param <IN> The type of the www.haitianguo.cn input elements.
* @param <OUT> The type of the returned elements.
*/
@PublicEvolving
public interface AsyncFunction<IN, OUT> extends Function, Serializable {
/**
* Trigger async operation for each stream input.
*
* @param input element coming from an upstream task
* @param resultFuture to be completed with the result data
* @exception Exception in case of a user code error. An exception will make the task fail and
* trigger fail-over process.
*/
void asyncInvoke(IN input, ResultFuture<www.fenghuang1999.com> resultFuture) throws Exception;
/**
* {@link AsyncFunction#asyncInvoke} timeout occurred.
* By default, the result future is exceptionally completed with a timeout exception.
*
* @param input element coming from an upstream task
* @param resultFuture to be completed with the result data
*/
default void timeout(IN input, ResultFuture<OUT> resultFuture) throws Exception {
resultFuture.completeExceptionally(
new TimeoutException("Async function call has timed out."));
}
}
AsyncFunction接口继承了Function,它定义了asyncInvoke方法以及一个default的timeout方法;asyncInvoke方法执行异步逻辑,然后通过ResultFuture.complete将结果设置到ResultFuture,如果异常则通过ResultFuture.completeExceptionally(Throwable)来传递到ResultFuture
RichAsyncFunction
flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/functions/async/RichAsyncFunction.java
@PublicEvolving
public abstract class RichAsyncFunction<IN, OUT> extends AbstractRichFunction implements AsyncFunction<IN, OUT> {
private static final long serialVersionUID = 3858030061138121840L;
@Override
public void setRuntimeContext(RuntimeContext runtimeContext) {
Preconditions.checkNotNull(runtimeContext);
if (runtimeContext instanceof IterationRuntimeContext) {
super.setRuntimeContext(
new RichAsyncFunctionIterationRuntimeContext(
(IterationRuntimeContext) runtimeContext));
} else {
super.setRuntimeContext(new RichAsyncFunctionRuntimeContext(runtimeContext));
}
}
@Override
public abstract void asyncInvoke(IN input, ResultFuture<OUT> resultFuture) throws Exception;
//......
}
RichAsyncFunction继承了AbstractRichFunction,同时声明实现AsyncFunction接口,它不没有实现asyncInvoke,交由子类实现;它覆盖了setRuntimeContext方法,这里使用RichAsyncFunctionRuntimeContext或者RichAsyncFunctionIterationRuntimeContext进行包装
RichAsyncFunctionRuntimeContext
flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/functions/async/RichAsyncFunction.java
/**
* A wrapper class for async function's {@link RuntimeContext}. The async function runtime
* context only supports basic operations which are thread safe. Consequently, state access,
* accumulators, broadcast variables and the distributed cache are disabled.
*/
private static class RichAsyncFunctionRuntimeContext implements RuntimeContext {
private final RuntimeContext runtimeContext;
RichAsyncFunctionRuntimeContext(RuntimeContext context) {
runtimeContext = Preconditions.checkNotNull(context);
}
@Override
public String getTaskName() {
return runtimeContext.getTaskName();
}
@Override
public MetricGroup getMetricGroup() {
return runtimeContext.getMetricGroup();
}
@Override
public int getNumberOfParallelSubtasks() {
return runtimeContext.getNumberOfParallelSubtasks();
}
@Override
public int getMaxNumberOfParallelSubtasks() {
return runtimeContext.getMaxNumberOfParallelSubtasks();
}
@Override
public int getIndexOfThisSubtask() {
return runtimeContext.getIndexOfThisSubtask();
}
@Override
public int getAttemptNumber() {
return runtimeContext.getAttemptNumber();
}
@Override
public String getTaskNameWithSubtasks() {
return runtimeContext.getTaskNameWithSubtasks();
}
@Override
public ExecutionConfig getExecutionConfig() {
return runtimeContext.getExecutionConfig();
}
@Override
public ClassLoader getUserCodeClassLoader() {
return runtimeContext.getUserCodeClassLoader();
}
// -----------------------------------------------------------------------------------
// Unsupported operations
// -----------------------------------------------------------------------------------
@Override
public DistributedCache getDistributedCache() {
throw new UnsupportedOperationException("Distributed cache is not supported in rich async functions.");
}
@Override
public <T> ValueState<T> getState(ValueStateDescriptor<T> stateProperties) {
throw new UnsupportedOperationException("State is not supported in rich async functions.");
}
@Override
public <T> ListState<T> getListState(ListStateDescriptor<T> stateProperties) {
throw new UnsupportedOperationException("State is not supported in rich async functions.");
}
@Override
public <T> ReducingState<T> getReducingState(ReducingStateDescriptor<T> stateProperties) {
throw new UnsupportedOperationException("State is not supported in rich async functions.");
}
@Override
public <IN, ACC, OUT> AggregatingState<IN, OUT> getAggregatingState(AggregatingStateDescriptor<IN, ACC, OUT> stateProperties) {
throw new UnsupportedOperationException("State is not supported in rich async functions.");
}
@Override
public <T, ACC> FoldingState<T, ACC> getFoldingState(FoldingStateDescriptor<T, ACC> stateProperties) {
throw new UnsupportedOperationException("State is not supported in rich async functions.");
}
@Override
public <UK, UV> MapState<UK, UV> getMapState(MapStateDescriptor<UK, UV> stateProperties) {
throw new UnsupportedOperationException("State is not supported in rich async functions.");
}
@Override
public <V, A extends Serializable> void addAccumulator(String name, Accumulator<V, A> accumulator) {
throw new UnsupportedOperationException("Accumulators are not supported in rich async functions.");
}
@Override
public <V, A extends Serializable> Accumulator<V, A> getAccumulator(String name) {
throw new UnsupportedOperationException("Accumulators are not supported in rich async functions.");
}
@Override
public Map<String, Accumulator<?, ?>> getAllAccumulators() {
throw new UnsupportedOperationException("Accumulators are not supported in rich async functions.");
}
@Override
public IntCounter getIntCounter(String name) {
throw new UnsupportedOperationException("Int counters are not supported in rich async functions.");
}
@Override
public LongCounter getLongCounter(String name) {
throw new UnsupportedOperationException("Long counters are not supported in rich async functions.");
}
@Override
public DoubleCounter getDoubleCounter(String name) {
throw new UnsupportedOperationException("Long counters are not supported in rich async functions.");
}
@Override
public Histogram getHistogram(String name) {
throw new UnsupportedOperationException("Histograms are not supported in rich async functions.");
}
@Override
public boolean hasBroadcastVariable(String name) {
throw new UnsupportedOperationException("Broadcast variables are not supported in rich async functions.");
}
@Override
public <RT> List<RT> getBroadcastVariable(String name) {
throw new UnsupportedOperationException("Broadcast variables are not supported in rich async functions.");
}
@Override
public <T, C> C getBroadcastVariableWithInitializer(String name, BroadcastVariableInitializer<T, C> initializer) {
throw new UnsupportedOperationException("Broadcast variables are not supported in rich async functions.");
}
}
RichAsyncFunctionRuntimeContext实现了RuntimeContext接口,它将一些方法代理给RuntimeContext,其余的Unsupported的方法都覆盖抛出UnsupportedOperationException
RichAsyncFunctionIterationRuntimeContext
flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/functions/async/RichAsyncFunction.java
private static class RichAsyncFunctionIterationRuntimeContext extends RichAsyncFunctionRuntimeContext implements IterationRuntimeContext {
private final IterationRuntimeContext iterationRuntimeContext;
RichAsyncFunctionIterationRuntimeContext(IterationRuntimeContext iterationRuntimeContext) {
super(iterationRuntimeContext);
this.iterationRuntimeContext = Preconditions.checkNotNull(iterationRuntimeContext);
}
@Override
public int getSuperstepNumber() {
return iterationRuntimeContext.getSuperstepNumber();
}
// -----------------------------------------------------------------------------------
// Unsupported operations
// -----------------------------------------------------------------------------------
@Override
public <T extends Aggregator<?>> T getIterationAggregator(String name) {
throw new UnsupportedOperationException("Iteration aggregators are not supported in rich async functions.");
}
@Override
public <T extends Value> T getPreviousIterationAggregate(String name) {
throw new UnsupportedOperationException("Iteration aggregators are not supported in rich async functions.");
}
}
RichAsyncFunctionIterationRuntimeContext继承了RichAsyncFunctionRuntimeContext,实现了IterationRuntimeContext接口,它将getSuperstepNumber方法交由IterationRuntimeContext处理,然后覆盖getIterationAggregator、getPreviousIterationAggregate方法抛出UnsupportedOperationException
AsyncDataStream
flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/datastream/AsyncDataStream.java
@PublicEvolving
public class AsyncDataStream {
/**
* Output mode for asynchronous operations.
*/
public enum OutputMode { ORDERED, UNORDERED }
private static final int DEFAULT_QUEUE_CAPACITY = 100;
private static <IN, OUT> SingleOutputStreamOperator<OUT> addOperator(
DataStream<IN> in,
AsyncFunction<IN, OUT> func,
long timeout,
int bufSize,
OutputMode mode) {
TypeInformation<OUT> outTypeInfo = TypeExtractor.getUnaryOperatorReturnType(
func,
AsyncFunction.class,
0,
1,
new int[]{1, 0},
in.getType(),
Utils.getCallLocationName(),
true);
// create transform
AsyncWaitOperator<IN, OUT> operator = new AsyncWaitOperator<>(
in.getExecutionEnvironment().clean(func),
timeout,
bufSize,
mode);
return in.transform("async wait operator", outTypeInfo, operator);
}
public static <IN, OUT> SingleOutputStreamOperator<OUT> unorderedWait(
DataStream<IN> in,
AsyncFunction<IN, OUT> func,
long timeout,
TimeUnit timeUnit,
int capacity) {
return addOperator(in, func, timeUnit.toMillis(timeout), capacity, OutputMode.UNORDERED);
}
public static <IN, OUT> SingleOutputStreamOperator<OUT> unorderedWait(
DataStream<IN> in,
AsyncFunction<IN, OUT> func,
long timeout,
TimeUnit timeUnit) {
return addOperator(
in,
func,
timeUnit.toMillis(timeout),
DEFAULT_QUEUE_CAPACITY,
OutputMode.UNORDERED);
}
public static <IN, OUT> SingleOutputStreamOperator<OUT> orderedWait(
DataStream<IN> in,
AsyncFunction<IN, OUT> func,
long timeout,
TimeUnit timeUnit,
int capacity) {
return addOperator(in, func, timeUnit.toMillis(timeout), capacity, OutputMode.ORDERED);
}
public static <IN, OUT> SingleOutputStreamOperator<OUT> orderedWait(
DataStream<IN> in,
AsyncFunction<IN, OUT> func,
long timeout,
TimeUnit timeUnit) {
return addOperator(
in,
func,
timeUnit.toMillis(timeout),
DEFAULT_QUEUE_CAPACITY,
OutputMode.ORDERED);
}
}
AsyncDataStream提供了unorderedWait、orderedWait两类方法来将AsyncFunction作用于DataStream
unorderedWait、orderedWait方法有带capacity参数的也有不带capacity参数的,不带capacity参数即默认使用DEFAULT_QUEUE_CAPACITY,即100;这些方法最后都是调用addOperator私有方法来实现,它使用的是AsyncWaitOperator;unorderedWait、orderedWait方法都带了timeout参数,用于指定等待async操作完成的超时时间
AsyncDataStream提供了两种OutputMode,其中UNORDERED是无序的,即一旦async操作完成就emit结果,当使用TimeCharacteristic.ProcessingTime的时候这种模式延迟最低、负载最低;ORDERED是有序的,即按element的输入顺序emit结果,为了保证有序operator需要缓冲数据,因而会造成一定的延迟及负载
小结
flink给外部数据访问提供了Asynchronous I/O的API,用于提升streaming的吞吐量,其基本使用就是定义一个实现AsyncFunction接口的function,然后使用AsyncDataStream的unorderedWait或orderedWait方法将AsyncFunction作用到DataStream作为transformation
AsyncFunction接口继承了Function,它定义了asyncInvoke方法以及一个default的timeout方法;asyncInvoke方法执行异步逻辑,然后通过ResultFuture.complete将结果或异常设置到ResultFuture,如果异常则通过ResultFuture.completeExceptionally(Throwable)来传递到ResultFuture;RichAsyncFunction继承了AbstractRichFunction,同时声明实现AsyncFunction接口,它不没有实现asyncInvoke,交由子类实现;它覆盖了setRuntimeContext方法,这里使用RichAsyncFunctionRuntimeContext或者RichAsyncFunctionIterationRuntimeContext进行包装
AsyncDataStream的unorderedWait或orderedWait有两个关于async operation的参数,一个是timeout参数用于设置async的超时时间,一个是capacity参数用于指定同一时刻最大允许多少个(并发)async request在执行;AsyncDataStream提供了两种OutputMode,其中UNORDERED是无序的,即一旦async操作完成就emit结果,当使用TimeCharacteristic.ProcessingTime的时候这种模式延迟最低、负载最低;ORDERED是有序的,即按element的输入顺序emit结果,为了保证有序operator需要缓冲数据,因而会造成一定的延迟及负载
聊聊flink的Async I/O的更多相关文章
- 聊聊flink的NetworkEnvironmentConfiguration
本文主要研究一下flink的NetworkEnvironmentConfiguration NetworkEnvironmentConfiguration flink-1.7.2/flink-runt ...
- 聊聊flink的AsyncWaitOperator
序本文主要研究一下flink的AsyncWaitOperator AsyncWaitOperatorflink-streaming-java_2.11-1.7.0-sources.jar!/org/a ...
- 简单聊聊ES6-Promise和Async
前言 本篇博文出至于我的github仓库:web-study,如果你觉得对你有帮助欢迎star,你们的点赞是我持续更新的动力,谢谢! 异步编程在前端开发中尤为常见,从最早的XHR,到后来的各种封装aj ...
- [case49]聊聊flink的checkpoint配置
序 本文主要研究下flink的checkpoint配置 实例 StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecut ...
- 聊聊flink的CsvTableSource
序 本文主要研究一下flink的CsvTableSource TableSource flink-table_2.11-1.7.1-sources.jar!/org/apache/flink/tabl ...
- 聊聊flink Table的groupBy操作
本文主要研究一下flink Table的groupBy操作 Table.groupBy flink-table_2.11-1.7.0-sources.jar!/org/apache/flink/tab ...
- 聊聊flink的log.file配置
本文主要研究一下flink的log.file配置 log4j.properties flink-release-1.6.2/flink-dist/src/main/flink-bin/conf/log ...
- 聊聊flink的BlobStoreService
序 本文主要研究一下flink的BlobStoreService BlobView flink-release-1.7.2/flink-runtime/src/main/java/org/apache ...
- [源码分析] 从源码入手看 Flink Watermark 之传播过程
[源码分析] 从源码入手看 Flink Watermark 之传播过程 0x00 摘要 本文将通过源码分析,带领大家熟悉Flink Watermark 之传播过程,顺便也可以对Flink整体逻辑有一个 ...
随机推荐
- DevOps是一种文化,不是角色!
一.DevOps是一种文化,不是角色! 软件无处不在.在如今的世界里,每个主流公司/组织都和软件开发息息相关,并且公司需要向软件一样运作.更快且更敏捷,同时保证安全性和可靠性,这样的要求前所未有的强烈 ...
- devpi 快速入门:上传,测试,推送发行版
安装 devpi 客户端和服务器端 pip install -U devpi 这将安装devpi-client,devpi-server 和 devpi-web 三个Python PyPi包. 初始化 ...
- Django-建立网页
进入cmd模式做 django-admin startproject helloworld创建一个project,并命名helloworld,新生成的文件结构如下 输入python manage. ...
- 第一篇:一天学会MongoDB数据库之Python操作
本文仅仅学习使用,转自:https://www.cnblogs.com/suoning/p/6759367.html#3682005 里面新增了如果用用Python代码进行增删改查 什么是MongoD ...
- JMeter与WireShark
最近在学习JMeter,刚学了一点皮毛,就掉入了WireShark的坑,我发现在学习的道路上就是不断的给自己挖坑,之前在学习LoadRunner的道路上,遇到的坑更大,就单纯的安装LR就耗费了两个星期 ...
- Spring学习(3):IOC基础(转载)
一. IoC是什么 Ioc—Inversion of Control,即“控制反转”,不是什么技术,而是一种设计思想.在Java开发中,Ioc意味着将你设计好的对象交给容器控制,而不是传统的在你的对象 ...
- 洛谷【P1057】传球游戏
https://www.luogu.org/problemnew/show/P1057 题目描述 在体育课上, 老师带着同学们一起做传球游戏. 游戏规则是这样的: n 个同学站成一个圆圈, 其中的一个 ...
- Python基础知识-05-数据类型总结字典
python其他知识目录 1.一道题,选择商品的序号.程序员和用户各自面对的序号起始值 如有变量 googs = ['汽车','飞机','火箭'] 提示用户可供选择的商品: 0,汽车1,飞机2,火箭用 ...
- 2.openldap安装
1.安装步骤如下 获取软件包 安装软件包(rpm或者源码编译) 生产openldap配置文件及数据库文件 配置 添加目录树条目 加载slapd进程 验证 2.所需安装包说明 openldap,open ...
- Beta发布-----欢迎来怼团队
欢迎来怼项目小组—Beta发布展示 一.小组成员 队长:田继平 成员:葛美义,王伟东,姜珊,邵朔,阚博文 ,李圆圆 二.文案+美工展示 链接:http://www.cnblogs.com/js2017 ...