https://docs.google.com/document/d/1Lr9UYXEz6s6R_3PWg3bZQLF3upGaNEkc0rQCFSzaYDI/edit

 

    // create the original stream
    DataStream<String> stream = ...;

    // apply the async I/O transformation
    DataStream<Tuple2<String, String>> resultStream =
        AsyncDataStream.unorderedWait(stream, new AsyncDatabaseRequest(), 1000, TimeUnit.MILLISECONDS, 100);

 

AsyncDataStream

It exposes two entry points:

unorderedWait
orderedWait

 

Both ultimately call:

addOperator(in, func, timeUnit.toMillis(timeout), capacity, OutputMode.ORDERED)

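Whether the output is ordered depends only on the last argument: orderedWait passes OutputMode.ORDERED and unorderedWait passes OutputMode.UNORDERED.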

    private static <IN, OUT> SingleOutputStreamOperator<OUT> addOperator(
            DataStream<IN> in,
            AsyncFunction<IN, OUT> func,
            long timeout,
            int bufSize,
            OutputMode mode) {

        TypeInformation<OUT> outTypeInfo =
            TypeExtractor.getUnaryOperatorReturnType(func, AsyncFunction.class, false,
                true, in.getType(), Utils.getCallLocationName(), true);

        // create transform
        AsyncWaitOperator<IN, OUT> operator = new AsyncWaitOperator<>(
            in.getExecutionEnvironment().clean(func),
            timeout,
            bufSize,
            mode);

        return in.transform("async wait operator", outTypeInfo, operator);
    }
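
To make the difference between the two output modes concrete, here is a minimal sketch in plain Java. It is not Flink code: the class OutputModeSketch and its methods are hypothetical, and CompletableFuture instances stand in for in-flight async requests. It also ignores watermarks, which a real unordered queue would still have to respect.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

// Illustrative only: not Flink code. Futures stand in for in-flight async requests.
final class OutputModeSketch {

    /** ORDERED: results leave in input order, so a slow head holds back later results. */
    static List<String> collectOrdered(List<CompletableFuture<String>> inFlight) {
        List<String> out = new ArrayList<>();
        for (CompletableFuture<String> request : inFlight) {
            out.add(request.join()); // blocks until this particular request is done
        }
        return out;
    }

    /** UNORDERED: each result is recorded as soon as its own request completes. */
    static List<String> collectUnordered(List<CompletableFuture<String>> inFlight) {
        List<String> out = new CopyOnWriteArrayList<>();
        for (CompletableFuture<String> request : inFlight) {
            request.thenAccept(out::add); // callback runs when the request completes
        }
        return out; // filled in as the requests complete
    }

    /** Returns [ordered output, unordered output] for requests a, b, c that complete as c, a, b. */
    static List<List<String>> demo() {
        CompletableFuture<String> a = new CompletableFuture<>();
        CompletableFuture<String> b = new CompletableFuture<>();
        CompletableFuture<String> c = new CompletableFuture<>();
        List<CompletableFuture<String>> inFlight = Arrays.asList(a, b, c);

        List<String> unordered = collectUnordered(inFlight);
        c.complete("c"); // the last request finishes first
        a.complete("a");
        b.complete("b");
        List<String> ordered = collectOrdered(inFlight);

        return Arrays.asList(ordered, new ArrayList<>(unordered));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [[a, b, c], [c, a, b]]
    }
}
```

The ordered variant trades latency for order: one slow request delays everything queued behind it, while the unordered variant emits each result as soon as it is available.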

 

AsyncWaitOperator

setup mainly initializes the executor and the element queue.

    @Override
    public void setup(StreamTask<?, ?> containingTask, StreamConfig config, Output<StreamRecord<OUT>> output) {
        super.setup(containingTask, config, output);

        // create the operators executor for the complete operations of the queue entries
        // (a single-threaded executor that runs the queue entries' completion callbacks)
        this.executor = Executors.newSingleThreadExecutor();

        switch (outputMode) {
            case ORDERED:
                queue = new OrderedStreamElementQueue(
                    capacity,
                    executor,
                    this);
                break;
            case UNORDERED:
                queue = new UnorderedStreamElementQueue(
                    capacity,
                    executor,
                    this);
                break;
            default:
                throw new IllegalStateException("Unknown async mode: " + outputMode + '.');
        }
    }

 

Next, look at OrderedStreamElementQueue:

    public class OrderedStreamElementQueue implements StreamElementQueue {

        /** Queue for the inserted StreamElementQueueEntries. */
        private final ArrayDeque<StreamElementQueueEntry<?>> queue; // holds all elements

        // (other fields and methods are omitted in this excerpt)

        @Override
        public AsyncResult peekBlockingly() throws InterruptedException { // read without removing
            lock.lockInterruptibly();

            try {
                while (queue.isEmpty() || !queue.peek().isDone()) { // no head element, or the head is not complete yet
                    headIsCompleted.await(); // wait on the condition until the head completes
                }

                // the head is complete; peek does not remove it, so a separate poll is needed
                return queue.peek();
            } finally {
                lock.unlock();
            }
        }

        @Override
        public AsyncResult poll() throws InterruptedException { // removal is a separate step
            lock.lockInterruptibly();

            try {
                while (queue.isEmpty() || !queue.peek().isDone()) { // wait while the head is not complete
                    headIsCompleted.await();
                }

                notFull.signalAll(); // after the poll the queue cannot be full, so wake writers waiting on notFull

                return queue.poll();
            } finally {
                lock.unlock();
            }
        }

        // called by both put and tryPut
        private <T> void addEntry(StreamElementQueueEntry<T> streamElementQueueEntry) {
            queue.addLast(streamElementQueueEntry); // append to the queue

            // register a completion callback on the element that invokes onCompleteHandler
            streamElementQueueEntry.onComplete(new AcceptFunction<StreamElementQueueEntry<T>>() {
                @Override
                public void accept(StreamElementQueueEntry<T> value) {
                    try {
                        onCompleteHandler(value);
                    } catch (InterruptedException e) {
                        // error handling is omitted in this excerpt
                    }
                }
            }, executor);
        }

        private void onCompleteHandler(StreamElementQueueEntry<?> streamElementQueueEntry) throws InterruptedException {
            lock.lockInterruptibly();

            try {
                if (!queue.isEmpty() && queue.peek().isDone()) {
                    headIsCompleted.signalAll(); // wake the waiting readers: the head element is complete
                }
            } finally {
                lock.unlock();
            }
        }
    }

For the queue, the operations that matter are the reads.

A read takes two steps: first peek, then poll.
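
The peek-then-poll pattern can be tried outside Flink. The following is a minimal sketch under stated assumptions: OrderedQueueSketch is a hypothetical class, CompletableFuture replaces StreamElementQueueEntry, and the capacity handling (notFull) is left out. It only aims to show that the head blocks readers until it completes, even if a later entry finished first, and that peek leaves the entry in the queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative only: a simplified stand-in for an ordered element queue, without capacity limits.
final class OrderedQueueSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition headIsCompleted = lock.newCondition();
    private final ArrayDeque<CompletableFuture<String>> queue = new ArrayDeque<>();

    void put(CompletableFuture<String> entry) {
        lock.lock();
        try {
            queue.addLast(entry);
        } finally {
            lock.unlock();
        }
        // On completion, wake the readers; they re-check whether the head is done.
        entry.whenComplete((value, error) -> {
            lock.lock();
            try {
                headIsCompleted.signalAll();
            } finally {
                lock.unlock();
            }
        });
    }

    /** Blocks until the head is complete and returns it without removing it. */
    CompletableFuture<String> peekBlockingly() throws InterruptedException {
        lock.lockInterruptibly();
        try {
            while (queue.isEmpty() || !queue.peek().isDone()) {
                headIsCompleted.await();
            }
            return queue.peek();
        } finally {
            lock.unlock();
        }
    }

    /** Blocks until the head is complete and removes it. */
    CompletableFuture<String> poll() throws InterruptedException {
        lock.lockInterruptibly();
        try {
            while (queue.isEmpty() || !queue.peek().isDone()) {
                headIsCompleted.await();
            }
            return queue.poll();
        } finally {
            lock.unlock();
        }
    }

    int size() {
        lock.lock();
        try {
            return queue.size();
        } finally {
            lock.unlock();
        }
    }

    static List<String> demo() {
        OrderedQueueSketch q = new OrderedQueueSketch();
        CompletableFuture<String> a = new CompletableFuture<>();
        CompletableFuture<String> b = new CompletableFuture<>();
        q.put(a);
        q.put(b);
        b.complete("b");                           // the tail finishes first
        new Thread(() -> a.complete("a")).start(); // the head finishes on another thread
        List<String> log = new ArrayList<>();
        try {
            log.add("peek " + q.peekBlockingly().join() + " size " + q.size());
            log.add("poll " + q.poll().join() + " size " + q.size());
            log.add("poll " + q.poll().join() + " size " + q.size());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [peek a size 2, poll a size 1, poll b size 0]
    }
}
```

Splitting the read into peek and poll lets the caller do something with the head (here, just logging it) while it is still stored in the queue.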

 

open mainly re-processes the elements recovered from a snapshot and then starts the emitter thread.

    @Override
    public void open() throws Exception {
        super.open();

        // process stream elements from state, since the Emit thread will start as soon as all
        // elements from previous state are in the StreamElementQueue, we have to make sure that the
        // order to open all operators in the operator chain proceeds from the tail operator to the
        // head operator.
        if (recoveredStreamElements != null) {
            for (StreamElement element : recoveredStreamElements.get()) { // re-process the elements recovered from the snapshot
                if (element.isRecord()) {
                    processElement(element.<IN>asRecord());
                }
                else if (element.isWatermark()) {
                    processWatermark(element.asWatermark());
                }
                else if (element.isLatencyMarker()) {
                    processLatencyMarker(element.asLatencyMarker());
                }
                else {
                    throw new IllegalStateException("Unknown record type " + element.getClass() +
                        " encountered while opening the operator.");
                }
            }
            recoveredStreamElements = null;
        }

        // create the emitter
        this.emitter = new Emitter<>(checkpointingLock, output, queue, this);

        // start the emitter thread
        this.emitterThread = new Thread(emitter, "AsyncIO-Emitter-Thread (" + getOperatorName() + ')');
        emitterThread.setDaemon(true);
        emitterThread.start();
    }

 

Emitter

    @Override
    public void run() {
        try {
            while (running) {
                LOG.debug("Wait for next completed async stream element result.");
                AsyncResult streamElementEntry = streamElementQueue.peekBlockingly();

                output(streamElementEntry);
            }
        // ... (the rest of the method is not shown in this excerpt)

The emitter peeks data from the queue. With the OrderedStreamElementQueue above, the peek only ever returns a completed element.

    private void output(AsyncResult asyncResult) throws InterruptedException {
        if (asyncResult.isWatermark()) {
            //......
        } else {
            AsyncCollectionResult<OUT> streamRecordResult = asyncResult.asResultCollection();

            synchronized (checkpointLock) { // collecting the data requires the checkpoint lock
                LOG.debug("Output async stream element collection result.");

                try {
                    Collection<OUT> resultCollection = streamRecordResult.get();

                    if (resultCollection != null) {
                        for (OUT result : resultCollection) {
                            timestampedCollector.collect(result); // this is where the data is actually emitted
                        }
                    }
                } catch (Exception e) {
                    // failure handling is omitted in this excerpt
                }

                // remove the peeked element from the async collector buffer so that it is no longer
                // checkpointed
                streamElementQueue.poll(); // once emitted, the element can be removed from the queue

                // notify the main thread that there is again space left in the async collector
                // buffer
                checkpointLock.notifyAll();
            }
        }
    }

As the code shows, an element is removed from the queue only after it has been emitted.
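
Why emit first and remove second? One way to read it: a checkpoint that takes the same lock then sees every element either still in the queue or already emitted, never lost in between. The sketch below illustrates that reading with a hypothetical class, EmitThenRemoveSketch. It is a simplification: the peek here happens inside the lock, whereas the Emitter above peeks before taking it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a stripped-down stand-in for the emitter/queue interplay.
final class EmitThenRemoveSketch {
    private final Object checkpointLock = new Object();
    private final ArrayDeque<String> queue = new ArrayDeque<>(); // completed, not yet emitted
    private final List<String> emitted = new ArrayList<>();      // stands in for the downstream output

    void add(String result) {
        synchronized (checkpointLock) {
            queue.addLast(result);
        }
    }

    /** Emits the head and only then removes it, all under the checkpoint lock. */
    void emitHead() {
        synchronized (checkpointLock) {
            String head = queue.peek();
            if (head == null) {
                return;
            }
            emitted.add(head); // emit first
            queue.poll();      // remove second
        }
    }

    /** A snapshot takes the same lock, so it never observes a half-finished emit. */
    List<String> snapshotPending() {
        synchronized (checkpointLock) {
            return new ArrayList<>(queue);
        }
    }

    /** Returns [pending before emit, pending after emit, emitted]. */
    static List<List<String>> demo() {
        EmitThenRemoveSketch sketch = new EmitThenRemoveSketch();
        sketch.add("x");
        sketch.add("y");
        List<String> before = sketch.snapshotPending();
        sketch.emitHead();
        List<String> after = sketch.snapshotPending();
        return Arrays.asList(before, after, new ArrayList<>(sketch.emitted));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [[x, y], [y], [x]]
    }
}
```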

 

processElement

    @Override
    public void processElement(StreamRecord<IN> element) throws Exception {
        // wrap the record in a StreamRecordQueueEntry
        final StreamRecordQueueEntry<OUT> streamRecordBufferEntry = new StreamRecordQueueEntry<>(element);

        if (timeout > 0L) {
            // register a timeout for this AsyncStreamRecordBufferEntry
            long timeoutTimestamp = timeout + getProcessingTimeService().getCurrentProcessingTime();

            // start a timer that completes the entry with a timeout exception when it fires
            final ScheduledFuture<?> timerFuture = getProcessingTimeService().registerTimer(
                timeoutTimestamp,
                new ProcessingTimeCallback() {
                    @Override
                    public void onProcessingTime(long timestamp) throws Exception {
                        streamRecordBufferEntry.collect(
                            new TimeoutException("Async function call has timed out."));
                    }
                });

            // Cancel the timer once we've completed the stream record buffer entry. This will remove
            // the register trigger task
            // (so the timer only ever fires for entries that are still incomplete)
            streamRecordBufferEntry.onComplete(new AcceptFunction<StreamElementQueueEntry<Collection<OUT>>>() {
                @Override
                public void accept(StreamElementQueueEntry<Collection<OUT>> value) {
                    timerFuture.cancel(true);
                }
            }, executor);
        }

        addAsyncBufferEntry(streamRecordBufferEntry); // add the StreamRecordQueueEntry to the queue

        userFunction.asyncInvoke(element.getValue(), streamRecordBufferEntry); // call the user-defined asyncInvoke
    }
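
The timeout handling in processElement follows a general pattern: schedule a timer that fails the entry, and cancel that timer as soon as the entry completes by any other route. Here is a minimal sketch of the pattern with standard JDK classes. TimeoutSketch and withTimeout are hypothetical names, a ScheduledExecutorService stands in for Flink's processing-time service, and the timer is cancelled with cancel(false) so that the timer thread does not interrupt itself when the timeout is what completed the entry.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative only: the timer/cancel pattern, not Flink's implementation.
final class TimeoutSketch {

    /** Fails the entry with a TimeoutException unless it completes within timeoutMillis. */
    static <T> CompletableFuture<T> withTimeout(
            CompletableFuture<T> entry, long timeoutMillis, ScheduledExecutorService timer) {
        ScheduledFuture<?> timerFuture = timer.schedule(
            () -> {
                entry.completeExceptionally(new TimeoutException("Async function call has timed out."));
            },
            timeoutMillis,
            TimeUnit.MILLISECONDS);

        // Once the entry completes (normally or not), the timer is no longer needed.
        entry.whenComplete((value, error) -> timerFuture.cancel(false));
        return entry;
    }

    /** Returns [result of the fast entry, outcome of the entry that never completes]. */
    static List<String> demo() {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            CompletableFuture<String> fast = new CompletableFuture<>();
            withTimeout(fast, 5_000, timer);
            fast.complete("ok"); // completes in time, which cancels its timer

            CompletableFuture<String> slow = new CompletableFuture<>();
            withTimeout(slow, 50, timer); // nobody completes it, so the timer fires

            String slowOutcome;
            try {
                slow.join();
                slowOutcome = "completed";
            } catch (CompletionException e) {
                slowOutcome = e.getCause().getClass().getSimpleName();
            }
            return Arrays.asList(fast.join(), slowOutcome);
        } finally {
            timer.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [ok, TimeoutException]
    }
}
```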

 

StreamRecordQueueEntry

    public class StreamRecordQueueEntry<OUT> extends StreamElementQueueEntry<Collection<OUT>>
            implements AsyncCollectionResult<OUT>, AsyncCollector<OUT> {

        /** Future containing the collection result. */
        private final CompletableFuture<Collection<OUT>> resultFuture;

        @Override
        public void collect(Collection<OUT> result) {
            resultFuture.complete(result);
        }

        @Override
        public void collect(Throwable error) {
            resultFuture.completeExceptionally(error);
        }
    }

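The isDone check seen earlier on the emitter's path comes down to whether the entry's resultFuture is done.

As the code shows, resultFuture is completed only when collect is called.

When resultFuture completes, the onComplete callback fires.

That callback is registered in OrderedStreamElementQueue.addEntry, and all it does is signal headIsCompleted, so that the Emitter can then emit the result downstream.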

 

Finally, the user-defined function is invoked:

userFunction.asyncInvoke

    @Override
    public void asyncInvoke(final String str, final AsyncCollector<Tuple2<String, String>> asyncCollector) throws Exception {
        // issue the asynchronous request, receive a future for result
        Future<String> resultFuture = client.query(str);

        // set the callback to be executed once the request by the client is complete
        // the callback simply forwards the result to the collector
        resultFuture.thenAccept( (String result) -> {
            asyncCollector.collect(Collections.singleton(new Tuple2<>(str, result)));
        });
    }

 

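First, the client must be asynchronous. If it is not, it cannot return a Future, and you have to build the asynchrony yourself with a connection pool.

The core logic is that, once resultFuture completes, asyncCollector.collect is called to hand the result back to the element's queue entry.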
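
If the client only offers a blocking call, one possible way to get a future out of it is to run the call on a thread pool. The sketch below assumes exactly that. BlockingClientAdapter is a hypothetical class, the Function passed to it stands in for a synchronous client call, and a plain list plays the role of the asyncCollector. The pool size bounds how many blocking requests run at once, much as a connection pool would.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Illustrative only: wraps a blocking lookup so that callers get a future back.
final class BlockingClientAdapter {
    private final ExecutorService pool;
    private final Function<String, String> blockingQuery; // hypothetical synchronous client call

    BlockingClientAdapter(ExecutorService pool, Function<String, String> blockingQuery) {
        this.pool = pool;
        this.blockingQuery = blockingQuery;
    }

    /** Runs the blocking call on the pool and returns a future for its result. */
    CompletableFuture<String> queryAsync(String key) {
        return CompletableFuture.supplyAsync(() -> blockingQuery.apply(key), pool);
    }

    static List<String> demo() {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            BlockingClientAdapter client =
                new BlockingClientAdapter(pool, key -> "value-of-" + key);
            List<String> collected = new CopyOnWriteArrayList<>(); // stands in for the collector

            // Same shape as asyncInvoke: issue the request, forward the result in a callback.
            CompletableFuture<Void> done = client.queryAsync("k1")
                .thenAccept(result -> collected.add("k1=" + result));

            done.join(); // only the demo waits; an async function would return immediately
            return collected;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [k1=value-of-k1]
    }
}
```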
