Flink - ResultPartition

发送数据一般通过，collector.collect

public interface Collector<T> {

    /**

     * Emits a record.

     *

     * @param record The record to collect.

     */

    void collect(T record);

    /**

     * Closes the collector. If any data was buffered, that data will be flushed.

     */

    void close();

}

output继承，

public interface Output<T> extends Collector<T> {

    /**

     * Emits a {@link Watermark} from an operator. This watermark is broadcast to all downstream

     * operators.

     *

     * <p>A watermark specifies that no element with a timestamp lower or equal to the watermark

     * timestamp will be emitted in the future.

     */

    void emitWatermark(Watermark mark);

}

RecordWriterOutput

public class RecordWriterOutput<OUT> implements Output<StreamRecord<OUT>> {

    private StreamRecordWriter<SerializationDelegate<StreamElement>> recordWriter;

    private SerializationDelegate<StreamElement> serializationDelegate;

    @Override

    public void collect(StreamRecord<OUT> record) {

        serializationDelegate.setInstance(record);

        try {

            recordWriter.emit(serializationDelegate);

        }

        catch (Exception e) {

            throw new RuntimeException(e.getMessage(), e);

        }

    }

RecordWriter

public class RecordWriter<T extends IOReadableWritable> {

    protected final ResultPartitionWriter writer; //负责写入ResultPartition的writer

    private final ChannelSelector<T> channelSelector; //选择写入哪个channel，默认RoundRobinChannelSelector

    private final int numChannels;

    /** {@link RecordSerializer} per outgoing channel */

    private final RecordSerializer<T>[] serializers;

    public RecordWriter(ResultPartitionWriter writer) {

        this(writer, new RoundRobinChannelSelector<T>());

    }

    @SuppressWarnings("unchecked")

    public RecordWriter(ResultPartitionWriter writer, ChannelSelector<T> channelSelector) {

        this.writer = writer;

        this.channelSelector = channelSelector;

        this.numChannels = writer.getNumberOfOutputChannels(); //获取channel数

        /**

         * The runtime exposes a channel abstraction for the produced results

         * (see {@link ChannelSelector}). Every channel has an independent

         * serializer.

         */

        this.serializers = new SpanningRecordSerializer[numChannels];

        for (int i = 0; i < numChannels; i++) {

            serializers[i] = new SpanningRecordSerializer<T>(); //为每个channel初始化Serializer

        }

    }

    public void emit(T record) throws IOException, InterruptedException {

        for (int targetChannel : channelSelector.selectChannels(record, numChannels)) { //对于选中的channels

            // serialize with corresponding serializer and send full buffer

            RecordSerializer<T> serializer = serializers[targetChannel];

            synchronized (serializer) { //加锁，一条channel的serializer不能并发写

                SerializationResult result = serializer.addRecord(record);

                while (result.isFullBuffer()) { //buffer,即memorySegment已满

                    Buffer buffer = serializer.getCurrentBuffer(); //将buffer取出

                    if (buffer != null) {

                        writeBuffer(buffer, targetChannel, serializer); //将buffer写入

                    }

                    buffer = writer.getBufferProvider().requestBufferBlocking(); //申请新的buffer

                    result = serializer.setNextBuffer(buffer); //set新的buffer到serializer

                }

            }

        }

    }

writeBuffer

private void writeBuffer(

        Buffer buffer,

        int targetChannel,

        RecordSerializer<T> serializer) throws IOException {

    try {

        writer.writeBuffer(buffer, targetChannel);

    }

    finally {

        serializer.clearCurrentBuffer();

    }

}

可以看到写入和申请buffer都是通过ResultPartitionWriter

public final class ResultPartitionWriter implements EventListener<TaskEvent> {

    private final ResultPartition partition; //Result Partition

    private final TaskEventHandler taskEventHandler = new TaskEventHandler();

    public ResultPartitionWriter(ResultPartition partition) {

        this.partition = partition;

    }

    // ------------------------------------------------------------------------

    // Attributes

    // ------------------------------------------------------------------------

    public ResultPartitionID getPartitionId() {

        return partition.getPartitionId();

    }

    public BufferProvider getBufferProvider() {

        return partition.getBufferProvider();

    }

    public int getNumberOfOutputChannels() {

        return partition.getNumberOfSubpartitions();

    }

    // ------------------------------------------------------------------------

    // Data processing

    // ------------------------------------------------------------------------

    public void writeBuffer(Buffer buffer, int targetChannel) throws IOException {

        partition.add(buffer, targetChannel);

    }

}

而ResultPartitionWriter操作都通过ResultPartition， writerBuffer只是把buffer，add到partition

ResultPartition

初始化的过程，

task初始化ResultPartition，

// Produced intermediate result partitions

this.producedPartitions = new ResultPartition[partitions.size()];

this.writers = new ResultPartitionWriter[partitions.size()];

for (int i = 0; i < this.producedPartitions.length; i++) {

    ResultPartitionDeploymentDescriptor desc = partitions.get(i);

    ResultPartitionID partitionId = new ResultPartitionID(desc.getPartitionId(), executionId);

    this.producedPartitions[i] = new ResultPartition(

            taskNameWithSubtaskAndId,

            jobId,

            partitionId,

            desc.getPartitionType(),

            desc.getEagerlyDeployConsumers(),

            desc.getNumberOfSubpartitions(),

            networkEnvironment.getPartitionManager(),

            networkEnvironment.getPartitionConsumableNotifier(),

            ioManager,

            networkEnvironment.getDefaultIOMode());

    this.writers[i] = new ResultPartitionWriter(this.producedPartitions[i]);

}

在task.run中先到NetworkEnvironment中register，

network.registerTask(this);

这里做的主要的工作是，创建等同于subPartiton大小的localBuffer，并register到ResultPartition

bufferPool = networkBufferPool.createBufferPool(partition.getNumberOfSubpartitions(), false); //创建LocalPool，注意Reqired的segment数目是Subpartitions的数目，即一个subP一个segment

partition.registerBufferPool(bufferPool); //把localPool注册到ResultPartition

所以，

writer.getBufferProvider().requestBufferBlocking();

就是调用localBufferPool.requestBuffer

如果有availableMemorySegments就直接用

如果没有，

if (numberOfRequestedMemorySegments < currentPoolSize) {

    final MemorySegment segment = networkBufferPool.requestMemorySegment(); //如果还有可申请的，就去networkBufferPool申请

    if (segment != null) {

        numberOfRequestedMemorySegments++;

        availableMemorySegments.add(segment);

        continue;

    }

}

if (askToRecycle) { //如果不能申请新的，让owner去试图释放

    owner.releaseMemory(1);

}

if (isBlocking) { //实在不行，blocking等2秒

    availableMemorySegments.wait(2000);

}

public void releaseMemory(int toRelease) throws IOException {

    for (ResultSubpartition subpartition : subpartitions) {

        toRelease -= subpartition.releaseMemory(); //让subpartition去releaseMemory

        // Only release as much memory as needed

        if (toRelease <= 0) {

            break;

        }

    }

}

可以看到，如果在emit的时候，如果没有可用的segment，是会blocking等待的

对于pipelineSubpartition的release，什么都不会做，所以这里如果buffer没有被及时发送出去并回收，会不断的blocking等待

public int releaseMemory() {

    // The pipelined subpartition does not react to memory release requests. The buffers will be

    // recycled by the consuming task.

    return 0;

}

ResultPartition.add

public void add(Buffer buffer, int subpartitionIndex) throws IOException {

    boolean success = false;

    try {

        checkInProduceState();

        final ResultSubpartition subpartition = subpartitions[subpartitionIndex]; //取出index相应的ResultSubpartition

        synchronized (subpartition) {

            success = subpartition.add(buffer); //把buffer add到ResultSubpartition

            // Update statistics

            totalNumberOfBuffers++;

            totalNumberOfBytes += buffer.getSize();

        }

    }

    finally {

        if (success) {

            notifyPipelinedConsumers(); //通知ResultPartitionConsumableNotifier触发notifyPartitionConsumable

        }

        else {

            buffer.recycle(); //失败，回收此buffer

        }

    }

}

对于PipelinedSubpartition，add逻辑就是加入buffer

/**

 * A pipelined in-memory only subpartition, which can be consumed once.

 */

class PipelinedSubpartition extends ResultSubpartition {

    /**

     * A data availability listener. Registered, when the consuming task is faster than the

     * producing task.

     */

    private NotificationListener registeredListener; //来数据后，通知consuming

    /** The read view to consume this subpartition. */

    private PipelinedSubpartitionView readView; //read view

    /** All buffers of this subpartition. Access to the buffers is synchronized on this object. */

    final ArrayDeque<Buffer> buffers = new ArrayDeque<Buffer>(); //buffer队列

    PipelinedSubpartition(int index, ResultPartition parent) {

        super(index, parent);

    }

    @Override

    public boolean add(Buffer buffer) {

        checkNotNull(buffer);

        final NotificationListener listener;

        synchronized (buffers) {

            if (isReleased || isFinished) {

                return false;

            }

            // Add the buffer and update the stats

            buffers.add(buffer); //加入buffer队列

            updateStatistics(buffer);

            // Get the listener...

            listener = registeredListener;

            registeredListener = null;

        }

        // Notify the listener outside of the synchronized block

        if (listener != null) {

            listener.onNotification(); //触发listener

        }

        return true;

    }

NettyConnectionManager

@Override

    public void start(ResultPartitionProvider partitionProvider, TaskEventDispatcher taskEventDispatcher, NetworkBufferPool networkbufferPool)

            throws IOException {

        PartitionRequestProtocol partitionRequestProtocol =

                new PartitionRequestProtocol(partitionProvider, taskEventDispatcher, networkbufferPool);

        client.init(partitionRequestProtocol, bufferPool);

        server.init(partitionRequestProtocol, bufferPool);

    }

PartitionRequestProtocol

// +-------------------------------------------------------------------+

    // |                        SERVER CHANNEL PIPELINE                    |

    // |                                                                   |

    // |    +----------+----------+ (3) write  +----------------------+    |

    // |    | Queue of queues     +----------->| Message encoder      |    |

    // |    +----------+----------+            +-----------+----------+    |

    // |              /|\                                 \|/              |

    // |               | (2) enqueue                       |               |

    // |    +----------+----------+                        |               |

    // |    | Request handler     |                        |               |

    // |    +----------+----------+                        |               |

    // |              /|\                                  |               |

    // |               |                                   |               |

    // |    +----------+----------+                        |               |

    // |    | Message decoder     |                        |               |

    // |    +----------+----------+                        |               |

    // |              /|\                                  |               |

    // |               |                                   |               |

    // |    +----------+----------+                        |               |

    // |    | Frame decoder       |                        |               |

    // |    +----------+----------+                        |               |

    // |              /|\                                  |               |

    // +---------------+-----------------------------------+---------------+

    // |               | (1) client request               \|/

    // +---------------+-----------------------------------+---------------+

    // |               |                                   |               |

    // |       [ Socket.read() ]                    [ Socket.write() ]     |

    // |                                                                   |

    // |  Netty Internal I/O Threads (Transport Implementation)            |

    // +-------------------------------------------------------------------+

    @Override

    public ChannelHandler[] getServerChannelHandlers() {

        PartitionRequestQueue queueOfPartitionQueues = new PartitionRequestQueue();

        PartitionRequestServerHandler serverHandler = new PartitionRequestServerHandler(

                partitionProvider, taskEventDispatcher, queueOfPartitionQueues, networkbufferPool);

        return new ChannelHandler[] {

                messageEncoder,

                createFrameLengthDecoder(),

                messageDecoder,

                serverHandler,

                queueOfPartitionQueues

        };

    }

PartitionRequestServerHandler

ServerHandler会分配大小至少为1的bufferpool，因为后面是false，意思是如果networkbufferpool有多余的segment，会分配进来

public void channelRegistered(ChannelHandlerContext ctx) throws Exception {

    super.channelRegistered(ctx);

    bufferPool = networkBufferPool.createBufferPool(1, false);

}

protected void channelRead0(ChannelHandlerContext ctx, NettyMessage msg) throws Exception

PartitionRequest request = (PartitionRequest) msg;

LOG.debug("Read channel on {}: {}.", ctx.channel().localAddress(), request);

try {

    ResultSubpartitionView subpartition =

            partitionProvider.createSubpartitionView(

                    request.partitionId,

                    request.queueIndex,

                    bufferPool);

    outboundQueue.enqueue(subpartition, request.receiverId); //放入PartitionRequestQueue，进行发送

}

ResultPartitionManager继承自partitionProvider

调用ResultPartitionManager.createSubpartitionView

synchronized (registeredPartitions) {

    final ResultPartition partition = registeredPartitions.get(partitionId.getProducerId(),

            partitionId.getPartitionId());

    return partition.createSubpartitionView(subpartitionIndex, bufferProvider);

}

ResultPartition

public ResultSubpartitionView createSubpartitionView(int index, BufferProvider bufferProvider) throws IOException {

    int refCnt = pendingReferences.get();

    checkState(refCnt != -1, "Partition released.");

    checkState(refCnt > 0, "Partition not pinned.");

    ResultSubpartitionView readView = subpartitions[index].createReadView(bufferProvider);

    return readView;

}

pendingReferences的定义

/**

 * The total number of references to subpartitions of this result. The result partition can be

 * safely released, iff the reference count is zero. A reference count of -1 denotes that the

 * result partition has been released.

 */

private final AtomicInteger pendingReferences = new AtomicInteger();

PipelinedSubpartitionView

class PipelinedSubpartitionView implements ResultSubpartitionView {

    /** The subpartition this view belongs to. */

    private final PipelinedSubpartition parent;

    /** Flag indicating whether this view has been released. */

    private AtomicBoolean isReleased = new AtomicBoolean();

    PipelinedSubpartitionView(PipelinedSubpartition parent) {

        this.parent = checkNotNull(parent);

    }

    @Override

    public Buffer getNextBuffer() {

        synchronized (parent.buffers) {

            return parent.buffers.poll(); //从parent，PipelinedSubpartition，的buffers里面直接poll

        }

    }

    @Override

    public boolean registerListener(NotificationListener listener) {

        return !isReleased.get() && parent.registerListener(listener);

    }

    @Override

    public void notifySubpartitionConsumed() { //消费完释放该Subpartition

        releaseAllResources();

    }

    @Override

    public void releaseAllResources() {

        if (isReleased.compareAndSet(false, true)) {

            // The view doesn't hold any resources and the parent cannot be restarted. Therefore,

            // it's OK to notify about consumption as well.

            parent.onConsumedSubpartition();

        }

    }

PartitionRequestQueue在发送数据的时候，会调用getNextBuffer获取数据

发送完，即收到EndOfPartitionEvent后，调用notifySubpartitionConsumed

释放会调用到，PipelinedSubpartition.onConsumedSubpartition –> ResultPartition.onConsumedSubpartition

void onConsumedSubpartition(int subpartitionIndex) {

    if (isReleased.get()) { //如果已经释放了

        return;

    }

    int refCnt = pendingReferences.decrementAndGet(); //一个Subpartition消费完，减1

    if (refCnt == 0) { //如果所有subPartition消费完

        partitionManager.onConsumedPartition(this); //通知partitionManager，release这个partition

    }

}

PipelinedSubpartition中的buffer，何时被释放，放回localbufferpool

具体看下，PartitionRequestQueue的发送过程，

writeAndFlushNextMessageIfPossible

buffer = currentPartitionQueue.getNextBuffer();

BufferResponse resp = new BufferResponse(buffer, currentPartitionQueue.getSequenceNumber(), currentPartitionQueue.getReceiverId()); //将buffer封装成BufferResponse

if (!buffer.isBuffer() &&

        EventSerializer.fromBuffer(buffer, getClass().getClassLoader()).getClass() == EndOfPartitionEvent.class) { //如果收到partition结束event

    currentPartitionQueue.notifySubpartitionConsumed(); //通知

    currentPartitionQueue.releaseAllResources();  //释放所有资源

    markAsReleased(currentPartitionQueue.getReceiverId());

    currentPartitionQueue = null;

}

channel.writeAndFlush(resp).addListener(writeListener); //真正的发送，WriteAndFlushNextMessageIfPossibleListener会再次调用writeAndFlushNextMessageIfPossible，反复读取

在BufferResponse中，

@Override

ByteBuf write(ByteBufAllocator allocator) throws IOException {

    int length = 16 + 4 + 1 + 4 + buffer.getSize();

    ByteBuf result = null;

    try {

        result = allocateBuffer(allocator, ID, length);

        receiverId.writeTo(result);

        result.writeInt(sequenceNumber);

        result.writeBoolean(buffer.isBuffer());

        result.writeInt(buffer.getSize());

        result.writeBytes(buffer.getNioBuffer());

        return result;

    }

    catch (Throwable t) {

        if (result != null) {

            result.release();

        }

        throw new IOException(t);

    }

    finally {

        if (buffer != null) {

            buffer.recycle(); //回收buffer

        }

    }

}

Buffer

public void recycle() {

    synchronized (recycleLock) {

        if (--referenceCount == 0) {

            recycler.recycle(memorySegment);

        }

    }

}

LocalBufferPool

@Override

public void recycle(MemorySegment segment) {

    synchronized (availableMemorySegments) {

        if (isDestroyed || numberOfRequestedMemorySegments > currentPoolSize) {

            returnMemorySegment(segment);

        }

        else {

            EventListener<Buffer> listener = registeredListeners.poll();

            if (listener == null) {

                availableMemorySegments.add(segment); //没有listener，直接放回availableMemorySegments，下次使用

                availableMemorySegments.notify();

            }

            else {

                try {

                    listener.onEvent(new Buffer(segment, this)); //如果有listener，直接扔给listener处理

                }

                catch (Throwable ignored) {

                    availableMemorySegments.add(segment);

                    availableMemorySegments.notify();

                }

            }

        }

    }

}

Flink - ResultPartition的更多相关文章

Flink Internals
https://cwiki.apache.org/confluence/display/FLINK/Flink+Internals Memory Management (Batch API) In ...
追源索骥：透过源码看懂Flink核心框架的执行流程
li,ol.inline>li{display:inline-block;padding-right:5px;padding-left:5px}dl{margin-bottom:20px}dt, ...
Flink的TaskManager启动(源码分析)
通过启动脚本已经找到了TaskManager 的启动类org.apache.flink.runtime.taskexecutor.TaskManagerRunner 来看一下它的main方法中最后被 ...
Flink的Job启动TaskManager端(源码分析)
前面说到了 Flink的JobManager启动(源码分析) 启动了TaskManager 然后 Flink的Job启动JobManager端(源码分析) 说到JobManager会将转化得到 ...
Flink task之间的数据交换
Flink中的数据交换是围绕着下面的原则设计的: 1.数据交换的控制流(即,为了启动交换而传递的消息)是由接收者发起的,就像原始的MapReduce一样. 2.用于数据交换的数据流,即通过电缆的实际数 ...
Apache Flink - 架构和拓扑
Flink结构: flink cli 解析本地环境配置,启动 ApplicationMaster 在 ApplicationMaster 中启动 JobManager 在 ApplicationMas ...
【转帖】两年Flink迁移之路：从standalone到on yarn，处理能力提升五倍
两年Flink迁移之路:从standalone到on yarn,处理能力提升五倍 https://segmentfault.com/a/1190000020209179 flink 1.7k 次阅读 ...
Flink整体执行流程
以Flink源码中自带的WordCount为例,执行的入口从用户程序的execute()函数入手,execute()的源码如下: public JobExecutionResult execute(S ...
Apache Flink 的迁移之路，2 年处理效果提升 5 倍
一.背景与痛点在 2017 年上半年以前,TalkingData 的 App Analytics 和 Game Analytics 两个产品,流式框架使用的是自研的 td-etl-framework ...

随机推荐

android sdk manager 代理设置
启动 Android SDK Manager ,打开主界面,依次选择「Tools」.「Options...」,弹出『Android SDK Manager - Settings』窗口: 在『Andro ...
Socket网络编程--epoll小结
以前使用的用于I/O多路复用为了方便就使用select函数,但select这个函数是有缺陷的.因为它所支持的并发连接数是有限的(一般小于1024),因为用户处理的数组是使用硬编码的.这个最大值为FD_ ...
SourceInsight: sourceInsight4.0 修改默认字体
快捷键 Alt + Y
(5) 电商场景下的常见业务SQL处理
1. 如何对评论进行分页展示一般情况下都是这样写 SELECT customer_id,title,content FROM product_comment WHERE audit_status = ...
vue深入相应式原理
Vue 最显著的特性之一便是不太引人注意的响应式系统(reactivity system).模型层(model)只是普通 JavaScript 对象,修改它则更新视图(view).这会让状态管理变得非 ...
c#.net基础
值类型:值类型的实例一般在线程的栈上分配引用类型:引用类型的实例在线程的托管堆上分配引用类型变量的Equals比较的是二者的引用地址而不是内部的值,值类型变量的Equals方法比较的是二者的值. ...
Git之右键没有Git Bash Here的解决办法
1.Win+R 打开运行输入regedit 回车打开注册表 2.找到[HKEY_CLASSES_ROOT\Directory\Background]. 3.在[Background]下如果没有[she ...
【转】WPF自定义控件与样式(5)-Calendar/DatePicker日期控件自定义样式及扩展
一．前言申明:WPF自定义控件与样式是一个系列文章,前后是有些关联的,但大多是按照由简到繁的顺序逐步发布的等. 本文主要内容: 日历控件Calendar自定义样式: 日期控件DatePicker自定 ...
飞鱼48小时游戏创作嘉年华_厦门Pitch Time总结与收获
一.48小时游戏开发前期准备 1,策划明确美术队友和程序队友的水平,提需求的过程中尝试做减法,在保留核心玩法的基础上,看队友水平和时间判断是否添加需求. 策划是整个游戏团队的灵魂,也是开发的上限所在 ...
JAVA 构造器, extends[继承], implements[实现], Interface[接口], reflect[反射], clone[克隆], final, static, abstrac
记录一下: 构造器[构造函数]: 在java中如果用户编写类的时候没有提供构造函数,那么编译器会自动提供一个默认构造函数.它会把所有的实例字段设置为默认值:所有的数字变量初始化为0;所有的布尔变量设置 ...

Flink - ResultPartition

Flink - ResultPartition的更多相关文章

随机推荐

热门专题