Flink - ShipStrategyType

For a DataStream, the following partitioning strategies can be selected:

    /**
     * Sets the partitioning of the {@link DataStream} so that the output elements
     * are broadcasted to every parallel instance of the next operation.
     *
     * @return The DataStream with broadcast partitioning set.
     */
    public DataStream<T> broadcast() {
        return setConnectionType(new BroadcastPartitioner<T>());
    }

    /**
     * Sets the partitioning of the {@link DataStream} so that the output elements
     * are shuffled uniformly randomly to the next operation.
     *
     * @return The DataStream with shuffle partitioning set.
     */
    @PublicEvolving
    public DataStream<T> shuffle() {
        return setConnectionType(new ShufflePartitioner<T>());
    }

    /**
     * Sets the partitioning of the {@link DataStream} so that the output elements
     * are forwarded to the local subtask of the next operation.
     *
     * @return The DataStream with forward partitioning set.
     */
    public DataStream<T> forward() {
        return setConnectionType(new ForwardPartitioner<T>());
    }

    /**
     * Sets the partitioning of the {@link DataStream} so that the output elements
     * are distributed evenly to instances of the next operation in a round-robin
     * fashion.
     *
     * @return The DataStream with rebalance partitioning set.
     */
    public DataStream<T> rebalance() {
        return setConnectionType(new RebalancePartitioner<T>());
    }

    /**
     * Sets the partitioning of the {@link DataStream} so that the output elements
     * are distributed evenly to a subset of instances of the next operation in a round-robin
     * fashion.
     *
     * <p>The subset of downstream operations to which the upstream operation sends
     * elements depends on the degree of parallelism of both the upstream and downstream operation.
     * For example, if the upstream operation has parallelism 2 and the downstream operation
     * has parallelism 4, then one upstream operation would distribute elements to two
     * downstream operations while the other upstream operation would distribute to the other
     * two downstream operations. If, on the other hand, the downstream operation has parallelism
     * 2 while the upstream operation has parallelism 4 then two upstream operations will
     * distribute to one downstream operation while the other two upstream operations will
     * distribute to the other downstream operations.
     *
     * <p>In cases where the different parallelisms are not multiples of each other one or several
     * downstream operations will have a differing number of inputs from upstream operations.
     *
     * @return The DataStream with rescale partitioning set.
     */
    @PublicEvolving
    public DataStream<T> rescale() {
        return setConnectionType(new RescalePartitioner<T>());
    }

    /**
     * Sets the partitioning of the {@link DataStream} so that the output values
     * all go to the first instance of the next processing operator. Use this
     * setting with care since it might cause a serious performance bottleneck
     * in the application.
     *
     * @return The DataStream with shuffle partitioning set.
     */
    @PublicEvolving
    public DataStream<T> global() {
        return setConnectionType(new GlobalPartitioner<T>());
    }
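The rescale Javadoc describes which downstream subtasks each upstream subtask feeds, and the grouping can be pictured with a small standalone sketch. RescaleSketch and its targets method are hypothetical names, not part of Flink, and the contiguous, evenly sized grouping is an assumption made for illustration. Flink's real wiring happens when the job graph is built and may split uneven cases differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that models the rescale grouping; not Flink code. */
final class RescaleSketch {

    /**
     * Returns the downstream subtask indices that one upstream subtask feeds,
     * assuming contiguous groups (an assumption made for this sketch).
     */
    static List<Integer> targets(int upstreamIndex, int upstreamParallelism, int downstreamParallelism) {
        List<Integer> result = new ArrayList<>();
        if (downstreamParallelism >= upstreamParallelism) {
            // Each upstream subtask owns a contiguous slice of downstream subtasks.
            int start = upstreamIndex * downstreamParallelism / upstreamParallelism;
            int end = (upstreamIndex + 1) * downstreamParallelism / upstreamParallelism;
            for (int i = start; i < end; i++) {
                result.add(i);
            }
        } else {
            // Several upstream subtasks share one downstream subtask.
            result.add(upstreamIndex * downstreamParallelism / upstreamParallelism);
        }
        return result;
    }

    public static void main(String[] args) {
        // Upstream parallelism 2, downstream parallelism 4.
        for (int i = 0; i < 2; i++) {
            System.out.println("2 -> 4, upstream " + i + " feeds " + targets(i, 2, 4));
        }
        // Upstream parallelism 4, downstream parallelism 2.
        for (int i = 0; i < 4; i++) {
            System.out.println("4 -> 2, upstream " + i + " feeds " + targets(i, 4, 2));
        }
    }
}
```

Within its group, each upstream subtask would then cycle round-robin, which is why rescale can avoid the all-to-all connections that rebalance needs.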

The logic of each strategy is implemented by the corresponding Partitioner:

BroadcastPartitioner

public class BroadcastPartitioner<T> extends StreamPartitioner<T> {
    private static final long serialVersionUID = 1L;

    int[] returnArray;
    boolean set;
    int setNumber;

    @Override
    public int[] selectChannels(SerializationDelegate<StreamRecord<T>> record,
            int numberOfOutputChannels) {
        if (set && setNumber == numberOfOutputChannels) {
            // Channel count unchanged: reuse the cached array.
            return returnArray;
        } else {
            // First call, or the channel count changed: rebuild [0, 1, ..., n-1].
            this.returnArray = new int[numberOfOutputChannels];
            for (int i = 0; i < numberOfOutputChannels; i++) {
                returnArray[i] = i;
            }
            set = true;
            setNumber = numberOfOutputChannels;
            return returnArray;
        }
    }
    // ... rest of the class omitted in this excerpt
}

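int[] returnArray is the array holding the IDs of the selected channels.

A broadcast has to reach every channel, so returnArray must contain every channel ID. The array is cached and rebuilt only when the number of output channels changes.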
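The caching behaviour can be tried in isolation with a minimal standalone sketch. BroadcastSketch is a hypothetical class that drops the Flink types (StreamPartitioner, SerializationDelegate) and keeps only the selection logic, using a null check in place of the set and setNumber fields.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the broadcast selection logic; not Flink code. */
final class BroadcastSketch {
    // Cached channel IDs; null until the first call.
    private int[] cached;

    int[] selectChannels(int numberOfOutputChannels) {
        if (cached == null || cached.length != numberOfOutputChannels) {
            // Rebuild only when the channel count changes.
            cached = new int[numberOfOutputChannels];
            for (int i = 0; i < numberOfOutputChannels; i++) {
                cached[i] = i;
            }
        }
        return cached;
    }

    public static void main(String[] args) {
        BroadcastSketch sketch = new BroadcastSketch();
        int[] first = sketch.selectChannels(3);
        int[] second = sketch.selectChannels(3);
        System.out.println(Arrays.toString(first));
        System.out.println("same array reused: " + (first == second));
    }
}
```

Reusing the array avoids one allocation per record, at the cost that callers must not modify or retain the returned array.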

ShufflePartitioner picks one channel at random.

public class ShufflePartitioner<T> extends StreamPartitioner<T> {
    private static final long serialVersionUID = 1L;

    private Random random = new Random();

    private int[] returnArray = new int[1];

    @Override
    public int[] selectChannels(SerializationDelegate<StreamRecord<T>> record,
            int numberOfOutputChannels) {
        // Uniformly random channel in [0, numberOfOutputChannels).
        returnArray[0] = random.nextInt(numberOfOutputChannels);
        return returnArray;
    }
    // ... rest of the class omitted in this excerpt
}

ForwardPartitioner: with forward there should be only one output channel, so it simply selects the first channel.

public class ForwardPartitioner<T> extends StreamPartitioner<T> {
    private static final long serialVersionUID = 1L;

    // Always channel 0.
    private int[] returnArray = new int[] {0};

    @Override
    public int[] selectChannels(SerializationDelegate<StreamRecord<T>> record, int numberOfOutputChannels) {
        return returnArray;
    }
    // ... rest of the class omitted in this excerpt
}

RebalancePartitioner is plain round-robin: it cycles through the channels in order.

public class RebalancePartitioner<T> extends StreamPartitioner<T> {
    private static final long serialVersionUID = 1L;

    // Starts at -1 so that the first selected channel is 0.
    private int[] returnArray = new int[] {-1};

    @Override
    public int[] selectChannels(SerializationDelegate<StreamRecord<T>> record,
            int numberOfOutputChannels) {
        // Advance to the next channel, wrapping around at the end.
        this.returnArray[0] = (this.returnArray[0] + 1) % numberOfOutputChannels;
        return this.returnArray;
    }
    // ... rest of the class omitted in this excerpt
}
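The modulo step can be traced with a few lines of standalone Java. RoundRobinSketch is a hypothetical name, and the sketch returns a plain int instead of the one-element array used by the partitioner.

```java
/** Hypothetical stand-in for the round-robin selection logic; not Flink code. */
final class RoundRobinSketch {
    // -1 so that the first call yields channel 0.
    private int last = -1;

    int next(int numberOfOutputChannels) {
        last = (last + 1) % numberOfOutputChannels;
        return last;
    }

    public static void main(String[] args) {
        RoundRobinSketch sketch = new RoundRobinSketch();
        StringBuilder sequence = new StringBuilder();
        for (int i = 0; i < 5; i++) {
            sequence.append(sketch.next(3)).append(' ');
        }
        // With 3 channels the sequence wraps after channel 2.
        System.out.println(sequence.toString().trim());
    }
}
```

Because the counter lives in each partitioner instance, every upstream subtask cycles independently, so the distribution is even per sender rather than globally coordinated.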

GlobalPartitioner always selects the first channel.

public class GlobalPartitioner<T> extends StreamPartitioner<T> {
    private static final long serialVersionUID = 1L;

    // Every record goes to channel 0.
    private int[] returnArray = new int[] { 0 };

    @Override
    public int[] selectChannels(SerializationDelegate<StreamRecord<T>> record,
            int numberOfOutputChannels) {
        return returnArray;
    }
    // ... rest of the class omitted in this excerpt
}

In RecordWriter, emit calls selectChannels to pick the target channels:

    public void emit(T record) throws IOException, InterruptedException {
        for (int targetChannel : channelSelector.selectChannels(record, numChannels)) {
            sendToTarget(record, targetChannel);
        }
    }
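To see how the emit loop and a selector work together, here is a simplified standalone model. EmitSketch, its Selector interface, and the route method are hypothetical names. Channels are modeled as in-memory lists, whereas the real RecordWriter serializes records into network buffers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the emit loop; not Flink code. */
final class EmitSketch {

    /** Simplified counterpart of the selectChannels contract. */
    interface Selector {
        int[] selectChannels(String record, int numChannels);
    }

    /** Delivers each record to every channel the selector returns. */
    static List<List<String>> route(List<String> records, Selector selector, int numChannels) {
        List<List<String>> channels = new ArrayList<>();
        for (int i = 0; i < numChannels; i++) {
            channels.add(new ArrayList<>());
        }
        for (String record : records) {
            for (int target : selector.selectChannels(record, numChannels)) {
                channels.get(target).add(record);
            }
        }
        return channels;
    }

    static Selector broadcast() {
        return (record, numChannels) -> {
            int[] all = new int[numChannels];
            for (int i = 0; i < numChannels; i++) {
                all[i] = i;
            }
            return all;
        };
    }

    static Selector global() {
        return (record, numChannels) -> new int[] {0};
    }

    static Selector roundRobin() {
        int[] last = {-1};
        return (record, numChannels) -> {
            last[0] = (last[0] + 1) % numChannels;
            return new int[] {last[0]};
        };
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "c");
        System.out.println("broadcast:  " + route(records, broadcast(), 2));
        System.out.println("global:     " + route(records, global(), 2));
        System.out.println("roundRobin: " + route(records, roundRobin(), 2));
    }
}
```

The loop itself never needs to know which strategy is active: a broadcast is just a selector that returns several channel IDs, while the other strategies return exactly one.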
