对于DataStream,可以选择如下的Strategy,

/**
* Sets the partitioning of the {@link DataStream} so that the output elements
* are broadcasted to every parallel instance of the next operation.
*
* @return The DataStream with broadcast partitioning set.
*/
public DataStream<T> broadcast() {
return setConnectionType(new BroadcastPartitioner<T>());
} /**
* Sets the partitioning of the {@link DataStream} so that the output elements
* are shuffled uniformly randomly to the next operation.
*
* @return The DataStream with shuffle partitioning set.
*/
@PublicEvolving
public DataStream<T> shuffle() {
return setConnectionType(new ShufflePartitioner<T>());
} /**
* Sets the partitioning of the {@link DataStream} so that the output elements
* are forwarded to the local subtask of the next operation.
*
* @return The DataStream with forward partitioning set.
*/
public DataStream<T> forward() {
return setConnectionType(new ForwardPartitioner<T>());
} /**
* Sets the partitioning of the {@link DataStream} so that the output elements
* are distributed evenly to instances of the next operation in a round-robin
* fashion.
*
* @return The DataStream with rebalance partitioning set.
*/
public DataStream<T> rebalance() {
return setConnectionType(new RebalancePartitioner<T>());
} /**
* Sets the partitioning of the {@link DataStream} so that the output elements
* are distributed evenly to a subset of instances of the next operation in a round-robin
* fashion.
*
* <p>The subset of downstream operations to which the upstream operation sends
* elements depends on the degree of parallelism of both the upstream and downstream operation.
* For example, if the upstream operation has parallelism 2 and the downstream operation
* has parallelism 4, then one upstream operation would distribute elements to two
* downstream operations while the other upstream operation would distribute to the other
* two downstream operations. If, on the other hand, the downstream operation has parallelism
* 2 while the upstream operation has parallelism 4 then two upstream operations will
* distribute to one downstream operation while the other two upstream operations will
* distribute to the other downstream operations.
*
* <p>In cases where the different parallelisms are not multiples of each other one or several
* downstream operations will have a differing number of inputs from upstream operations.
*
* @return The DataStream with rescale partitioning set.
*/
@PublicEvolving
public DataStream<T> rescale() {
return setConnectionType(new RescalePartitioner<T>());
} /**
* Sets the partitioning of the {@link DataStream} so that the output values
* all go to the first instance of the next processing operator. Use this
* setting with care since it might cause a serious performance bottleneck
* in the application.
*
* @return The DataStream with shuffle partitioning set.
*/
@PublicEvolving
public DataStream<T> global() {
return setConnectionType(new GlobalPartitioner<T>());
}

 

逻辑都是由Partitoner来实现的,

BroadcastPartitioner

public class BroadcastPartitioner<T> extends StreamPartitioner<T> {
private static final long serialVersionUID = 1L; int[] returnArray;
boolean set;
int setNumber; @Override
public int[] selectChannels(SerializationDelegate<StreamRecord<T>> record,
int numberOfOutputChannels) {
if (set && setNumber == numberOfOutputChannels) {
return returnArray;
} else {
this.returnArray = new int[numberOfOutputChannels];
for (int i = 0; i < numberOfOutputChannels; i++) {
returnArray[i] = i;
}
set = true;
setNumber = numberOfOutputChannels;
return returnArray;
}
}

int[] returnArray, 数组,select的channel id

broadcast,要发到所有channel,所以returnArray要包含所有的channel id

 

ShufflePartitioner,随机选一个channel

public class ShufflePartitioner<T> extends StreamPartitioner<T> {
private static final long serialVersionUID = 1L; private Random random = new Random(); private int[] returnArray = new int[1]; @Override
public int[] selectChannels(SerializationDelegate<StreamRecord<T>> record,
int numberOfOutputChannels) {
returnArray[0] = random.nextInt(numberOfOutputChannels);
return returnArray;
}

 

ForwardPartitioner,对于forward,应该只有一个输出channel,所以就选第一个channel就可以

public class ForwardPartitioner<T> extends StreamPartitioner<T> {
private static final long serialVersionUID = 1L; private int[] returnArray = new int[] {0}; @Override
public int[] selectChannels(SerializationDelegate<StreamRecord<T>> record, int numberOfOutputChannels) {
return returnArray;
}

 

RebalancePartitioner,就是roundrobin,循环选择

public class RebalancePartitioner<T> extends StreamPartitioner<T> {
private static final long serialVersionUID = 1L; private int[] returnArray = new int[] {-1}; @Override
public int[] selectChannels(SerializationDelegate<StreamRecord<T>> record,
int numberOfOutputChannels) {
this.returnArray[0] = (this.returnArray[0] + 1) % numberOfOutputChannels;
return this.returnArray;
}

 

GlobalPartitioner,默认选第一个

public class GlobalPartitioner<T> extends StreamPartitioner<T> {
private static final long serialVersionUID = 1L; private int[] returnArray = new int[] { 0 }; @Override
public int[] selectChannels(SerializationDelegate<StreamRecord<T>> record,
int numberOfOutputChannels) {
return returnArray;
}

 

在RecordWriter中,emit会调用selectChannels来选取channel

    public void emit(T record) throws IOException, InterruptedException {
for (int targetChannel : channelSelector.selectChannels(record, numChannels)) {
sendToTarget(record, targetChannel);
}
}

Flink - ShipStrategyType的更多相关文章

  1. Flink架构,源码及debug

    序 工作中用Flink做批量和流式处理有段时间了,感觉只看Flink文档是对Flink ProgramRuntime的细节描述不是很多, 程序员还是看代码最简单和有效.所以想写点东西,记录一下,如果能 ...

  2. apache flink 入门

    配置环境 包括 JAVA_HOME jobmanager.rpc.address jobmanager.heap.mb 和 taskmanager.heap.mb taskmanager.number ...

  3. Flink 1.1 – ResourceManager

    Flink resource manager的作用如图,   FlinkResourceManager /** * * <h1>Worker allocation steps</h1 ...

  4. Apache Flink初接触

    Apache Flink闻名已久,一直没有亲自尝试一把,这两天看了文档,发现在real-time streaming方面,Flink提供了更多高阶的实用函数. 用Apache Flink实现WordC ...

  5. Flink - InstanceManager

    InstanceManager用于管理JobManager申请到的taskManager和slots资源 /** * Simple manager that keeps track of which ...

  6. Flink – window operator

      参考, http://wuchong.me/blog/2016/05/25/flink-internals-window-mechanism/ http://wuchong.me/blog/201 ...

  7. Flink – Trigger,Evictor

    org.apache.flink.streaming.api.windowing.triggers;   Trigger public abstract class Trigger<T, W e ...

  8. Flink - RocksDBStateBackend

    如果要考虑易用性和效率,使用rocksDB来替代普通内存的kv是有必要的 有了rocksdb,可以range查询,可以支持columnfamily,可以各种压缩 但是rocksdb本身是一个库,是跑在 ...

  9. Flink - state管理

    在Flink – Checkpoint 没有描述了整个checkpoint的流程,但是对于如何生成snapshot和恢复snapshot的过程,并没有详细描述,这里补充   StreamOperato ...

随机推荐

  1. ES6,扩展运算符的用途

    ES6的扩展运算符可以说是非常使用的,在给多参数函数传参,替代Apply,合并数组,和解构配合进行赋值方面提供了很好的便利性. 扩展运算符就是三个点“...”,就是将实现了Iterator 接口的对象 ...

  2. linux每日命令(19):locate 命令

    locate 让使用者可以很快速的搜寻档案系统内是否有指定的档案.其方法是先建立一个包括系统内所有档案名称及路径的数据库,之后当寻找时就只需查询这个数据库,而不必实际深入档案系统之中了.在一般的 di ...

  3. 修改/dev/shm的大小

    修改/dev/shm的大小 修改/etc/fstab的这行: 默认的:tmpfs /dev/shm tmpfs defaults 0 0改成:tmpfs /dev/shm tmpfs defaults ...

  4. Git 合并多次 commit 、 删除某次 commit

    Git 合并多次 commit 有时候在一个分支的多次意义相近的 commit,会把整个提交历史搞得很混乱,此时可以将一部分的 commit 合并为一个 commit,以美化整个 commit 历史, ...

  5. 自定义progressdialog,改善用户体验

    自定义progressdialog,改善用户体验

  6. 机器学习&深度学习基础(目录)

    从业这么久了,做了很多项目,一直对机器学习的基础课程鄙视已久,现在回头看来,系统的基础知识整理对我现在思路的整理很有利,写完这个基础篇,开始把AI+cv的也总结完,然后把这么多年做的项目再写好总结. ...

  7. 【转载】Ubuntu安装之,硬盘分区

    关于分区 如果你只是简单地想用上Ubuntu,可以这样操作:1)如果你是直接将整个硬盘都用来装Ubuntu,机器上没有需要保存的数据,或者已经做好备份的情况下,可以直接在Ubuntu分区时选择“向导─ ...

  8. 驱动文件中只有cat/inf/dll文件,怎么安装

    网上下载了一个驱动,里面包含文件只有cat/inf/dll文件,怎么安装? 1.计算机-右键-管理-设备管理器,找到要装驱动的设备上 2.右键-更新驱动程序-浏览到本地的这个驱动文件夹 3.开始安装

  9. Ubuntu下安装和使用zookeeper和kafka

    1.在清华镜像站下载kafka_2.10-0.10.0.0.tgz 和 zookeeper-3.4.10.tar.gz 分别解压到/usr/local目录下 2.进入zookeeper目录,在conf ...

  10. Spark学习笔记——Spark上数据的获取、处理和准备

    数据获得的方式多种多样,常用的公开数据集包括: 1.UCL机器学习知识库:包括近300个不同大小和类型的数据集,可用于分类.回归.聚类和推荐系统任务.数据集列表位于:http://archive.ic ...