Flink - ShipStrategyType
For a DataStream, the following partitioning strategies are available:
/**
 * Sets the partitioning of the {@link DataStream} so that the output elements
 * are broadcasted to every parallel instance of the next operation.
 *
 * @return The DataStream with broadcast partitioning set.
 */
public DataStream<T> broadcast() {
    return setConnectionType(new BroadcastPartitioner<T>());
}

/**
 * Sets the partitioning of the {@link DataStream} so that the output elements
 * are shuffled uniformly randomly to the next operation.
 *
 * @return The DataStream with shuffle partitioning set.
 */
@PublicEvolving
public DataStream<T> shuffle() {
    return setConnectionType(new ShufflePartitioner<T>());
}

/**
 * Sets the partitioning of the {@link DataStream} so that the output elements
 * are forwarded to the local subtask of the next operation.
 *
 * @return The DataStream with forward partitioning set.
 */
public DataStream<T> forward() {
    return setConnectionType(new ForwardPartitioner<T>());
}

/**
 * Sets the partitioning of the {@link DataStream} so that the output elements
 * are distributed evenly to instances of the next operation in a round-robin
 * fashion.
 *
 * @return The DataStream with rebalance partitioning set.
 */
public DataStream<T> rebalance() {
    return setConnectionType(new RebalancePartitioner<T>());
}

/**
 * Sets the partitioning of the {@link DataStream} so that the output elements
 * are distributed evenly to a subset of instances of the next operation in a round-robin
 * fashion.
 *
 * <p>The subset of downstream operations to which the upstream operation sends
 * elements depends on the degree of parallelism of both the upstream and downstream operation.
 * For example, if the upstream operation has parallelism 2 and the downstream operation
 * has parallelism 4, then one upstream operation would distribute elements to two
 * downstream operations while the other upstream operation would distribute to the other
 * two downstream operations. If, on the other hand, the downstream operation has parallelism
 * 2 while the upstream operation has parallelism 4 then two upstream operations will
 * distribute to one downstream operation while the other two upstream operations will
 * distribute to the other downstream operation.
 *
 * <p>In cases where the different parallelisms are not multiples of each other, one or several
 * downstream operations will have a differing number of inputs from upstream operations.
 *
 * @return The DataStream with rescale partitioning set.
 */
@PublicEvolving
public DataStream<T> rescale() {
    return setConnectionType(new RescalePartitioner<T>());
}

/**
 * Sets the partitioning of the {@link DataStream} so that the output values
 * all go to the first instance of the next processing operator. Use this
 * setting with care since it might cause a serious performance bottleneck
 * in the application.
 *
 * @return The DataStream with global partitioning set.
 */
@PublicEvolving
public DataStream<T> global() {
    return setConnectionType(new GlobalPartitioner<T>());
}
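The rescale Javadoc describes which downstream subtasks each upstream subtask talks to. The standalone sketch below reproduces the examples it gives. It is an illustration under assumptions, not Flink's code: the class RescaleSketch and the method assign are hypothetical, and the integer-division rule (when fanning out, downstream subtask d reads from upstream subtask d * upstream / downstream; when fanning in, upstream subtask u writes to downstream subtask u * downstream / upstream) is just one assignment that matches the documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;

class RescaleSketch {
    // Hypothetical helper: for each upstream subtask, the downstream subtasks it sends to.
    static List<List<Integer>> assign(int upstream, int downstream) {
        List<List<Integer>> targets = new ArrayList<>();
        for (int i = 0; i < upstream; i++) {
            targets.add(new ArrayList<>());
        }
        if (upstream <= downstream) {
            // Fan-out: every downstream subtask reads from exactly one upstream subtask.
            for (int d = 0; d < downstream; d++) {
                targets.get(d * upstream / downstream).add(d);
            }
        } else {
            // Fan-in: every upstream subtask writes to exactly one downstream subtask.
            for (int u = 0; u < upstream; u++) {
                targets.get(u).add(u * downstream / upstream);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        System.out.println(assign(2, 4)); // [[0, 1], [2, 3]]
        System.out.println(assign(4, 2)); // [[0], [0], [1], [1]]
        System.out.println(assign(3, 2)); // [[0], [0], [1]]
    }
}
```

With parallelisms 3 and 2, which are not multiples of each other, downstream subtask 0 receives input from two upstream subtasks and subtask 1 from only one, which is the uneven case the Javadoc mentions.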
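The actual logic of each strategy is implemented by its Partitioner, whose selectChannels method returns the ids of the output channels a record is sent to.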
BroadcastPartitioner selects every output channel:
public class BroadcastPartitioner<T> extends StreamPartitioner<T> {
    private static final long serialVersionUID = 1L;

    int[] returnArray;
    boolean set;
    int setNumber;

    @Override
    public int[] selectChannels(SerializationDelegate<StreamRecord<T>> record,
            int numberOfOutputChannels) {
        // Reuse the cached array as long as the number of channels is unchanged.
        if (set && setNumber == numberOfOutputChannels) {
            return returnArray;
        } else {
            // Build {0, 1, ..., numberOfOutputChannels - 1}: every channel is selected.
            this.returnArray = new int[numberOfOutputChannels];
            for (int i = 0; i < numberOfOutputChannels; i++) {
                returnArray[i] = i;
            }
            set = true;
            setNumber = numberOfOutputChannels;
            return returnArray;
        }
    }
}
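returnArray is an array holding the ids of the selected channels.
Broadcast must send each record to every channel, so returnArray contains all channel ids. It is built once and cached, and rebuilt only when the number of output channels changes.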
ShufflePartitioner picks one channel at random for each record:
public class ShufflePartitioner<T> extends StreamPartitioner<T> {
    private static final long serialVersionUID = 1L;

    private Random random = new Random();

    private int[] returnArray = new int[1];

    @Override
    public int[] selectChannels(SerializationDelegate<StreamRecord<T>> record,
            int numberOfOutputChannels) {
        // Exactly one target channel, drawn uniformly from [0, numberOfOutputChannels).
        returnArray[0] = random.nextInt(numberOfOutputChannels);
        return returnArray;
    }
}
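Because each record draws its channel independently, shuffle balances load only on average. The standalone sketch below applies the same rule, random.nextInt(numberOfOutputChannels), to 10,000 records and counts how many land on each of 4 channels. The class ShuffleSketch and the fixed seed are illustrative assumptions; ShufflePartitioner itself uses an unseeded java.util.Random.

```java
import java.util.Arrays;
import java.util.Random;

class ShuffleSketch {
    // Same rule as ShufflePartitioner: one channel chosen uniformly at random per record.
    static int[] countPerChannel(int records, int channels, long seed) {
        Random random = new Random(seed); // fixed seed only to make the demo repeatable
        int[] counts = new int[channels];
        for (int r = 0; r < records; r++) {
            counts[random.nextInt(channels)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Each of the 4 channels should receive roughly 10000 / 4 = 2500 records.
        System.out.println(Arrays.toString(countPerChannel(10_000, 4, 42L)));
    }
}
```

The counts come out close to 2,500 each but generally not exactly equal; a round-robin strategy such as rebalance, by contrast, keeps them within one record of each other.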
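ForwardPartitioner: with forward partitioning each upstream subtask is connected only to its local subtask of the next operation (which is why forward requires both operators to have the same parallelism), so there is exactly one output channel and the partitioner simply selects the first one: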
public class ForwardPartitioner<T> extends StreamPartitioner<T> {
    private static final long serialVersionUID = 1L;

    // There is only one output channel, so channel 0 is always the target.
    private int[] returnArray = new int[] {0};

    @Override
    public int[] selectChannels(SerializationDelegate<StreamRecord<T>> record, int numberOfOutputChannels) {
        return returnArray;
    }
}
RebalancePartitioner is plain round-robin: it cycles through the output channels one record at a time:
public class RebalancePartitioner<T> extends StreamPartitioner<T> {
    private static final long serialVersionUID = 1L;

    // Starts at -1 so that the first record goes to channel 0.
    private int[] returnArray = new int[] {-1};

    @Override
    public int[] selectChannels(SerializationDelegate<StreamRecord<T>> record,
            int numberOfOutputChannels) {
        // Advance to the next channel, wrapping around after the last one.
        this.returnArray[0] = (this.returnArray[0] + 1) % numberOfOutputChannels;
        return this.returnArray;
    }
}
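Because returnArray starts at -1, the first call yields channel 0 and later calls cycle through the channels in order. The standalone sketch below copies that counter logic without the Flink types; the class RebalanceSketch and its simplified selectChannels(int) signature are assumptions made for illustration. It prints the channels chosen for 7 records over 3 channels.

```java
import java.util.Arrays;

class RebalanceSketch {
    // Mirrors RebalancePartitioner: start at -1 so that the first record goes to channel 0.
    private final int[] returnArray = new int[] {-1};

    int[] selectChannels(int numberOfOutputChannels) {
        returnArray[0] = (returnArray[0] + 1) % numberOfOutputChannels;
        return returnArray;
    }

    public static void main(String[] args) {
        RebalanceSketch partitioner = new RebalanceSketch();
        int[] chosen = new int[7];
        for (int r = 0; r < chosen.length; r++) {
            chosen[r] = partitioner.selectChannels(3)[0];
        }
        System.out.println(Arrays.toString(chosen)); // [0, 1, 2, 0, 1, 2, 0]
    }
}
```

In this sketch the counter lives in the object, so each instance cycles through the channels independently of any other instance.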
GlobalPartitioner always selects the first channel (index 0), so all records go to the first instance of the next operator:
public class GlobalPartitioner<T> extends StreamPartitioner<T> {
    private static final long serialVersionUID = 1L;

    // Every record is routed to channel 0, regardless of how many channels exist.
    private int[] returnArray = new int[] { 0 };

    @Override
    public int[] selectChannels(SerializationDelegate<StreamRecord<T>> record,
            int numberOfOutputChannels) {
        return returnArray;
    }
}
In RecordWriter, emit calls selectChannels on the channel selector (the partitioner) and sends the record to every channel it returns:
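public void emit(T record) throws IOException, InterruptedException {
    // The record is sent once per channel id returned by the selector
    // (all channels for broadcast, exactly one for the other partitioners shown above).
    for (int targetChannel : channelSelector.selectChannels(record, numChannels)) {
        sendToTarget(record, targetChannel);
    }
}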
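Putting the pieces together: emit sends a record once for every channel id the selector returns, so the strategies differ in how many sends each record costs and where the load lands. The sketch below simulates that loop without Flink. The Selector interface, the run helper and the string records are hypothetical stand-ins for the channel selector, RecordWriter and StreamRecord, and sendToTarget is replaced by a per-channel counter.

```java
import java.util.Arrays;

class EmitSketch {
    // Hypothetical stand-in for a channel selector: target channel ids for one record.
    interface Selector {
        int[] selectChannels(String value, int numChannels);
    }

    // Broadcast: every channel id, as in BroadcastPartitioner.
    static final Selector BROADCAST = (value, numChannels) -> {
        int[] all = new int[numChannels];
        for (int i = 0; i < numChannels; i++) {
            all[i] = i;
        }
        return all;
    };

    // Global: always channel 0, as in GlobalPartitioner.
    static final Selector GLOBAL = (value, numChannels) -> new int[] {0};

    // Rebalance: round-robin with its own counter, as in RebalancePartitioner.
    static Selector rebalance() {
        int[] next = {-1};
        return (value, numChannels) -> {
            next[0] = (next[0] + 1) % numChannels;
            return new int[] {next[0]};
        };
    }

    // Mirrors the emit loop: one send per selected channel, counted instead of sent.
    static int[] run(Selector selector, int records, int numChannels) {
        int[] sentPerChannel = new int[numChannels];
        for (int r = 0; r < records; r++) {
            for (int targetChannel : selector.selectChannels("record-" + r, numChannels)) {
                sentPerChannel[targetChannel]++; // stands in for sendToTarget(record, targetChannel)
            }
        }
        return sentPerChannel;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(BROADCAST, 6, 3)));   // [6, 6, 6]
        System.out.println(Arrays.toString(run(GLOBAL, 6, 3)));      // [6, 0, 0]
        System.out.println(Arrays.toString(run(rebalance(), 6, 3))); // [2, 2, 2]
    }
}
```

Broadcast multiplies the traffic by the number of channels, global funnels everything into channel 0 (the bottleneck its Javadoc warns about), and rebalance spreads the records evenly.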