参考,Flink - Generating Timestamps / Watermarks

watermark,只有在有window的情况下才用到,所以在window operator前加上assignTimestampsAndWatermarks即可

不一定需要从source发出

1. 首先,source可以发出watermark

我们就看看kafka source的实现

    protected AbstractFetcher(
SourceContext<T> sourceContext,
List<KafkaTopicPartition> assignedPartitions,
SerializedValue<AssignerWithPeriodicWatermarks<T>> watermarksPeriodic, //在创建KafkaConsumer的时候assignTimestampsAndWatermarks
SerializedValue<AssignerWithPunctuatedWatermarks<T>> watermarksPunctuated,
ProcessingTimeService processingTimeProvider,
long autoWatermarkInterval, //env.getConfig().setAutoWatermarkInterval()
ClassLoader userCodeClassLoader,
boolean useMetrics) throws Exception
{
//判断watermark的类型
if (watermarksPeriodic == null) {
if (watermarksPunctuated == null) {
// simple case, no watermarks involved
timestampWatermarkMode = NO_TIMESTAMPS_WATERMARKS;
} else {
timestampWatermarkMode = PUNCTUATED_WATERMARKS;
}
} else {
if (watermarksPunctuated == null) {
timestampWatermarkMode = PERIODIC_WATERMARKS;
} else {
throw new IllegalArgumentException("Cannot have both periodic and punctuated watermarks");
}
} // create our partition state according to the timestamp/watermark mode
this.allPartitions = initializePartitions(
assignedPartitions,
timestampWatermarkMode,
watermarksPeriodic, watermarksPunctuated,
userCodeClassLoader); // if we have periodic watermarks, kick off the interval scheduler
if (timestampWatermarkMode == PERIODIC_WATERMARKS) { //如果是定期发出WaterMark
KafkaTopicPartitionStateWithPeriodicWatermarks<?, ?>[] parts =
(KafkaTopicPartitionStateWithPeriodicWatermarks<?, ?>[]) allPartitions; PeriodicWatermarkEmitter periodicEmitter=
new PeriodicWatermarkEmitter(parts, sourceContext, processingTimeProvider, autoWatermarkInterval);
periodicEmitter.start();
}
}

FlinkKafkaConsumerBase

    public FlinkKafkaConsumerBase<T> assignTimestampsAndWatermarks(AssignerWithPeriodicWatermarks<T> assigner) {
checkNotNull(assigner); if (this.punctuatedWatermarkAssigner != null) {
throw new IllegalStateException("A punctuated watermark emitter has already been set.");
}
try {
ClosureCleaner.clean(assigner, true);
this.periodicWatermarkAssigner = new SerializedValue<>(assigner);
return this;
} catch (Exception e) {
throw new IllegalArgumentException("The given assigner is not serializable", e);
}
}

这个接口的核心函数,定义,如何提取Timestamp和生成Watermark的逻辑

public interface AssignerWithPeriodicWatermarks<T> extends TimestampAssigner<T> {
Watermark getCurrentWatermark();
}
public interface TimestampAssigner<T> extends Function {
long extractTimestamp(T element, long previousElementTimestamp);
}

如果在初始化KafkaConsumer的时候,没有assignTimestampsAndWatermarks,就不会产生watermark

可以看到watermark有两种,

PERIODIC_WATERMARKS,定期发送的watermark

PUNCTUATED_WATERMARKS,由element触发的watermark,比如有element的特征或某种类型的element来表示触发watermark,这样便于开发者来控制watermark

initializePartitions

case PERIODIC_WATERMARKS: {
@SuppressWarnings("unchecked")
KafkaTopicPartitionStateWithPeriodicWatermarks<T, KPH>[] partitions =
(KafkaTopicPartitionStateWithPeriodicWatermarks<T, KPH>[])
new KafkaTopicPartitionStateWithPeriodicWatermarks<?, ?>[assignedPartitions.size()]; int pos = 0;
for (KafkaTopicPartition partition : assignedPartitions) {
KPH kafkaHandle = createKafkaPartitionHandle(partition); AssignerWithPeriodicWatermarks<T> assignerInstance =
watermarksPeriodic.deserializeValue(userCodeClassLoader); partitions[pos++] = new KafkaTopicPartitionStateWithPeriodicWatermarks<>(
partition, kafkaHandle, assignerInstance);
} return partitions;
}

KafkaTopicPartitionStateWithPeriodicWatermarks

这个类里面最核心的函数,

    public long getTimestampForRecord(T record, long kafkaEventTimestamp) {
return timestampsAndWatermarks.extractTimestamp(record, kafkaEventTimestamp);
} public long getCurrentWatermarkTimestamp() {
Watermark wm = timestampsAndWatermarks.getCurrentWatermark();
if (wm != null) {
partitionWatermark = Math.max(partitionWatermark, wm.getTimestamp());
}
return partitionWatermark;
}

可以看到是调用你定义的AssignerWithPeriodicWatermarks来实现

PeriodicWatermarkEmitter

    private static class PeriodicWatermarkEmitter implements ProcessingTimeCallback {

        public void start() {
timerService.registerTimer(timerService.getCurrentProcessingTime() + interval, this); //start定时器,定时触发
} @Override
public void onProcessingTime(long timestamp) throws Exception { //触发逻辑 long minAcrossAll = Long.MAX_VALUE;
for (KafkaTopicPartitionStateWithPeriodicWatermarks<?, ?> state : allPartitions) { //对于每个partitions // we access the current watermark for the periodic assigners under the state
// lock, to prevent concurrent modification to any internal variables
final long curr;
//noinspection SynchronizationOnLocalVariableOrMethodParameter
synchronized (state) {
curr = state.getCurrentWatermarkTimestamp(); //取出当前partition的WaterMark
} minAcrossAll = Math.min(minAcrossAll, curr); //求min,以partition中最小的partition作为watermark
} // emit next watermark, if there is one
if (minAcrossAll > lastWatermarkTimestamp) {
lastWatermarkTimestamp = minAcrossAll;
emitter.emitWatermark(new Watermark(minAcrossAll)); //emit
} // schedule the next watermark
timerService.registerTimer(timerService.getCurrentProcessingTime() + interval, this); //重新设置timer
}
}

2. DataStream也可以设置定时发送Watermark

其实实现是加了个chain的TimestampsAndPeriodicWatermarksOperator

DataStream

   /**
* Assigns timestamps to the elements in the data stream and periodically creates
* watermarks to signal event time progress.
*
* <p>This method creates watermarks periodically (for example every second), based
* on the watermarks indicated by the given watermark generator. Even when no new elements
* in the stream arrive, the given watermark generator will be periodically checked for
* new watermarks. The interval in which watermarks are generated is defined in
* {@link ExecutionConfig#setAutoWatermarkInterval(long)}.
*
* <p>Use this method for the common cases, where some characteristic over all elements
* should generate the watermarks, or where watermarks are simply trailing behind the
* wall clock time by a certain amount.
*
* <p>For the second case and when the watermarks are required to lag behind the maximum
* timestamp seen so far in the elements of the stream by a fixed amount of time, and this
* amount is known in advance, use the
* {@link BoundedOutOfOrdernessTimestampExtractor}.
*
* <p>For cases where watermarks should be created in an irregular fashion, for example
* based on certain markers that some element carry, use the
* {@link AssignerWithPunctuatedWatermarks}.
*
* @param timestampAndWatermarkAssigner The implementation of the timestamp assigner and
* watermark generator.
* @return The stream after the transformation, with assigned timestamps and watermarks.
*
* @see AssignerWithPeriodicWatermarks
* @see AssignerWithPunctuatedWatermarks
* @see #assignTimestampsAndWatermarks(AssignerWithPunctuatedWatermarks)
*/
public SingleOutputStreamOperator<T> assignTimestampsAndWatermarks(
AssignerWithPeriodicWatermarks<T> timestampAndWatermarkAssigner) { // match parallelism to input, otherwise dop=1 sources could lead to some strange
// behaviour: the watermark will creep along very slowly because the elements
// from the source go to each extraction operator round robin.
final int inputParallelism = getTransformation().getParallelism();
final AssignerWithPeriodicWatermarks<T> cleanedAssigner = clean(timestampAndWatermarkAssigner); TimestampsAndPeriodicWatermarksOperator<T> operator =
new TimestampsAndPeriodicWatermarksOperator<>(cleanedAssigner); return transform("Timestamps/Watermarks", getTransformation().getOutputType(), operator)
.setParallelism(inputParallelism);
}

TimestampsAndPeriodicWatermarksOperator

  public class TimestampsAndPeriodicWatermarksOperator<T>
extends AbstractUdfStreamOperator<T, AssignerWithPeriodicWatermarks<T>>
implements OneInputStreamOperator<T, T>, Triggerable { private transient long watermarkInterval;
private transient long currentWatermark; public TimestampsAndPeriodicWatermarksOperator(AssignerWithPeriodicWatermarks<T> assigner) {
super(assigner); //AbstractUdfStreamOperator(F userFunction)
this.chainingStrategy = ChainingStrategy.ALWAYS; //一定是chain
} @Override
public void open() throws Exception {
super.open(); currentWatermark = Long.MIN_VALUE;
watermarkInterval = getExecutionConfig().getAutoWatermarkInterval(); if (watermarkInterval > 0) {
registerTimer(System.currentTimeMillis() + watermarkInterval, this); //注册到定时器
}
} @Override
public void processElement(StreamRecord<T> element) throws Exception {
final long newTimestamp = userFunction.extractTimestamp(element.getValue(), //由element中基于AssignerWithPeriodicWatermarks提取时间戳
element.hasTimestamp() ? element.getTimestamp() : Long.MIN_VALUE); output.collect(element.replace(element.getValue(), newTimestamp)); //更新element的时间戳,再次发出
} @Override
public void trigger(long timestamp) throws Exception { //定时器触发trigger
// register next timer
Watermark newWatermark = userFunction.getCurrentWatermark(); //取得watermark
if (newWatermark != null && newWatermark.getTimestamp() > currentWatermark) {
currentWatermark = newWatermark.getTimestamp();
// emit watermark
output.emitWatermark(newWatermark); //发出watermark
} registerTimer(System.currentTimeMillis() + watermarkInterval, this); //重新注册到定时器
} @Override
public void processWatermark(Watermark mark) throws Exception {
// if we receive a Long.MAX_VALUE watermark we forward it since it is used
// to signal the end of input and to not block watermark progress downstream
if (mark.getTimestamp() == Long.MAX_VALUE && currentWatermark != Long.MAX_VALUE) {
currentWatermark = Long.MAX_VALUE;
output.emitWatermark(mark); //forward watermark
}
}

可以看到在processElement会调用AssignerWithPeriodicWatermarks.extractTimestamp提取event time

然后更新StreamRecord的时间

然后在Window Operator中,

@Override
public void processElement(StreamRecord<IN> element) throws Exception {
final Collection<W> elementWindows = windowAssigner.assignWindows(
element.getValue(), element.getTimestamp(), windowAssignerContext);

会在windowAssigner.assignWindows时以element的timestamp作为assign时间

对于watermark的处理,参考,Flink – window operator

Flink - watermark生成的更多相关文章

  1. [源码分析] 从源码入手看 Flink Watermark 之传播过程

    [源码分析] 从源码入手看 Flink Watermark 之传播过程 0x00 摘要 本文将通过源码分析,带领大家熟悉Flink Watermark 之传播过程,顺便也可以对Flink整体逻辑有一个 ...

  2. Flink Program Guide (4) -- 时间戳和Watermark生成(DataStream API编程指导 -- For Java)

    时间戳和Watermark生成 本文翻译自Generating Timestamp / Watermarks --------------------------------------------- ...

  3. flink watermark介绍

    转发请注明原创地址 http://www.cnblogs.com/dongxiao-yang/p/7610412.html 一 概念 watermark是flink为了处理eventTime窗口计算提 ...

  4. Flink assignAscendingTimestamps 生成水印的三个重载方法

    先简单介绍一下Timestamp 和Watermark 的概念: 1. Timestamp和Watermark都是基于事件的时间字段生成的 2. Timestamp和Watermark是两个不同的东西 ...

  5. flink WaterMark之TumblingEventWindow

    1.WaterMark,翻译成水印或水位线,水印翻译更抽象,水位线翻译接地气. watermark是用于处理乱序事件的,通常用watermark机制结合window来实现. 流处理从事件产生,到流经s ...

  6. 【源码解析】Flink 是如何基于事件时间生成Timestamp和Watermark

    生成Timestamp和Watermark 的三个重载方法介绍可参见上一篇博客: Flink assignAscendingTimestamps 生成水印的三个重载方法 之前想研究下Flink是怎么处 ...

  7. Flink Program Guide (5) -- 预定义的Timestamp Extractor / Watermark Emitter (DataStream API编程指导 -- For Java)

    本文翻译自Pre-defined Timestamp Extractors / Watermark Emitter ------------------------------------------ ...

  8. Flink的时间类型和watermark机制

    一FlinkTime类型 有3类时间,分别是数据本身的产生时间.进入Flink系统的时间和被处理的时间,在Flink系统中的数据可以有三种时间属性: Event Time 是每条数据在其生产设备上发生 ...

  9. 老板让阿粉学习 flink 中的 Watermark,现在他出教程了

    1 前言 在时间 Time 那一篇中,介绍了三种时间概念 Event.Ingestin 和 Process, 其中还简单介绍了乱序 Event Time 事件和它的解决方案 Watermark 水位线 ...

随机推荐

  1. JsonCpp 的使用

    JSON全称为JavaScript ObjectNotation,它是一种轻量级的数据交换格式,易于阅读.编写.解析.jsoncpp是c++解析JSON串常用的解析库之一. jsoncpp中主要的类: ...

  2. 一次性将多个文件夹批处理压缩成多个.rar

    超级简单.不用自己写.bat批处理. 1. 打开winrar,选中所有要压缩的文件夹 2. 菜单->commands->add files to achive 3. 选中Files tab ...

  3. 用Jmeter+Badboy+Fiddler做接口测试

    用Jmeter+Badboy+Fiddler做接口测试 2016-12-05 目录: 1 简介2 Badboy录制3 Jmeter打开Badboy脚本4 用Fiddler抓请求,补充完善脚本5 测试中 ...

  4. Selenium Web 自动化 - Selenium常用API

    Selenium Web 自动化 - Selenium常用API 2016-08-01 目录 1 对浏览器操作  1.1 用webdriver打开一个浏览器  1.2 最大化浏览器&关闭浏览器 ...

  5. 8个超实用的jQuery插件应用

    自jQuery诞生以来,jQuery社区都在不断地.自发地为jQuery创建许许多多功能不一的插件应用,很多jQuery插件非常实用,对我们的前端开发帮助相当大,不仅可以更完美的完成指定功能,而且节省 ...

  6. JUnit+Mockito结合测试Spring MVC Controller

    [本文出自天外归云的博客园] 概要简述 利用JUnit结合Mockito,再加上spingframework自带的一些方法,就可以组合起来对Spring MVC中的Controller层进行测试. 在 ...

  7. 【iCore4 双核心板_FPGA】例程十:FSMC总线通信实验——复用地址模式

    实验原理: STM32F767上自带FMC控制器,本实验将通过FMC总线的地址复用模式实现STM32与FPGA 之间通信,FPGA内部建立RAM块,FPGA桥接STM32和RAM块,本实验通过FSMC ...

  8. JVM:基础

    参考 温绍景-Java虚拟机基础

  9. H3C S5120清除console口密码

    1.开机启动交换机显示Press Ctrl-B to enter Extended Boot menu...0  字样迅速按Ctrl-B进入如下字符介面提示: Press Ctrl-B to ente ...

  10. Linux Ubuntu 能PING IP但不能PING主机域名的解决方法

    ------------------------------------------------------------------------------- vi /etc/nsswitch.con ...