Flink - watermark生成
参考,Flink - Generating Timestamps / Watermarks
watermark,只有在有window的情况下才用到,所以在window operator前加上assignTimestampsAndWatermarks即可
不一定需要从source发出
1. 首先,source可以发出watermark
我们就看看kafka source的实现
protected AbstractFetcher(
SourceContext<T> sourceContext,
List<KafkaTopicPartition> assignedPartitions,
SerializedValue<AssignerWithPeriodicWatermarks<T>> watermarksPeriodic, //在创建KafkaConsumer的时候assignTimestampsAndWatermarks
SerializedValue<AssignerWithPunctuatedWatermarks<T>> watermarksPunctuated,
ProcessingTimeService processingTimeProvider,
long autoWatermarkInterval, //env.getConfig().setAutoWatermarkInterval()
ClassLoader userCodeClassLoader,
boolean useMetrics) throws Exception
{
//判断watermark的类型
if (watermarksPeriodic == null) {
if (watermarksPunctuated == null) {
// simple case, no watermarks involved
timestampWatermarkMode = NO_TIMESTAMPS_WATERMARKS;
} else {
timestampWatermarkMode = PUNCTUATED_WATERMARKS;
}
} else {
if (watermarksPunctuated == null) {
timestampWatermarkMode = PERIODIC_WATERMARKS;
} else {
throw new IllegalArgumentException("Cannot have both periodic and punctuated watermarks");
}
} // create our partition state according to the timestamp/watermark mode
this.allPartitions = initializePartitions(
assignedPartitions,
timestampWatermarkMode,
watermarksPeriodic, watermarksPunctuated,
userCodeClassLoader); // if we have periodic watermarks, kick off the interval scheduler
if (timestampWatermarkMode == PERIODIC_WATERMARKS) { //如果是定期发出WaterMark
KafkaTopicPartitionStateWithPeriodicWatermarks<?, ?>[] parts =
(KafkaTopicPartitionStateWithPeriodicWatermarks<?, ?>[]) allPartitions; PeriodicWatermarkEmitter periodicEmitter=
new PeriodicWatermarkEmitter(parts, sourceContext, processingTimeProvider, autoWatermarkInterval);
periodicEmitter.start();
}
}
FlinkKafkaConsumerBase
public FlinkKafkaConsumerBase<T> assignTimestampsAndWatermarks(AssignerWithPeriodicWatermarks<T> assigner) {
checkNotNull(assigner);
if (this.punctuatedWatermarkAssigner != null) {
throw new IllegalStateException("A punctuated watermark emitter has already been set.");
}
try {
ClosureCleaner.clean(assigner, true);
this.periodicWatermarkAssigner = new SerializedValue<>(assigner);
return this;
} catch (Exception e) {
throw new IllegalArgumentException("The given assigner is not serializable", e);
}
}
这个接口的核心函数,定义,如何提取Timestamp和生成Watermark的逻辑
public interface AssignerWithPeriodicWatermarks<T> extends TimestampAssigner<T> {
Watermark getCurrentWatermark();
}
public interface TimestampAssigner<T> extends Function {
long extractTimestamp(T element, long previousElementTimestamp);
}
如果在初始化KafkaConsumer的时候,没有assignTimestampsAndWatermarks,就不会产生watermark
可以看到watermark有两种,
PERIODIC_WATERMARKS,定期发送的watermark
PUNCTUATED_WATERMARKS,由element触发的watermark,比如有element的特征或某种类型的element来表示触发watermark,这样便于开发者来控制watermark
initializePartitions
case PERIODIC_WATERMARKS: {
@SuppressWarnings("unchecked")
KafkaTopicPartitionStateWithPeriodicWatermarks<T, KPH>[] partitions =
(KafkaTopicPartitionStateWithPeriodicWatermarks<T, KPH>[])
new KafkaTopicPartitionStateWithPeriodicWatermarks<?, ?>[assignedPartitions.size()];
int pos = 0;
for (KafkaTopicPartition partition : assignedPartitions) {
KPH kafkaHandle = createKafkaPartitionHandle(partition);
AssignerWithPeriodicWatermarks<T> assignerInstance =
watermarksPeriodic.deserializeValue(userCodeClassLoader);
partitions[pos++] = new KafkaTopicPartitionStateWithPeriodicWatermarks<>(
partition, kafkaHandle, assignerInstance);
}
return partitions;
}
KafkaTopicPartitionStateWithPeriodicWatermarks
这个类里面最核心的函数,
public long getTimestampForRecord(T record, long kafkaEventTimestamp) {
return timestampsAndWatermarks.extractTimestamp(record, kafkaEventTimestamp);
}
public long getCurrentWatermarkTimestamp() {
Watermark wm = timestampsAndWatermarks.getCurrentWatermark();
if (wm != null) {
partitionWatermark = Math.max(partitionWatermark, wm.getTimestamp());
}
return partitionWatermark;
}
可以看到是调用你定义的AssignerWithPeriodicWatermarks来实现
PeriodicWatermarkEmitter
private static class PeriodicWatermarkEmitter implements ProcessingTimeCallback {
public void start() {
timerService.registerTimer(timerService.getCurrentProcessingTime() + interval, this); //start定时器,定时触发
}
@Override
public void onProcessingTime(long timestamp) throws Exception { //触发逻辑
long minAcrossAll = Long.MAX_VALUE;
for (KafkaTopicPartitionStateWithPeriodicWatermarks<?, ?> state : allPartitions) { //对于每个partitions
// we access the current watermark for the periodic assigners under the state
// lock, to prevent concurrent modification to any internal variables
final long curr;
//noinspection SynchronizationOnLocalVariableOrMethodParameter
synchronized (state) {
curr = state.getCurrentWatermarkTimestamp(); //取出当前partition的WaterMark
}
minAcrossAll = Math.min(minAcrossAll, curr); //求min,以partition中最小的partition作为watermark
}
// emit next watermark, if there is one
if (minAcrossAll > lastWatermarkTimestamp) {
lastWatermarkTimestamp = minAcrossAll;
emitter.emitWatermark(new Watermark(minAcrossAll)); //emit
}
// schedule the next watermark
timerService.registerTimer(timerService.getCurrentProcessingTime() + interval, this); //重新设置timer
}
}
2. DataStream也可以设置定时发送Watermark
其实实现是加了个chain的TimestampsAndPeriodicWatermarksOperator
DataStream
/**
* Assigns timestamps to the elements in the data stream and periodically creates
* watermarks to signal event time progress.
*
* <p>This method creates watermarks periodically (for example every second), based
* on the watermarks indicated by the given watermark generator. Even when no new elements
* in the stream arrive, the given watermark generator will be periodically checked for
* new watermarks. The interval in which watermarks are generated is defined in
* {@link ExecutionConfig#setAutoWatermarkInterval(long)}.
*
* <p>Use this method for the common cases, where some characteristic over all elements
* should generate the watermarks, or where watermarks are simply trailing behind the
* wall clock time by a certain amount.
*
* <p>For the second case and when the watermarks are required to lag behind the maximum
* timestamp seen so far in the elements of the stream by a fixed amount of time, and this
* amount is known in advance, use the
* {@link BoundedOutOfOrdernessTimestampExtractor}.
*
* <p>For cases where watermarks should be created in an irregular fashion, for example
* based on certain markers that some element carry, use the
* {@link AssignerWithPunctuatedWatermarks}.
*
* @param timestampAndWatermarkAssigner The implementation of the timestamp assigner and
* watermark generator.
* @return The stream after the transformation, with assigned timestamps and watermarks.
*
* @see AssignerWithPeriodicWatermarks
* @see AssignerWithPunctuatedWatermarks
* @see #assignTimestampsAndWatermarks(AssignerWithPunctuatedWatermarks)
*/
public SingleOutputStreamOperator<T> assignTimestampsAndWatermarks(
AssignerWithPeriodicWatermarks<T> timestampAndWatermarkAssigner) { // match parallelism to input, otherwise dop=1 sources could lead to some strange
// behaviour: the watermark will creep along very slowly because the elements
// from the source go to each extraction operator round robin.
final int inputParallelism = getTransformation().getParallelism();
final AssignerWithPeriodicWatermarks<T> cleanedAssigner = clean(timestampAndWatermarkAssigner); TimestampsAndPeriodicWatermarksOperator<T> operator =
new TimestampsAndPeriodicWatermarksOperator<>(cleanedAssigner); return transform("Timestamps/Watermarks", getTransformation().getOutputType(), operator)
.setParallelism(inputParallelism);
}
TimestampsAndPeriodicWatermarksOperator
public class TimestampsAndPeriodicWatermarksOperator<T>
extends AbstractUdfStreamOperator<T, AssignerWithPeriodicWatermarks<T>>
implements OneInputStreamOperator<T, T>, Triggerable { private transient long watermarkInterval;
private transient long currentWatermark; public TimestampsAndPeriodicWatermarksOperator(AssignerWithPeriodicWatermarks<T> assigner) {
super(assigner); //AbstractUdfStreamOperator(F userFunction)
this.chainingStrategy = ChainingStrategy.ALWAYS; //一定是chain
} @Override
public void open() throws Exception {
super.open(); currentWatermark = Long.MIN_VALUE;
watermarkInterval = getExecutionConfig().getAutoWatermarkInterval(); if (watermarkInterval > 0) {
registerTimer(System.currentTimeMillis() + watermarkInterval, this); //注册到定时器
}
} @Override
public void processElement(StreamRecord<T> element) throws Exception {
final long newTimestamp = userFunction.extractTimestamp(element.getValue(), //由element中基于AssignerWithPeriodicWatermarks提取时间戳
element.hasTimestamp() ? element.getTimestamp() : Long.MIN_VALUE); output.collect(element.replace(element.getValue(), newTimestamp)); //更新element的时间戳,再次发出
} @Override
public void trigger(long timestamp) throws Exception { //定时器触发trigger
// register next timer
Watermark newWatermark = userFunction.getCurrentWatermark(); //取得watermark
if (newWatermark != null && newWatermark.getTimestamp() > currentWatermark) {
currentWatermark = newWatermark.getTimestamp();
// emit watermark
output.emitWatermark(newWatermark); //发出watermark
} registerTimer(System.currentTimeMillis() + watermarkInterval, this); //重新注册到定时器
} @Override
public void processWatermark(Watermark mark) throws Exception {
// if we receive a Long.MAX_VALUE watermark we forward it since it is used
// to signal the end of input and to not block watermark progress downstream
if (mark.getTimestamp() == Long.MAX_VALUE && currentWatermark != Long.MAX_VALUE) {
currentWatermark = Long.MAX_VALUE;
output.emitWatermark(mark); //forward watermark
}
}
可以看到在processElement会调用AssignerWithPeriodicWatermarks.extractTimestamp提取event time
然后更新StreamRecord的时间
然后在Window Operator中,
@Override
public void processElement(StreamRecord<IN> element) throws Exception {
final Collection<W> elementWindows = windowAssigner.assignWindows(
element.getValue(), element.getTimestamp(), windowAssignerContext);
会在windowAssigner.assignWindows时以element的timestamp作为assign时间
对于watermark的处理,参考,Flink – window operator
Flink - watermark生成的更多相关文章
- [源码分析] 从源码入手看 Flink Watermark 之传播过程
[源码分析] 从源码入手看 Flink Watermark 之传播过程 0x00 摘要 本文将通过源码分析,带领大家熟悉Flink Watermark 之传播过程,顺便也可以对Flink整体逻辑有一个 ...
- Flink Program Guide (4) -- 时间戳和Watermark生成(DataStream API编程指导 -- For Java)
时间戳和Watermark生成 本文翻译自Generating Timestamp / Watermarks --------------------------------------------- ...
- flink watermark介绍
转发请注明原创地址 http://www.cnblogs.com/dongxiao-yang/p/7610412.html 一 概念 watermark是flink为了处理eventTime窗口计算提 ...
- Flink assignAscendingTimestamps 生成水印的三个重载方法
先简单介绍一下Timestamp 和Watermark 的概念: 1. Timestamp和Watermark都是基于事件的时间字段生成的 2. Timestamp和Watermark是两个不同的东西 ...
- flink WaterMark之TumblingEventWindow
1.WaterMark,翻译成水印或水位线,水印翻译更抽象,水位线翻译接地气. watermark是用于处理乱序事件的,通常用watermark机制结合window来实现. 流处理从事件产生,到流经s ...
- 【源码解析】Flink 是如何基于事件时间生成Timestamp和Watermark
生成Timestamp和Watermark 的三个重载方法介绍可参见上一篇博客: Flink assignAscendingTimestamps 生成水印的三个重载方法 之前想研究下Flink是怎么处 ...
- Flink Program Guide (5) -- 预定义的Timestamp Extractor / Watermark Emitter (DataStream API编程指导 -- For Java)
本文翻译自Pre-defined Timestamp Extractors / Watermark Emitter ------------------------------------------ ...
- Flink的时间类型和watermark机制
一FlinkTime类型 有3类时间,分别是数据本身的产生时间.进入Flink系统的时间和被处理的时间,在Flink系统中的数据可以有三种时间属性: Event Time 是每条数据在其生产设备上发生 ...
- 老板让阿粉学习 flink 中的 Watermark,现在他出教程了
1 前言 在时间 Time 那一篇中,介绍了三种时间概念 Event.Ingestin 和 Process, 其中还简单介绍了乱序 Event Time 事件和它的解决方案 Watermark 水位线 ...
随机推荐
- 物联网架构成长之路(9)-双机热备Keepalived了解
1. 前言 负载均衡LB,高可用HA,这一小结主要讲双机热备方案保证高可用.这里选择Keepalived作为双机热备方案,下面就对具体的配置进行了解.2. 下载Keepalived wget http ...
- python文件的基础操作
import os print(,'-')) print(os.getcwd()) print(,'-')) print(os.listdir()) print(,'-')) print(os.lis ...
- 【Big Data - ELK】ELK(ElasticSearch, Logstash, Kibana)搭建实时日志分析平台
摘要: 前段时间研究的Log4j+Kafka中,有人建议把Kafka收集到的日志存放于ES(ElasticSearch,一款基于Apache Lucene的开源分布式搜索引擎)中便于查找和分析,在研究 ...
- GC调优在Spark应用中的实践[转]
作者:仲浩 出处:<程序员>电子刊5月B 摘要:Spark立足内存计算,常常需要在内存中存放大量数据,因此也更依赖JVM的垃圾回收机制.与此同时,它也兼容批处理和流式处理,对于程序 ...
- MySQL的binlog日志<转>
binlog 基本认识 MySQL的二进制日志可以说是MySQL最重要的日志了,它记录了所有的DDL和DML(除了数据查询语句)语句,以事件形式记录,还包含语句所执行的消耗的时间,MySQL的二进制日 ...
- opencv_java import org.opencv.highgui.Highgui,类中无imread方法
opencv_java import org.opencv.highgui.Highgui,提示错误 2018年01月19日 14:50:25 小码农的路程 阅读数:358 原因:1.OpenCV ...
- CSS让页面平滑滚动
我们以往实现平滑滚动往往用的是jQuery, 如实现平滑回到顶部,就写如下代码: $('.js_go_to_top').click(function () { $(".js_scroll_a ...
- @Transactional(rollbackFor = Exception.class)
@Transactional(rollbackFor = Exception.class)这个注解只有在出异常时才会回滚,需要回滚时没有异常也要人为制造异常(自定义异常)所以,如果使用了异常捕获,很有 ...
- 简单的SSM框架搭建教程
简单的ssm框架的搭建和配置文件 ssm框架里边的配置: 1.src路径下直接存放数据库和log4j的properties文件 2.src路径下建个config包,分别放置ssm的xml文件 3.修改 ...
- localhost兼容js不能用