Flink Pre-defined Timestamp Extractors / Watermark Emitters
https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/event_timestamp_extractors.html
According to the official documentation, Flink provides pre-defined timestamp extractors / watermark emitters, described as follows:
Flink provides abstractions that allow the programmer to assign their own timestamps and emit their own watermarks.
More specifically, one can do so by implementing one of the AssignerWithPeriodicWatermarks and AssignerWithPunctuatedWatermarks interfaces, depending on the use case.
In a nutshell, the first will emit watermarks periodically, while the second does so based on some property of the incoming records, e.g. whenever a special element is encountered in the stream.
AssignerWithPeriodicWatermarks overview:
Source path: flink\flink-streaming-java\src\main\java\org\apache\flink\streaming\api\functions\AssignerWithPeriodicWatermarks.java
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.flink.streaming.api.functions;

import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.streaming.api.watermark.Watermark;

import javax.annotation.Nullable;

/**
* The {@code AssignerWithPeriodicWatermarks} assigns event time timestamps to elements,
* and generates low watermarks that signal event time progress within the stream.
* These timestamps and watermarks are used by functions and operators that operate
* on event time, for example event time windows.
*
* <p>Use this class to generate watermarks in a periodical interval.
* At most every {@code i} milliseconds (configured via
* {@link ExecutionConfig#getAutoWatermarkInterval()}), the system will call the
* {@link #getCurrentWatermark()} method to probe for the next watermark value.
* The system will generate a new watermark, if the probed value is non-null
* and has a timestamp larger than that of the previous watermark (to preserve
* the contract of ascending watermarks).
*
* <p>The system may call the {@link #getCurrentWatermark()} method less often than every
* {@code i} milliseconds, if no new elements arrived since the last call to the
* method.
*
* <p>Timestamps and watermarks are defined as {@code longs} that represent the
* milliseconds since the Epoch (midnight, January 1, 1970 UTC).
* A watermark with a certain value {@code t} indicates that no elements with event
* timestamps {@code x}, where {@code x} is lower or equal to {@code t}, will occur any more.
*
* @param <T> The type of the elements to which this assigner assigns timestamps.
*
* @see org.apache.flink.streaming.api.watermark.Watermark
*/
public interface AssignerWithPeriodicWatermarks<T> extends TimestampAssigner<T> {

/**
* Returns the current watermark. This method is periodically called by the
* system to retrieve the current watermark. The method may return {@code null} to
* indicate that no new Watermark is available.
*
* <p>The returned watermark will be emitted only if it is non-null and its timestamp
* is larger than that of the previously emitted watermark (to preserve the contract of
* ascending watermarks). If the current watermark is still
* identical to the previous one, no progress in event time has happened since
* the previous call to this method. If a null value is returned, or the timestamp
* of the returned watermark is smaller than that of the last emitted one, then no
* new watermark will be generated.
*
* <p>The interval in which this method is called and Watermarks are generated
* depends on {@link ExecutionConfig#getAutoWatermarkInterval()}.
*
* @see org.apache.flink.streaming.api.watermark.Watermark
* @see ExecutionConfig#getAutoWatermarkInterval()
*
* @return {@code Null}, if no watermark should be emitted, or the next watermark to emit.
*/
@Nullable
Watermark getCurrentWatermark();
}
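The periodic contract above can be made concrete with a minimal plain-Java sketch that has no Flink dependency. The class and method names are hypothetical. The generator follows the idea behind Flink's pre-defined BoundedOutOfOrdernessTimestampExtractor: the watermark lags the highest timestamp seen so far by a fixed bound. The emitter models the runtime rule that a probed watermark is emitted only if it is non-null and larger than the previous one.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of the periodic-watermark contract (hypothetical names, no Flink classes). */
public class PeriodicWatermarkSketch {

    /** Tracks the highest timestamp seen and keeps the watermark a fixed bound behind it. */
    static class BoundedOutOfOrdernessGenerator {
        private final long maxOutOfOrdernessMs;
        private long currentMaxTimestamp;

        BoundedOutOfOrdernessGenerator(long maxOutOfOrdernessMs) {
            this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
            // Start low so that (max - bound) cannot underflow before the first element.
            this.currentMaxTimestamp = Long.MIN_VALUE + maxOutOfOrdernessMs;
        }

        /** Counterpart of extractTimestamp(): called once per element. */
        long extractTimestamp(long elementTimestamp) {
            currentMaxTimestamp = Math.max(currentMaxTimestamp, elementTimestamp);
            return elementTimestamp;
        }

        /** Counterpart of getCurrentWatermark(): called on a timer, not per element. */
        Long getCurrentWatermark() {
            return currentMaxTimestamp - maxOutOfOrdernessMs;
        }
    }

    /** Models the runtime: emit only non-null watermarks strictly larger than the last one. */
    static class PeriodicEmitter {
        private long lastEmitted = Long.MIN_VALUE;
        final List<Long> emitted = new ArrayList<>();

        void probe(BoundedOutOfOrdernessGenerator generator) {
            Long candidate = generator.getCurrentWatermark();
            if (candidate != null && candidate > lastEmitted) {
                lastEmitted = candidate;
                emitted.add(candidate);
            }
        }
    }

    /** Feeds each batch of timestamps, then probes once, as the auto-watermark timer would. */
    static List<Long> run(long boundMs, long[][] batches) {
        BoundedOutOfOrdernessGenerator generator = new BoundedOutOfOrdernessGenerator(boundMs);
        PeriodicEmitter emitter = new PeriodicEmitter();
        for (long[] batch : batches) {
            for (long ts : batch) {
                generator.extractTimestamp(ts);
            }
            emitter.probe(generator);
        }
        return emitter.emitted;
    }

    public static void main(String[] args) {
        // Three timer intervals; the second one only sees a late element (1_000).
        long[][] batches = { {1_000, 5_000}, {1_000}, {9_000} };
        System.out.println(run(2_000, batches)); // [3000, 7000]
    }
}
```

The late element in the second interval does not move the watermark backwards. The probe returns the same value as before, so nothing is emitted for that interval.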
AssignerWithPunctuatedWatermarks interface overview:
Source path: flink\flink-streaming-java\src\main\java\org\apache\flink\streaming\api\functions\AssignerWithPunctuatedWatermarks.java
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.flink.streaming.api.functions;

import org.apache.flink.streaming.api.watermark.Watermark;

import javax.annotation.Nullable;

/**
* The {@code AssignerWithPunctuatedWatermarks} assigns event time timestamps to elements,
* and generates low watermarks that signal event time progress within the stream.
* These timestamps and watermarks are used by functions and operators that operate
* on event time, for example event time windows.
*
* <p>Use this class if certain special elements act as markers that signify event time
* progress, and when you want to emit watermarks specifically at certain events.
* The system will generate a new watermark, if the probed value is non-null
* and has a timestamp larger than that of the previous watermark (to preserve
* the contract of ascending watermarks).
*
* <p>For use cases that should periodically emit watermarks based on element timestamps,
* use the {@link AssignerWithPeriodicWatermarks} instead.
*
* <p>The following example illustrates how to use this timestamp extractor and watermark
* generator. It assumes elements carry a timestamp that describes when they were created,
* and that some elements carry a flag, marking them as the end of a sequence such that no
* elements with smaller timestamps can come anymore.
*
* <pre>{@code
* public class WatermarkOnFlagAssigner implements AssignerWithPunctuatedWatermarks<MyElement> {
*
* public long extractTimestamp(MyElement element, long previousElementTimestamp) {
* return element.getSequenceTimestamp();
* }
*
* public Watermark checkAndGetNextWatermark(MyElement lastElement, long extractedTimestamp) {
* return lastElement.isEndOfSequence() ? new Watermark(extractedTimestamp) : null;
* }
* }
* }</pre>
*
* <p>Timestamps and watermarks are defined as {@code longs} that represent the
* milliseconds since the Epoch (midnight, January 1, 1970 UTC).
* A watermark with a certain value {@code t} indicates that no elements with event
* timestamps {@code x}, where {@code x} is lower or equal to {@code t}, will occur any more.
*
* @param <T> The type of the elements to which this assigner assigns timestamps.
*
* @see org.apache.flink.streaming.api.watermark.Watermark
*/
public interface AssignerWithPunctuatedWatermarks<T> extends TimestampAssigner<T> {

/**
* Asks this implementation if it wants to emit a watermark. This method is called right after
* the {@link #extractTimestamp(Object, long)} method.
*
* <p>The returned watermark will be emitted only if it is non-null and its timestamp
* is larger than that of the previously emitted watermark (to preserve the contract of
* ascending watermarks). If a null value is returned, or the timestamp of the returned
* watermark is smaller than that of the last emitted one, then no new watermark will
* be generated.
*
* <p>For an example how to use this method, see the documentation of
* {@link AssignerWithPunctuatedWatermarks this class}.
*
* @return {@code Null}, if no watermark should be emitted, or the next watermark to emit.
*/
@Nullable
Watermark checkAndGetNextWatermark(T lastElement, long extractedTimestamp);
}
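The per-element flow of the punctuated variant can also be sketched in plain Java. This sketch has no Flink dependency and uses hypothetical names. MyElement stands in for the element type of the Javadoc example above. The runner calls extractTimestamp and then checkAndGetNextWatermark for every element, and applies the same non-null, strictly ascending rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of the punctuated-watermark contract (hypothetical names, no Flink classes). */
public class PunctuatedWatermarkSketch {

    /** Hypothetical element: a creation timestamp plus an end-of-sequence flag. */
    static final class MyElement {
        final long sequenceTimestamp;
        final boolean endOfSequence;

        MyElement(long sequenceTimestamp, boolean endOfSequence) {
            this.sequenceTimestamp = sequenceTimestamp;
            this.endOfSequence = endOfSequence;
        }
    }

    /** Counterpart of extractTimestamp(): the element carries its own event time. */
    static long extractTimestamp(MyElement element) {
        return element.sequenceTimestamp;
    }

    /** Counterpart of checkAndGetNextWatermark(): only flagged elements propose a watermark. */
    static Long checkAndGetNextWatermark(MyElement lastElement, long extractedTimestamp) {
        return lastElement.endOfSequence ? Long.valueOf(extractedTimestamp) : null;
    }

    /** Models the runtime: both methods run per element; emit only non-null, strictly larger values. */
    static List<Long> run(List<MyElement> stream) {
        List<Long> emitted = new ArrayList<>();
        long lastEmitted = Long.MIN_VALUE;
        for (MyElement element : stream) {
            long ts = extractTimestamp(element);
            Long candidate = checkAndGetNextWatermark(element, ts);
            if (candidate != null && candidate > lastEmitted) {
                lastEmitted = candidate;
                emitted.add(candidate);
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<MyElement> stream = Arrays.asList(
                new MyElement(100, false),  // no marker: no watermark
                new MyElement(200, true),   // marker: emits 200
                new MyElement(150, true),   // marker, but 150 < 200: suppressed
                new MyElement(300, true));  // marker: emits 300
        System.out.println(run(stream)); // [200, 300]
    }
}
```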
Demos of the two interfaces:
For an AssignerWithPeriodicWatermarks demo, see: https://www.cnblogs.com/felixzh/p/9687214.html
An AssignerWithPunctuatedWatermarks demo follows:
package org.apache.flink.streaming.examples.wordcount;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.util.Collector;

import javax.annotation.Nullable;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Properties;

/**
 * Usage: wcNew <kafka topic> <output path>
 * Input records: word,count,yyyyMMdd-HHmmss
 */
public class wcNew {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Deprecated no-arg overload; enableCheckpointing(long interval) sets the interval explicitly
        env.enableCheckpointing();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "127.0.0.1:9092");
        props.setProperty("group.id", "flink-group-debug");
        // Ignored by the Flink Kafka consumer, which always uses byte-array deserializers internally
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // args[0]: the Kafka topic to read
        FlinkKafkaConsumer010<String> consumer =
                new FlinkKafkaConsumer010<>(args[0], new SimpleStringSchema(), props);
        consumer.setStartFromEarliest();
        // Assign timestamps and watermarks inside the Kafka source
        consumer.assignTimestampsAndWatermarks(new MessageWaterEmitter());

        DataStream<Tuple3<String, Integer, String>> keyedStream = env
                .addSource(consumer)
                .flatMap(new MessageSplitter())
                .keyBy(0)                         // key by the word (field 0)
                .timeWindow(Time.seconds(10))     // 10-second tumbling window (assumed size, consistent with the sample output)
                .reduce(new ReduceFunction<Tuple3<String, Integer, String>>() {
                    @Override
                    public Tuple3<String, Integer, String> reduce(Tuple3<String, Integer, String> t0,
                                                                 Tuple3<String, Integer, String> t1) throws Exception {
                        // Sum the counts (field 1) and concatenate the event times (field 2)
                        return new Tuple3<>(t0.f0, t0.f1 + t1.f1, t0.f2 + "|" + t1.f2);
                    }
                });

        // args[1]: the output file path
        keyedStream.writeAsText(args[1], FileSystem.WriteMode.OVERWRITE);
        keyedStream.print();
        env.execute("Flink-Kafka num count");
    }

    /** Punctuated assigner: every well-formed record yields a watermark equal to its own event time. */
    private static class MessageWaterEmitter implements AssignerWithPunctuatedWatermarks<String> {
        // HH is the 0-23 hour of day; hh (1-12) would read 12:xx as 00:xx
        private final SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd-HHmmss");

        /*
         * Called first: extracts the event timestamp from the element.
         * @param element the record (one input line)
         * @param previousElementTimestamp the timestamp previously attached to the element
         *        (for the Kafka 0.10 consumer, the Kafka record timestamp)
         */
        @Override
        public long extractTimestamp(String element, long previousElementTimestamp) {
            if (element != null && element.contains(",")) {
                String[] parts = element.split(",");
                if (parts.length == 3) {
                    try {
                        return sdf.parse(parts[2]).getTime();
                    } catch (ParseException e) {
                        e.printStackTrace();
                    }
                }
            }
            return 0L;
        }

        /*
         * Called second: extractedTimestamp is the value returned by extractTimestamp.
         */
        @Nullable
        @Override
        public Watermark checkAndGetNextWatermark(String lastElement, long extractedTimestamp) {
            if (lastElement != null && lastElement.contains(",")) {
                String[] parts = lastElement.split(",");
                if (parts.length == 3) {
                    try {
                        return new Watermark(sdf.parse(parts[2]).getTime());
                    } catch (ParseException e) {
                        e.printStackTrace();
                    }
                }
            }
            return null;
        }
    }

    /** Splits "word,count,time" into a Tuple3; malformed lines are dropped. */
    private static class MessageSplitter implements FlatMapFunction<String, Tuple3<String, Integer, String>> {
        @Override
        public void flatMap(String s, Collector<Tuple3<String, Integer, String>> collector) throws Exception {
            if (s != null && s.contains(",")) {
                String[] strings = s.split(",");
                if (strings.length == 3) {
                    collector.collect(new Tuple3<>(strings[0], Integer.parseInt(strings[1]), strings[2]));
                }
            }
        }
    }
}
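The date pattern used for timestamp extraction deserves care. In SimpleDateFormat, hh is the 1-12 hour-in-AM/PM field and HH is the 0-23 hour-of-day field. Without an AM/PM marker in the input, hh reads 12:xx as 00:xx, so records stamped in the noon hour would receive timestamps 12 hours too early. The following sketch shows the difference. The class name is hypothetical, and UTC is set only so the comparison does not depend on the local time zone.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

/** Compares the hh and HH hour patterns on the demo's timestamp format (hypothetical helper). */
public class HourPatternSketch {

    /** Parses text with the given pattern in UTC and returns epoch milliseconds. */
    static long parseUtc(String pattern, String text) {
        SimpleDateFormat sdf = new SimpleDateFormat(pattern);
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return sdf.parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable timestamp: " + text, e);
        }
    }

    public static void main(String[] args) {
        String morning = "20180504-113411";
        String noon = "20180504-123000";
        // Before noon both patterns agree.
        System.out.println(parseUtc("yyyyMMdd-hhmmss", morning) == parseUtc("yyyyMMdd-HHmmss", morning)); // true
        // With hh, hour 12 means 12 AM, so 12:30 is read as 00:30.
        long diffHours = (parseUtc("yyyyMMdd-HHmmss", noon) - parseUtc("yyyyMMdd-hhmmss", noon)) / 3_600_000L;
        System.out.println(diffHours); // 12
    }
}
```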
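After packaging the program into a jar (flink-kafka.jar) and uploading it to the server where Flink runs, submit it from the console: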
flink run -c org.apache.flink.streaming.examples.wordcount.wcNew flink-kafka.jar topic_test_numcount /tmp/numcount.txt
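Then enter the following records on the console. Each record has the form word,count,timestamp, with a timestamp such as 20180504-113411, and must be written to the topic_test_numcount topic: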
eee,,-
eee,,-
eee,,-
eee,,-
eee,,-
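Monitor the output with tail -f numcount.txt. The program writes the result computed from the first four records only when the last record is entered:
(eee,7,20180504-113411|20180504-113415|20180504-113412|20180504-113419)
Each record emits a watermark equal to its own timestamp. The window that holds the first four records therefore fires only when a record arrives whose timestamp lies past the end of that window.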
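You can simulate in plain Java why the result appears only with the last record. This is a simplified, hypothetical model, not Flink code. It covers a single key and a single Kafka partition; with several partitions or parallel instances, Flink advances event time by the minimum watermark across inputs. It also assumes a 10-second tumbling window and uses made-up counts and a made-up fifth timestamp. The first four timestamps reuse the seconds 11, 15, 12 and 19 from the sample output. As in the demo, each record emits a watermark equal to its own timestamp. A window [start, start + size) fires once the watermark reaches start + size - 1, and elements for an already-fired window are dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Single-key model of a tumbling event-time window driven by per-record watermarks (hypothetical). */
public class EventTimeWindowSketch {

    /** Hypothetical record: a count and an event timestamp in milliseconds. */
    static final class Event {
        final int count;
        final long timestamp;

        Event(int count, long timestamp) {
            this.count = count;
            this.timestamp = timestamp;
        }
    }

    /** Returns "windowStart:sum" for every window that fired, in firing order. */
    static List<String> run(List<Event> events, long windowSizeMs) {
        TreeMap<Long, Integer> open = new TreeMap<>(); // window start -> running sum
        List<String> fired = new ArrayList<>();
        long watermark = Long.MIN_VALUE;
        for (Event e : events) {
            long start = e.timestamp - (e.timestamp % windowSizeMs);
            if (start + windowSizeMs - 1 <= watermark) {
                continue; // window already fired: late element is dropped (no allowed lateness)
            }
            open.merge(start, e.count, Integer::sum); // reduce by summing counts
            if (e.timestamp > watermark) {
                watermark = e.timestamp; // punctuated: each record proposes its own timestamp
            }
            // Fire every window whose last millisecond (end - 1) is covered by the watermark.
            while (!open.isEmpty() && open.firstKey() + windowSizeMs - 1 <= watermark) {
                Map.Entry<Long, Integer> w = open.pollFirstEntry();
                fired.add(w.getKey() + ":" + w.getValue());
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event(1, 11_000), new Event(2, 15_000), new Event(3, 12_000),
                new Event(1, 19_000),
                new Event(1, 21_000)); // fifth record: its watermark passes 19_999
        System.out.println(run(events, 10_000)); // [10000:7]
    }
}
```

Without the fifth record the list of fired windows stays empty. This matches the observation that nothing appears in numcount.txt until the last input.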