Flink Pre-defined Timestamp Extractors / Watermark Emitters(预定义的时间戳提取/水位线发射器)
https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/event_timestamp_extractors.html
根据官网描述,Flink提供预定义的时间戳提取/水位线发射器。如下:
Flink provides abstractions that allow the programmer to assign their own timestamps and emit their own watermarks.
More specifically, one can do so by implementing one of the AssignerWithPeriodicWatermarks
and AssignerWithPunctuatedWatermarks
interfaces, depending on the use case.
In a nutshell, the first will emit watermarks periodically, while the second does so based on some property of the incoming records, e.g. whenever a special element is encountered in the stream.
AssignerWithPeriodicWatermarks介绍:
源码路径:flink\flink-streaming-java\src\main\java\org\apache\flink\streaming\api\functions\AssignerWithPeriodicWatermarks.java
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/ package org.apache.flink.streaming.api.functions; import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.streaming.api.watermark.Watermark; import javax.annotation.Nullable; /**
* The {@code AssignerWithPeriodicWatermarks} assigns event time timestamps to elements,
* and generates low watermarks that signal event time progress within the stream.
* These timestamps and watermarks are used by functions and operators that operate
* on event time, for example event time windows.
*
* <p>Use this class to generate watermarks in a periodical interval.
* At most every {@code i} milliseconds (configured via
* {@link ExecutionConfig#getAutoWatermarkInterval()}), the system will call the
* {@link #getCurrentWatermark()} method to probe for the next watermark value.
* The system will generate a new watermark, if the probed value is non-null
* and has a timestamp larger than that of the previous watermark (to preserve
* the contract of ascending watermarks).
*
* <p>The system may call the {@link #getCurrentWatermark()} method less often than every
* {@code i} milliseconds, if no new elements arrived since the last call to the
* method.
*
* <p>Timestamps and watermarks are defined as {@code longs} that represent the
* milliseconds since the Epoch (midnight, January 1, 1970 UTC).
* A watermark with a certain value {@code t} indicates that no elements with event
* timestamps {@code x}, where {@code x} is lower or equal to {@code t}, will occur any more.
*
* @param <T> The type of the elements to which this assigner assigns timestamps.
*
* @see org.apache.flink.streaming.api.watermark.Watermark
*/
public interface AssignerWithPeriodicWatermarks<T> extends TimestampAssigner<T> { /**
* Returns the current watermark. This method is periodically called by the
* system to retrieve the current watermark. The method may return {@code null} to
* indicate that no new Watermark is available.
*
* <p>The returned watermark will be emitted only if it is non-null and its timestamp
* is larger than that of the previously emitted watermark (to preserve the contract of
* ascending watermarks). If the current watermark is still
* identical to the previous one, no progress in event time has happened since
* the previous call to this method. If a null value is returned, or the timestamp
* of the returned watermark is smaller than that of the last emitted one, then no
* new watermark will be generated.
*
* <p>The interval in which this method is called and Watermarks are generated
* depends on {@link ExecutionConfig#getAutoWatermarkInterval()}.
*
* @see org.apache.flink.streaming.api.watermark.Watermark
* @see ExecutionConfig#getAutoWatermarkInterval()
*
* @return {@code Null}, if no watermark should be emitted, or the next watermark to emit.
*/
@Nullable
Watermark getCurrentWatermark();
}
AssignerWithPunctuatedWatermarks
接口介绍
源码路径 flink\flink-streaming-java\src\main\java\org\apache\flink\streaming\api\functions\AssignerWithPunctuatedWatermarks.java
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/ package org.apache.flink.streaming.api.functions; import org.apache.flink.streaming.api.watermark.Watermark; import javax.annotation.Nullable; /**
* The {@code AssignerWithPunctuatedWatermarks} assigns event time timestamps to elements,
* and generates low watermarks that signal event time progress within the stream.
* These timestamps and watermarks are used by functions and operators that operate
* on event time, for example event time windows.
*
* <p>Use this class if certain special elements act as markers that signify event time
* progress, and when you want to emit watermarks specifically at certain events.
* The system will generate a new watermark, if the probed value is non-null
* and has a timestamp larger than that of the previous watermark (to preserve
* the contract of ascending watermarks).
*
* <p>For use cases that should periodically emit watermarks based on element timestamps,
* use the {@link AssignerWithPeriodicWatermarks} instead.
*
* <p>The following example illustrates how to use this timestamp extractor and watermark
* generator. It assumes elements carry a timestamp that describes when they were created,
* and that some elements carry a flag, marking them as the end of a sequence such that no
* elements with smaller timestamps can come anymore.
*
* <pre>{@code
* public class WatermarkOnFlagAssigner implements AssignerWithPunctuatedWatermarks<MyElement> {
*
* public long extractTimestamp(MyElement element, long previousElementTimestamp) {
* return element.getSequenceTimestamp();
* }
*
* public Watermark checkAndGetNextWatermark(MyElement lastElement, long extractedTimestamp) {
* return lastElement.isEndOfSequence() ? new Watermark(extractedTimestamp) : null;
* }
* }
* }</pre>
*
* <p>Timestamps and watermarks are defined as {@code longs} that represent the
* milliseconds since the Epoch (midnight, January 1, 1970 UTC).
* A watermark with a certain value {@code t} indicates that no elements with event
* timestamps {@code x}, where {@code x} is lower or equal to {@code t}, will occur any more.
*
* @param <T> The type of the elements to which this assigner assigns timestamps.
*
* @see org.apache.flink.streaming.api.watermark.Watermark
*/
public interface AssignerWithPunctuatedWatermarks<T> extends TimestampAssigner<T> { /**
* Asks this implementation if it wants to emit a watermark. This method is called right after
* the {@link #extractTimestamp(Object, long)} method.
*
* <p>The returned watermark will be emitted only if it is non-null and its timestamp
* is larger than that of the previously emitted watermark (to preserve the contract of
* ascending watermarks). If a null value is returned, or the timestamp of the returned
* watermark is smaller than that of the last emitted one, then no new watermark will
* be generated.
*
* <p>For an example how to use this method, see the documentation of
* {@link AssignerWithPunctuatedWatermarks this class}.
*
* @return {@code Null}, if no watermark should be emitted, or the next watermark to emit.
*/
@Nullable
Watermark checkAndGetNextWatermark(T lastElement, long extractedTimestamp);
}
两种接口的DEMO:
AssignerWithPeriodicWatermarks
接口DEMO 如:https://www.cnblogs.com/felixzh/p/9687214.html
AssignerWithPunctuatedWatermarks
接口DEMO如下:
package org.apache.flink.streaming.examples.wordcount; import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.util.Collector; import javax.annotation.Nullable;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Properties; public class wcNew {
public static void main(String[] args) throws Exception {
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
Properties props = new Properties();
props.setProperty("bootstrap.servers", "127.0.0.1:9092");
props.setProperty("group.id", "flink-group-debug");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer"); FlinkKafkaConsumer010<String> consumer =
new FlinkKafkaConsumer010<>(args[], new SimpleStringSchema(), props);
consumer.setStartFromEarliest();
consumer.assignTimestampsAndWatermarks(new MessageWaterEmitter()); DataStream<Tuple3<String, Integer, String>> keyedStream = env
.addSource(consumer)
.flatMap(new MessageSplitter())
.keyBy()
.timeWindow(Time.seconds())
.reduce(new ReduceFunction<Tuple3<String, Integer, String>>() {
@Override
public Tuple3<String, Integer, String> reduce(Tuple3<String, Integer, String> t0, Tuple3<String, Integer, String> t1) throws Exception {
String time0 = t0.getField();
String time1 = t1.getField();
Integer count0 = t0.getField();
Integer count1 = t1.getField();
return new Tuple3<>(t0.getField(), count0 + count1, time0 +"|"+ time1);
}
}); keyedStream.writeAsText(args[], FileSystem.WriteMode.OVERWRITE);
keyedStream.print();
env.execute("Flink-Kafka num count");
} private static class MessageWaterEmitter implements AssignerWithPunctuatedWatermarks<String> { private SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd-hhmmss"); /*
* 先执行该函数,从element中提取时间戳
*@param element record行
*@param previousElementTimestamp 当前的时间
*/
@Override
public long extractTimestamp(String element, long previousElementTimestamp) {
if (element != null && element.contains(",")) {
String[] parts = element.split(",");
if (parts.length == ) {
try {
return sdf.parse(parts[]).getTime();
} catch (ParseException e) {
e.printStackTrace();
}
}
}
return 0L;
} /*
* 再执行该函数,extractedTimestamp的值是extractTimestamp的返回值
*/
@Nullable
@Override
public Watermark checkAndGetNextWatermark(String lastElement, long extractedTimestamp) {
if (lastElement != null && lastElement.contains(",")) {
String[] parts = lastElement.split(",");
if(parts.length==) {
try {
return new Watermark(sdf.parse(parts[]).getTime());
} catch (ParseException e) {
e.printStackTrace();
}
} }
return null;
}
}
private static class MessageSplitter implements FlatMapFunction<String, Tuple3<String, Integer, String>> { @Override
public void flatMap(String s, Collector<Tuple3<String, Integer, String>> collector) throws Exception {
if (s != null && s.contains(",")) {
String[] strings = s.split(",");
if(strings.length==) {
collector.collect(new Tuple3<>(strings[], Integer.parseInt(strings[]), strings[]));
}
}
}
}
}
打包成jar包后,上传到flink所在服务器,在控制台输入
flink run -c org.apache.flink.streaming.examples.wordcount.wcNew flink-kafka.jar topic_test_numcount /tmp/numcount.txt
控制台输入
eee,,-
eee,,-
eee,,-
eee,,-
eee,,-
tail -f numcount.txt 监控numcount.txt输出 当最后一条输入时,可以看到程序输出了前4条的计算结果 (eee,7,20180504-113411|20180504-113415|20180504-113412|20180504-113419)
Flink Pre-defined Timestamp Extractors / Watermark Emitters(预定义的时间戳提取/水位线发射器)的更多相关文章
- Flink Program Guide (5) -- 预定义的Timestamp Extractor / Watermark Emitter (DataStream API编程指导 -- For Java)
本文翻译自Pre-defined Timestamp Extractors / Watermark Emitter ------------------------------------------ ...
- Flink系列之Time和WaterMark
当数据进入Flink的时候,数据需要带入相应的时间,根据相应的时间进行处理. 让咱们想象一个场景,有一个队列,分别带着指定的时间,那么处理的时候,需要根据相应的时间进行处理,比如:统计最近五分钟的访问 ...
- 【源码解析】Flink 是如何基于事件时间生成Timestamp和Watermark
生成Timestamp和Watermark 的三个重载方法介绍可参见上一篇博客: Flink assignAscendingTimestamps 生成水印的三个重载方法 之前想研究下Flink是怎么处 ...
- Flink的时间类型和watermark机制
一FlinkTime类型 有3类时间,分别是数据本身的产生时间.进入Flink系统的时间和被处理的时间,在Flink系统中的数据可以有三种时间属性: Event Time 是每条数据在其生产设备上发生 ...
- flink中对于window和watermark的一些理解
package com.chenxiang.flink.demo; import java.io.IOException; import java.net.ServerSocket; import j ...
- Flink中的window、watermark和ProcessFunction
一.Flink中的window 1,window简述 window 是一种切割无限数据为有限块进行处理的手段.Window 是无限数据流处理的核心,Window 将一个无限的 stream 拆分成有 ...
- PHP预定义接口之 ArrayAccess
最近这段时间回家过年了,博客也没有更新,感觉少学习了好多东西,也错失了好多的学习机会,就像大家在春节抢红包时常说的一句话:一不留神错过了好几亿.废话少说,这篇博客给大家说说关于PHP预定义接口中常用到 ...
- VS2013 预定义的宏
Visual Studio 2013 预定义的宏 https://msdn.microsoft.com/zh-cn/library/b0084kay(v=vs.120).aspx 列出预定义的 ANS ...
- 关于标准C语言的预定义宏【转】
标准C语言预处理要求定义某些对象宏,每个预定义宏的名称一两个下划线字符开头和结尾,这些预定义宏不能被取消定义(#undef)或由编程人员重新定义.下面预定义宏表,被我抄了下来. __LINE__ 当 ...
随机推荐
- WebSocket刨根问底(四)之五子棋大战江湖
有暇,做了个五子棋大战的小游戏送给各位小伙伴! 用到的知识点有: 1.JavaWeb基础知识(懂jsp,servlet足够) 2.JavaScript和jQuery基本用法 3.了解WebSocket ...
- 通过LRU实现通用高效的超时连接探测
编写网络通讯都要面对一个问题,就是要把很久不存活的死连接清除,如果不这样做那死连接最终会占用大量内存影响服务运作!在实现过程中一般都会使用ping,pong原理,通过ping,pong来更新连接的时效 ...
- 道路运输车辆卫星定位系统JT/T808服务实现和压测
在工作上的需要接触道路运输车辆卫星定位系统相关应用,由于自己对网络服务的编写比较感兴趣,所以利用空闲时间实现了JT/T808的一些协议和相关服务(不得不说这种协议的设计在解释的确导致性能上的损耗,特别 ...
- 从0打卡leetcode之day 5 ---两个排序数组的中位数
前言 我靠,才坚持了四天,就差点不想坚持了.不行啊,我得把leetcode上的题给刷完,不然怕是不好进入bat的大门. 题目描述 给定两个大小为 m 和 n 的有序数组 nums1 和 nums2 . ...
- Chapter 5 Blood Type——31
I stood carefully, and I was still fine. He held the door for me, his smile polite but his eyes mock ...
- 【微信小程序云开发】从陌生到熟悉
前言 微信小程序在9月10号正式上线了云开发的功能,弱化后端和运维概念,以前开发一个小程序需要申请一个小程序,准备一个https的域名,开发需要一个前端一个服务端,有了云开发只有申请一个小程序,一个前 ...
- Android事件机制之二:onTouch详解
<Android事件机制之一:事件传递和消费>一文总结了Android中的事件传递和消费机制. 在其中对OntachEvent中的总结中,不是很具体.本文将主要对onTach进行总结. o ...
- JavaFX——简单的日记系统
前言 在学习Swing后,听老师说使用Java写界面还可以使用JavaFX.课后,便去了解.JavaFX是甲骨文公司07年推出的期望应用于桌面开发领域的技术.在了解了这个技术几天后,便使用它完成Jav ...
- xamarin.forms之使用CarouselView插件模仿网易新闻导航
在APP中基本都能见到类似网易.今日头条等上边横向导航条,下边是左右滑动的页面,之前做iOS的时候模仿实现过,https://github.com/ywcui/ViewPagerndicator,在做 ...
- Django 系列博客(十一)
Django 系列博客(十一) 前言 本篇博客介绍使用 ORM 来进行多表的操作,当然重点在查询方面. 创建表 实例: 作者模型:一个作者有姓名和年龄. 作者详细模型:把作者的详情放到详情表,包含生日 ...