Flink Pre-defined Timestamp Extractors / Watermark Emitters
https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/event_timestamp_extractors.html
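According to the official documentation, Flink provides pre-defined timestamp extractors and watermark emitters. The documentation describes them as follows: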
Flink provides abstractions that allow the programmer to assign their own timestamps and emit their own watermarks.
More specifically, one can do so by implementing one of the AssignerWithPeriodicWatermarks and AssignerWithPunctuatedWatermarks interfaces, depending on the use case.
In a nutshell, the first will emit watermarks periodically, while the second does so based on some property of the incoming records, e.g. whenever a special element is encountered in the stream.
Introduction to AssignerWithPeriodicWatermarks:
Source path: flink\flink-streaming-java\src\main\java\org\apache\flink\streaming\api\functions\AssignerWithPeriodicWatermarks.java
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.flink.streaming.api.functions;

import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.streaming.api.watermark.Watermark;

import javax.annotation.Nullable;

/**
* The {@code AssignerWithPeriodicWatermarks} assigns event time timestamps to elements,
* and generates low watermarks that signal event time progress within the stream.
* These timestamps and watermarks are used by functions and operators that operate
* on event time, for example event time windows.
*
* <p>Use this class to generate watermarks in a periodical interval.
* At most every {@code i} milliseconds (configured via
* {@link ExecutionConfig#getAutoWatermarkInterval()}), the system will call the
* {@link #getCurrentWatermark()} method to probe for the next watermark value.
* The system will generate a new watermark, if the probed value is non-null
* and has a timestamp larger than that of the previous watermark (to preserve
* the contract of ascending watermarks).
*
* <p>The system may call the {@link #getCurrentWatermark()} method less often than every
* {@code i} milliseconds, if no new elements arrived since the last call to the
* method.
*
* <p>Timestamps and watermarks are defined as {@code longs} that represent the
* milliseconds since the Epoch (midnight, January 1, 1970 UTC).
* A watermark with a certain value {@code t} indicates that no elements with event
* timestamps {@code x}, where {@code x} is lower or equal to {@code t}, will occur any more.
*
* @param <T> The type of the elements to which this assigner assigns timestamps.
*
* @see org.apache.flink.streaming.api.watermark.Watermark
*/
public interface AssignerWithPeriodicWatermarks<T> extends TimestampAssigner<T> {

/**
* Returns the current watermark. This method is periodically called by the
* system to retrieve the current watermark. The method may return {@code null} to
* indicate that no new Watermark is available.
*
* <p>The returned watermark will be emitted only if it is non-null and its timestamp
* is larger than that of the previously emitted watermark (to preserve the contract of
* ascending watermarks). If the current watermark is still
* identical to the previous one, no progress in event time has happened since
* the previous call to this method. If a null value is returned, or the timestamp
* of the returned watermark is smaller than that of the last emitted one, then no
* new watermark will be generated.
*
* <p>The interval in which this method is called and Watermarks are generated
* depends on {@link ExecutionConfig#getAutoWatermarkInterval()}.
*
* @see org.apache.flink.streaming.api.watermark.Watermark
* @see ExecutionConfig#getAutoWatermarkInterval()
*
* @return {@code Null}, if no watermark should be emitted, or the next watermark to emit.
*/
@Nullable
Watermark getCurrentWatermark();
}
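The contract described in the Javadoc above (the system probes periodically, and a watermark is emitted only if it is non-null and larger than the previous one) can be modeled in plain Java without a Flink dependency. This is only a sketch, not Flink's implementation: the class name `PeriodicWatermarkSketch`, the `maxOutOfOrdernessMs` bound, and the choice of "largest timestamp seen minus a fixed bound" as the watermark strategy are all assumptions made for illustration, and an empty `OptionalLong` stands in for a `null` watermark.

```java
import java.util.OptionalLong;

/** Plain-Java model of the periodic watermark contract; not Flink code. */
class PeriodicWatermarkSketch {
    private final long maxOutOfOrdernessMs;          // hypothetical lateness bound
    private long maxSeenTimestamp = Long.MIN_VALUE;  // largest event timestamp so far
    private long lastEmitted = Long.MIN_VALUE;       // last watermark actually emitted

    PeriodicWatermarkSketch(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    /** Counterpart of extractTimestamp: remember the largest timestamp seen. */
    long onElement(long eventTimestamp) {
        maxSeenTimestamp = Math.max(maxSeenTimestamp, eventTimestamp);
        return eventTimestamp;
    }

    /** Counterpart of the system's periodic call to getCurrentWatermark(). */
    OptionalLong onPeriodicProbe() {
        if (maxSeenTimestamp == Long.MIN_VALUE) {
            return OptionalLong.empty();             // no element yet: no watermark
        }
        long candidate = maxSeenTimestamp - maxOutOfOrdernessMs;
        if (candidate <= lastEmitted) {
            return OptionalLong.empty();             // would not advance: suppressed
        }
        lastEmitted = candidate;
        return OptionalLong.of(candidate);
    }
}
```

With a bound of 3000 ms, an element at 10000 leads the next probe to yield 7000. A later element at 8000 does not move the watermark, so the following probe yields nothing.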
Introduction to the AssignerWithPunctuatedWatermarks interface:
Source path: flink\flink-streaming-java\src\main\java\org\apache\flink\streaming\api\functions\AssignerWithPunctuatedWatermarks.java
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.flink.streaming.api.functions;

import org.apache.flink.streaming.api.watermark.Watermark;

import javax.annotation.Nullable;

/**
* The {@code AssignerWithPunctuatedWatermarks} assigns event time timestamps to elements,
* and generates low watermarks that signal event time progress within the stream.
* These timestamps and watermarks are used by functions and operators that operate
* on event time, for example event time windows.
*
* <p>Use this class if certain special elements act as markers that signify event time
* progress, and when you want to emit watermarks specifically at certain events.
* The system will generate a new watermark, if the probed value is non-null
* and has a timestamp larger than that of the previous watermark (to preserve
* the contract of ascending watermarks).
*
* <p>For use cases that should periodically emit watermarks based on element timestamps,
* use the {@link AssignerWithPeriodicWatermarks} instead.
*
* <p>The following example illustrates how to use this timestamp extractor and watermark
* generator. It assumes elements carry a timestamp that describes when they were created,
* and that some elements carry a flag, marking them as the end of a sequence such that no
* elements with smaller timestamps can come anymore.
*
* <pre>{@code
* public class WatermarkOnFlagAssigner implements AssignerWithPunctuatedWatermarks<MyElement> {
*
* public long extractTimestamp(MyElement element, long previousElementTimestamp) {
* return element.getSequenceTimestamp();
* }
*
* public Watermark checkAndGetNextWatermark(MyElement lastElement, long extractedTimestamp) {
* return lastElement.isEndOfSequence() ? new Watermark(extractedTimestamp) : null;
* }
* }
* }</pre>
*
* <p>Timestamps and watermarks are defined as {@code longs} that represent the
* milliseconds since the Epoch (midnight, January 1, 1970 UTC).
* A watermark with a certain value {@code t} indicates that no elements with event
* timestamps {@code x}, where {@code x} is lower or equal to {@code t}, will occur any more.
*
* @param <T> The type of the elements to which this assigner assigns timestamps.
*
* @see org.apache.flink.streaming.api.watermark.Watermark
*/
public interface AssignerWithPunctuatedWatermarks<T> extends TimestampAssigner<T> {

/**
* Asks this implementation if it wants to emit a watermark. This method is called right after
* the {@link #extractTimestamp(Object, long)} method.
*
* <p>The returned watermark will be emitted only if it is non-null and its timestamp
* is larger than that of the previously emitted watermark (to preserve the contract of
* ascending watermarks). If a null value is returned, or the timestamp of the returned
* watermark is smaller than that of the last emitted one, then no new watermark will
* be generated.
*
* <p>For an example how to use this method, see the documentation of
* {@link AssignerWithPunctuatedWatermarks this class}.
*
* @return {@code Null}, if no watermark should be emitted, or the next watermark to emit.
*/
@Nullable
Watermark checkAndGetNextWatermark(T lastElement, long extractedTimestamp);
}
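The punctuated variant decides per element rather than per interval. The following plain-Java sketch models that behavior under the same rule that a watermark must be larger than the previous one. It is not Flink code: `PunctuatedWatermarkSketch` and its nested `Element` type are hypothetical names that mirror `MyElement` from the Javadoc example, and an empty `OptionalLong` stands in for a `null` watermark.

```java
import java.util.OptionalLong;

/** Plain-Java model of the punctuated watermark contract; not Flink code. */
class PunctuatedWatermarkSketch {

    /** Hypothetical element type mirroring MyElement from the Javadoc example. */
    static final class Element {
        final long sequenceTimestamp;
        final boolean endOfSequence;

        Element(long sequenceTimestamp, boolean endOfSequence) {
            this.sequenceTimestamp = sequenceTimestamp;
            this.endOfSequence = endOfSequence;
        }
    }

    private long lastEmitted = Long.MIN_VALUE;  // last watermark actually emitted

    /** Runs the extract step, then the check step, for a single element. */
    OptionalLong onElement(Element element) {
        long extracted = element.sequenceTimestamp;   // extractTimestamp step
        if (!element.endOfSequence) {
            return OptionalLong.empty();              // assigner returned "null"
        }
        if (extracted <= lastEmitted) {
            return OptionalLong.empty();              // not ascending: suppressed
        }
        lastEmitted = extracted;
        return OptionalLong.of(extracted);
    }
}
```

Only flagged elements can advance event time here, so a stream without any flagged element never produces a watermark in this model.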
Demos of the two interfaces:
For a demo of the AssignerWithPeriodicWatermarks interface, see: https://www.cnblogs.com/felixzh/p/9687214.html
A demo of the AssignerWithPunctuatedWatermarks interface follows:
package org.apache.flink.streaming.examples.wordcount;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.util.Collector;

import javax.annotation.Nullable;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Properties;

public class wcNew {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "127.0.0.1:9092");
        props.setProperty("group.id", "flink-group-debug");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // args[0] is the Kafka topic (first argument of the run command).
        FlinkKafkaConsumer010<String> consumer =
                new FlinkKafkaConsumer010<>(args[0], new SimpleStringSchema(), props);
        consumer.setStartFromEarliest();
        consumer.assignTimestampsAndWatermarks(new MessageWaterEmitter());

        DataStream<Tuple3<String, Integer, String>> keyedStream = env
                .addSource(consumer)
                .flatMap(new MessageSplitter())
                // Key by the word (field 0 of the tuple).
                .keyBy(0)
                // Assumption: a 10-second window, which matches the sample output
                // shown after the code; the exact value is not stated.
                .timeWindow(Time.seconds(10))
                .reduce(new ReduceFunction<Tuple3<String, Integer, String>>() {
                    @Override
                    public Tuple3<String, Integer, String> reduce(Tuple3<String, Integer, String> t0, Tuple3<String, Integer, String> t1) throws Exception {
                        // Field 2 holds the timestamp string, field 1 the count.
                        String time0 = t0.getField(2);
                        String time1 = t1.getField(2);
                        Integer count0 = t0.getField(1);
                        Integer count1 = t1.getField(1);
                        return new Tuple3<>(t0.getField(0), count0 + count1, time0 + "|" + time1);
                    }
                });

        // args[1] is the output path (second argument of the run command).
        keyedStream.writeAsText(args[1], FileSystem.WriteMode.OVERWRITE);
        keyedStream.print();
        env.execute("Flink-Kafka num count");
    }

    private static class MessageWaterEmitter implements AssignerWithPunctuatedWatermarks<String> {

        // HH (24-hour clock) instead of hh (12-hour clock), so that hours 12 to 23 parse as intended.
        private SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd-HHmmss");

        /*
         * Called first: extracts the timestamp from the element.
         * @param element the record line
         * @param previousElementTimestamp the timestamp previously attached to the element, if any (unused here)
         */
        @Override
        public long extractTimestamp(String element, long previousElementTimestamp) {
            if (element != null && element.contains(",")) {
                String[] parts = element.split(",");
                // Expected record layout: word,count,timestamp
                if (parts.length == 3) {
                    try {
                        return sdf.parse(parts[2]).getTime();
                    } catch (ParseException e) {
                        e.printStackTrace();
                    }
                }
            }
            // Malformed records fall back to timestamp 0.
            return 0L;
        }

        /*
         * Called next; extractedTimestamp is the value returned by extractTimestamp.
         */
        @Nullable
        @Override
        public Watermark checkAndGetNextWatermark(String lastElement, long extractedTimestamp) {
            if (lastElement != null && lastElement.contains(",")) {
                String[] parts = lastElement.split(",");
                if (parts.length == 3) {
                    try {
                        // Every well-formed record emits a watermark equal to its own timestamp.
                        return new Watermark(sdf.parse(parts[2]).getTime());
                    } catch (ParseException e) {
                        e.printStackTrace();
                    }
                }
            }
            return null;
        }
    }

    private static class MessageSplitter implements FlatMapFunction<String, Tuple3<String, Integer, String>> {

        @Override
        public void flatMap(String s, Collector<Tuple3<String, Integer, String>> collector) throws Exception {
            if (s != null && s.contains(",")) {
                String[] strings = s.split(",");
                if (strings.length == 3) {
                    // Tuple layout: (word, count, timestamp string)
                    collector.collect(new Tuple3<>(strings[0], Integer.parseInt(strings[1]), strings[2]));
                }
            }
        }
    }
}
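The demo parses timestamps such as 20180504-113411 with SimpleDateFormat, so the hour pattern letter matters. The sketch below compares the 12-hour letter `hh` with the 24-hour letter `HH`. The helper `HourPatternSketch.parseUtc` is a hypothetical name, and fixing the time zone to UTC is an assumption made only to keep the comparison deterministic.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

/** Compares how the hh and HH pattern letters parse the same text. */
class HourPatternSketch {

    /** Parses text with the given pattern in UTC and returns epoch milliseconds. */
    static long parseUtc(String pattern, String text) {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return format.parse(text).getTime();
        } catch (ParseException e) {
            // Unchecked wrapper keeps call sites short in this sketch.
            throw new IllegalArgumentException("Unparseable timestamp: " + text, e);
        }
    }
}
```

For 20180504-113411 both patterns give the same instant. For 20180504-123411 the `hh` pattern reads the hour 12 as midnight, so its result is twelve hours earlier than the `HH` result. Event timestamps shifted like this would land in the wrong window.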
After packaging the program as a jar, upload it to the server where Flink is installed and run the following in the console:
flink run -c org.apache.flink.streaming.examples.wordcount.wcNew flink-kafka.jar topic_test_numcount /tmp/numcount.txt
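Then enter five records of the form word,count,timestamp in the console (the timestamp looks like 20180504-113411):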
eee,,-
eee,,-
eee,,-
eee,,-
eee,,-
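Monitor the output with tail -f numcount.txt. When the last record is entered, the program outputs the aggregated result of the first four records, presumably because the watermark of the fifth record moves event time past the end of the window that holds them:
(eee,7,20180504-113411|20180504-113415|20180504-113412|20180504-113419)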
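Why the fifth record releases the result of the first four can be modeled in plain Java. This is a sketch and not Flink code: `TumblingWindowSketch` is a hypothetical class that tracks one key, advances the watermark to each record's timestamp as the demo's punctuated assigner does, and fires a window once the watermark reaches the window's last timestamp. The 10-second window size and the individual counts used in the note after the code are assumptions, since only the summed count 7 and the four timestamps appear in the output above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Plain-Java model of one key's tumbling event-time windows; not Flink code. */
class TumblingWindowSketch {
    private final long windowSizeMs;
    // Running sum per window, keyed by window start time.
    private final TreeMap<Long, Integer> sumsByWindowStart = new TreeMap<>();
    private long watermark = Long.MIN_VALUE;

    TumblingWindowSketch(long windowSizeMs) {
        this.windowSizeMs = windowSizeMs;
    }

    /**
     * Adds one record, then advances the watermark to the record's timestamp
     * and returns the sums of all windows that fire as a result.
     */
    List<Integer> onRecord(long timestampMs, int count) {
        long windowStart = timestampMs - Math.floorMod(timestampMs, windowSizeMs);
        long windowLastTimestamp = windowStart + windowSizeMs - 1;
        if (windowLastTimestamp > watermark) {
            sumsByWindowStart.merge(windowStart, count, Integer::sum);
        } // otherwise the window has already fired and this model drops the late record
        watermark = Math.max(watermark, timestampMs);

        List<Integer> fired = new ArrayList<>();
        while (!sumsByWindowStart.isEmpty()
                && sumsByWindowStart.firstKey() + windowSizeMs - 1 <= watermark) {
            fired.add(sumsByWindowStart.pollFirstEntry().getValue());
        }
        return fired;
    }
}
```

With a 10000 ms window, records at 11000, 15000, 12000 and 19000 (hypothetical counts 1, 2, 1, 3) fire nothing, because the watermark never reaches 19999. A fifth record at 21000 pushes the watermark past that point, and the call returns a list containing the sum 7.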