https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/event_timestamp_extractors.html

According to the official documentation, Flink provides pre-defined timestamp extractors / watermark emitters, described as follows:

Flink provides abstractions that allow the programmer to assign their own timestamps and emit their own watermarks.

More specifically, one can do so by implementing one of the AssignerWithPeriodicWatermarks and AssignerWithPunctuatedWatermarks interfaces, depending on the use case.

In a nutshell, the first will emit watermarks periodically, while the second does so based on some property of the incoming records, e.g. whenever a special element is encountered in the stream.

Introduction to the AssignerWithPeriodicWatermarks interface:

Source path: flink\flink-streaming-java\src\main\java\org\apache\flink\streaming\api\functions\AssignerWithPeriodicWatermarks.java

/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.flink.streaming.api.functions;

import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.streaming.api.watermark.Watermark;

import javax.annotation.Nullable;

/**
* The {@code AssignerWithPeriodicWatermarks} assigns event time timestamps to elements,
* and generates low watermarks that signal event time progress within the stream.
* These timestamps and watermarks are used by functions and operators that operate
* on event time, for example event time windows.
*
* <p>Use this class to generate watermarks in a periodical interval.
* At most every {@code i} milliseconds (configured via
* {@link ExecutionConfig#getAutoWatermarkInterval()}), the system will call the
* {@link #getCurrentWatermark()} method to probe for the next watermark value.
* The system will generate a new watermark, if the probed value is non-null
* and has a timestamp larger than that of the previous watermark (to preserve
* the contract of ascending watermarks).
*
* <p>The system may call the {@link #getCurrentWatermark()} method less often than every
* {@code i} milliseconds, if no new elements arrived since the last call to the
* method.
*
* <p>Timestamps and watermarks are defined as {@code longs} that represent the
* milliseconds since the Epoch (midnight, January 1, 1970 UTC).
* A watermark with a certain value {@code t} indicates that no elements with event
* timestamps {@code x}, where {@code x} is lower or equal to {@code t}, will occur any more.
*
* @param <T> The type of the elements to which this assigner assigns timestamps.
*
* @see org.apache.flink.streaming.api.watermark.Watermark
*/
public interface AssignerWithPeriodicWatermarks<T> extends TimestampAssigner<T> {

/**
* Returns the current watermark. This method is periodically called by the
* system to retrieve the current watermark. The method may return {@code null} to
* indicate that no new Watermark is available.
*
* <p>The returned watermark will be emitted only if it is non-null and its timestamp
* is larger than that of the previously emitted watermark (to preserve the contract of
* ascending watermarks). If the current watermark is still
* identical to the previous one, no progress in event time has happened since
* the previous call to this method. If a null value is returned, or the timestamp
* of the returned watermark is smaller than that of the last emitted one, then no
* new watermark will be generated.
*
* <p>The interval in which this method is called and Watermarks are generated
* depends on {@link ExecutionConfig#getAutoWatermarkInterval()}.
*
* @see org.apache.flink.streaming.api.watermark.Watermark
* @see ExecutionConfig#getAutoWatermarkInterval()
*
* @return {@code Null}, if no watermark should be emitted, or the next watermark to emit.
*/
@Nullable
Watermark getCurrentWatermark();
}
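The contract described in the Javadoc above can be illustrated without any Flink dependency. The following is a minimal sketch, not Flink code: the class PeriodicWatermarkModel and its methods onElement and currentWatermark are hypothetical names that stand in for extractTimestamp and getCurrentWatermark, and the 3-second lateness bound is an assumed value. It tracks the largest timestamp seen so far and reports a watermark that trails it by a fixed bound, which is the idea behind Flink's pre-defined BoundedOutOfOrdernessTimestampExtractor.

```java
/**
 * Plain-Java model of the state a periodic watermark assigner typically keeps.
 * Hypothetical helper for illustration; it is not part of the Flink API.
 */
final class PeriodicWatermarkModel {
    // Assumed bound on how late an element may arrive, in milliseconds.
    private final long maxOutOfOrdernessMs;
    // Largest event timestamp observed so far; MIN_VALUE means "nothing seen yet".
    private long maxSeenTimestamp = Long.MIN_VALUE;

    PeriodicWatermarkModel(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    /** Stand-in for extractTimestamp: called once per element. */
    long onElement(long eventTimestamp) {
        maxSeenTimestamp = Math.max(maxSeenTimestamp, eventTimestamp);
        return eventTimestamp;
    }

    /** Stand-in for getCurrentWatermark: null means "no watermark available". */
    Long currentWatermark() {
        if (maxSeenTimestamp == Long.MIN_VALUE) {
            return null;
        }
        return maxSeenTimestamp - maxOutOfOrdernessMs;
    }

    public static void main(String[] args) {
        PeriodicWatermarkModel model = new PeriodicWatermarkModel(3_000L);
        System.out.println(model.currentWatermark()); // prints null
        model.onElement(10_000L);
        model.onElement(8_000L); // out-of-order element does not move the maximum
        System.out.println(model.currentWatermark()); // prints 7000
    }
}
```

In a real job the same two pieces of logic would live in a class implementing AssignerWithPeriodicWatermarks, and the system would poll it at the auto-watermark interval.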

Introduction to the AssignerWithPunctuatedWatermarks interface:

Source path: flink\flink-streaming-java\src\main\java\org\apache\flink\streaming\api\functions\AssignerWithPunctuatedWatermarks.java

/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.flink.streaming.api.functions;

import org.apache.flink.streaming.api.watermark.Watermark;

import javax.annotation.Nullable;

/**
* The {@code AssignerWithPunctuatedWatermarks} assigns event time timestamps to elements,
* and generates low watermarks that signal event time progress within the stream.
* These timestamps and watermarks are used by functions and operators that operate
* on event time, for example event time windows.
*
* <p>Use this class if certain special elements act as markers that signify event time
* progress, and when you want to emit watermarks specifically at certain events.
* The system will generate a new watermark, if the probed value is non-null
* and has a timestamp larger than that of the previous watermark (to preserve
* the contract of ascending watermarks).
*
* <p>For use cases that should periodically emit watermarks based on element timestamps,
* use the {@link AssignerWithPeriodicWatermarks} instead.
*
* <p>The following example illustrates how to use this timestamp extractor and watermark
* generator. It assumes elements carry a timestamp that describes when they were created,
* and that some elements carry a flag, marking them as the end of a sequence such that no
* elements with smaller timestamps can come anymore.
*
* <pre>{@code
* public class WatermarkOnFlagAssigner implements AssignerWithPunctuatedWatermarks<MyElement> {
*
* public long extractTimestamp(MyElement element, long previousElementTimestamp) {
* return element.getSequenceTimestamp();
* }
*
* public Watermark checkAndGetNextWatermark(MyElement lastElement, long extractedTimestamp) {
* return lastElement.isEndOfSequence() ? new Watermark(extractedTimestamp) : null;
* }
* }
* }</pre>
*
* <p>Timestamps and watermarks are defined as {@code longs} that represent the
* milliseconds since the Epoch (midnight, January 1, 1970 UTC).
* A watermark with a certain value {@code t} indicates that no elements with event
* timestamps {@code x}, where {@code x} is lower or equal to {@code t}, will occur any more.
*
* @param <T> The type of the elements to which this assigner assigns timestamps.
*
* @see org.apache.flink.streaming.api.watermark.Watermark
*/
public interface AssignerWithPunctuatedWatermarks<T> extends TimestampAssigner<T> {

/**
* Asks this implementation if it wants to emit a watermark. This method is called right after
* the {@link #extractTimestamp(Object, long)} method.
*
* <p>The returned watermark will be emitted only if it is non-null and its timestamp
* is larger than that of the previously emitted watermark (to preserve the contract of
* ascending watermarks). If a null value is returned, or the timestamp of the returned
* watermark is smaller than that of the last emitted one, then no new watermark will
* be generated.
*
* <p>For an example how to use this method, see the documentation of
* {@link AssignerWithPunctuatedWatermarks this class}.
*
* @return {@code Null}, if no watermark should be emitted, or the next watermark to emit.
*/
@Nullable
Watermark checkAndGetNextWatermark(T lastElement, long extractedTimestamp);
}
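Both interfaces share the same emission rule: a returned watermark is forwarded only if it is non-null and strictly larger than the previously emitted one. The sketch below models that rule in plain Java. The class WatermarkEmissionModel and its offer method are hypothetical names chosen for illustration, and the code is a simplified model of the documented behaviour rather than Flink's actual operator implementation.

```java
/**
 * Plain-Java model of the "ascending watermarks" rule from the Javadoc.
 * Hypothetical helper for illustration; it is not part of the Flink API.
 */
final class WatermarkEmissionModel {
    // Timestamp of the last watermark that was forwarded downstream.
    private long lastEmitted = Long.MIN_VALUE;

    /**
     * Decides whether a candidate watermark would be emitted.
     *
     * @param candidate value returned by the assigner, or null for "no watermark"
     * @return true if the candidate is emitted, false if it is dropped
     */
    boolean offer(Long candidate) {
        if (candidate == null || candidate <= lastEmitted) {
            return false; // null, equal, or smaller: no new watermark
        }
        lastEmitted = candidate;
        return true;
    }

    public static void main(String[] args) {
        WatermarkEmissionModel gate = new WatermarkEmissionModel();
        System.out.println(gate.offer(null)); // prints false
        System.out.println(gate.offer(5L));   // prints true
        System.out.println(gate.offer(5L));   // prints false (not larger)
        System.out.println(gate.offer(3L));   // prints false (smaller)
        System.out.println(gate.offer(9L));   // prints true
    }
}
```

A consequence for punctuated assigners is that returning a watermark for every element is safe with respect to ordering, because stale values are dropped, although it may add overhead.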

Demos for the two interfaces:

For an AssignerWithPeriodicWatermarks demo, see https://www.cnblogs.com/felixzh/p/9687214.html

The AssignerWithPunctuatedWatermarks demo is as follows:

package org.apache.flink.streaming.examples.wordcount;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.util.Collector;

import javax.annotation.Nullable;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Properties;

public class wcNew {

    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "127.0.0.1:9092");
        props.setProperty("group.id", "flink-group-debug");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // args[0] is the Kafka topic, args[1] the output file (see the run command).
        FlinkKafkaConsumer010<String> consumer =
                new FlinkKafkaConsumer010<>(args[0], new SimpleStringSchema(), props);
        consumer.setStartFromEarliest();
        consumer.assignTimestampsAndWatermarks(new MessageWaterEmitter());

        DataStream<Tuple3<String, Integer, String>> keyedStream = env
                .addSource(consumer)
                .flatMap(new MessageSplitter())
                .keyBy(0) // key by the word (field 0)
                // ASSUMPTION: the window size is missing from the listing; 10 seconds
                // is consistent with the sample output but is not confirmed.
                .timeWindow(Time.seconds(10))
                .reduce(new ReduceFunction<Tuple3<String, Integer, String>>() {
                    @Override
                    public Tuple3<String, Integer, String> reduce(
                            Tuple3<String, Integer, String> t0,
                            Tuple3<String, Integer, String> t1) throws Exception {
                        // Field layout: (word, count, timestamp string).
                        String time0 = t0.getField(2);
                        String time1 = t1.getField(2);
                        Integer count0 = t0.getField(1);
                        Integer count1 = t1.getField(1);
                        // Sum the counts and concatenate the timestamps for inspection.
                        return new Tuple3<>(t0.getField(0), count0 + count1, time0 + "|" + time1);
                    }
                });

        keyedStream.writeAsText(args[1], FileSystem.WriteMode.OVERWRITE);
        keyedStream.print();
        env.execute("Flink-Kafka num count");
    }

    private static class MessageWaterEmitter implements AssignerWithPunctuatedWatermarks<String> {

        // "HH" is the 24-hour field; "hh" (12-hour) would parse hour 12 as midnight.
        private SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd-HHmmss");

        /*
         * Called first: extracts the event timestamp from the element.
         * @param element the record line, in the form word,count,timestamp
         * @param previousElementTimestamp the timestamp previously assigned to the
         *        element, or a negative value if none has been assigned yet
         */
        @Override
        public long extractTimestamp(String element, long previousElementTimestamp) {
            if (element != null && element.contains(",")) {
                String[] parts = element.split(",");
                if (parts.length == 3) {
                    try {
                        return sdf.parse(parts[2]).getTime();
                    } catch (ParseException e) {
                        e.printStackTrace();
                    }
                }
            }
            // Malformed records fall back to timestamp 0.
            return 0L;
        }

        /*
         * Called next: extractedTimestamp is the value returned by extractTimestamp.
         * Returns a watermark for every well-formed record, otherwise null.
         */
        @Nullable
        @Override
        public Watermark checkAndGetNextWatermark(String lastElement, long extractedTimestamp) {
            if (lastElement != null && lastElement.contains(",")) {
                String[] parts = lastElement.split(",");
                if (parts.length == 3) {
                    try {
                        return new Watermark(sdf.parse(parts[2]).getTime());
                    } catch (ParseException e) {
                        e.printStackTrace();
                    }
                }
            }
            return null;
        }
    }

    private static class MessageSplitter implements FlatMapFunction<String, Tuple3<String, Integer, String>> {

        @Override
        public void flatMap(String s, Collector<Tuple3<String, Integer, String>> collector) throws Exception {
            if (s != null && s.contains(",")) {
                String[] strings = s.split(",");
                // Only records with exactly three fields are forwarded.
                if (strings.length == 3) {
                    collector.collect(new Tuple3<>(strings[0], Integer.parseInt(strings[1]), strings[2]));
                }
            }
        }
    }
}

After packaging the job as a jar, upload it to the server where Flink is installed and run the following in the console:

flink run -c org.apache.flink.streaming.examples.wordcount.wcNew flink-kafka.jar topic_test_numcount /tmp/numcount.txt

Console input (each record has the form word,count,timestamp):

eee,,-
eee,,-
eee,,-
eee,,-
eee,,-

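Monitor the output with tail -f numcount.txt. Once the last record is entered, the program emits the result computed from the first four records, because the watermark produced by the fifth record passes the end of the window that holds them: (eee,7,20180504-113411|20180504-113415|20180504-113412|20180504-113419)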
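To see why the four timestamps in that output end up in a single result, it helps to compute which tumbling window each one belongs to. The sketch below does this in plain Java under two assumptions that the text does not confirm: a 10-second window size, and timestamps interpreted in UTC. The class TumblingWindowModel and its methods are hypothetical names, and the window-start arithmetic is a simplified model of epoch-aligned tumbling windows, not a copy of Flink's code.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

/**
 * Plain-Java model of epoch-aligned tumbling windows.
 * Hypothetical helper for illustration; it is not part of the Flink API.
 */
final class TumblingWindowModel {

    /** Start of the tumbling window (no offset) that contains the timestamp. */
    static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - Math.floorMod(timestampMs, windowSizeMs);
    }

    /** Parses a yyyyMMdd-HHmmss string as UTC (the time zone is an assumption). */
    static long parse(String text) {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd-HHmmss");
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return sdf.parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable timestamp: " + text, e);
        }
    }

    public static void main(String[] args) {
        long windowSizeMs = 10_000L; // ASSUMPTION: 10-second window
        String[] samples = {
            "20180504-113411", "20180504-113415", "20180504-113412", "20180504-113419"
        };
        long expectedStart = parse("20180504-113410");
        for (String sample : samples) {
            long start = windowStart(parse(sample), windowSizeMs);
            // Each line prints "true": all four records share the window starting at 11:34:10.
            System.out.println(sample + " in window 11:34:10? " + (start == expectedStart));
        }
    }
}
```

Under these assumptions the window covers 11:34:10 up to, but not including, 11:34:20, so any later record whose watermark reaches the end of that range would trigger the result shown above.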
