flink window的early计算
Tumbing Windows:滚动窗口,窗口之间时间点不重叠。它是按照固定的时间,或固定的事件个数划分的,分别可以叫做滚动时间窗口和滚动事件窗口。
Sliding Windows:滑动窗口,窗口之间时间点存在重叠。对于某些应用,它们需要的时间是不间断的,需要平滑的进行窗口聚合。
例如,可以每30s记算一次最近1分钟用户所购买的商品数量的总数,这个就是时间滑动窗口;或者每10个客户点击购买,然后就计算一下最近100个客户购买的商品的总和,这个就是事件滑动窗口。
Session Windows:会话窗口,经过一段设置时间无数据认为窗口完成。
在默认的场景下,所有的窗口都是到达时间语义上的windown end time后触发对整个窗口元素的计算,但是在部分场景的情况下,业务方需要在窗口时间没有结束的情况下也可以获得当前的聚合结果,比如每隔五分钟获取当前小时的sum值,这种情况下,官方提供了对于上述窗口的定制化计算器ContinuousEventTimeTrigger和ContinuousProcessingTimeTrigger
下面是一个使用ContinuousProcessingTimeTrigger的简单例子:
public class ContinueTriggerDemo {
public static void main(String[] args) throws Exception {
// TODO Auto-generated method stub
String hostName = "localhost";
Integer port = Integer.parseInt("");
;
// set up the execution environment
final StreamExecutionEnvironment env = StreamExecutionEnvironment
.getExecutionEnvironment();
// 从指定socket获取输入数据
DataStream<String> text = env.socketTextStream(hostName, port);
text.flatMap(new LineSplitter()) //数据语句分词
.keyBy() // 流按照单词分区
.window(TumblingProcessingTimeWindows.of(Time.seconds()))// 设置一个120s的滚动窗口
.trigger(ContinuousProcessingTimeTrigger.of(Time.seconds()))//窗口每统计一次当前计算结果
.sum()// count求和
.map(new Mapdemo())//输出结果加上时间戳
.print();
env.execute("Java WordCount from SocketTextStream Example");
}
/**
* Implements the string tokenizer that splits sentences into words as a
* user-defined FlatMapFunction. The function takes a line (String) and
* splits it into multiple pairs in the form of "(word,1)" (Tuple2<String,
* Integer>).
*/
public static final class LineSplitter implements
FlatMapFunction<String, Tuple2<String, Integer>> {
@Override
public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
// normalize and split the line
String[] tokens = value.toLowerCase().split("\\W+");
// emit the pairs
for (String token : tokens) {
if (token.length() > ) {
out.collect(new Tuple2<String, Integer>(token, ));
}
}
}
}
public static final class Mapdemo
implements
MapFunction<Tuple2<String, Integer>, Tuple3<String, String, Integer>> {
@Override
public Tuple3<String, String, Integer> map(Tuple2<String, Integer> value)
throws Exception {
// TODO Auto-generated method stub
DateFormat format2 = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
String s = format2.format(new Date());
return new Tuple3<String, String, Integer>(value.f0, s, value.f1);
}
}
}
在本地启动端口 :nc -lk 8001 并启动flink程序
输入数据:
aa
aa
bb
观察程序数据结果日志
5> (aa,2018-07-30 16:08:20,2)
5> (bb,2018-07-30 16:08:20,1)
5> (aa,2018-07-30 16:08:40,2)
5> (bb,2018-07-30 16:08:40,1)
5> (aa,2018-07-30 16:09:00,2)
5> (bb,2018-07-30 16:09:00,1)
5> (aa,2018-07-30 16:09:20,2)
5> (bb,2018-07-30 16:09:20,1)
5> (aa,2018-07-30 16:09:40,2)
5> (bb,2018-07-30 16:09:40,1)
在上述输入后继续输入
aa
日志结果统计为
5> (aa,2018-07-30 16:10:00,3)
5> (bb,2018-07-30 16:10:00,1)
根据日志数据可见,flink轻松实现了一个窗口时间长度为120s并每20s向下游发送一次窗口当前聚合结果的功能。
附源码:
源码路径:flink\flink-streaming-java\src\main\java\org\apache\flink\streaming\api\windowing\triggers\ContinuousProcessingTimeTrigger.java
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/ package org.apache.flink.streaming.api.windowing.triggers; import org.apache.flink.annotation.PublicEvolving;
import org.apache.flink.annotation.VisibleForTesting;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.state.ReducingState;
import org.apache.flink.api.common.state.ReducingStateDescriptor;
import org.apache.flink.api.common.typeutils.base.LongSerializer;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.Window; /**
* A {@link Trigger} that continuously fires based on a given time interval as measured by
* the clock of the machine on which the job is running.
*
* @param <W> The type of {@link Window Windows} on which this trigger can operate.
*/
@PublicEvolving
public class ContinuousProcessingTimeTrigger<W extends Window> extends Trigger<Object, W> {
private static final long serialVersionUID = 1L; private final long interval; /** When merging we take the lowest of all fire timestamps as the new fire timestamp. */
private final ReducingStateDescriptor<Long> stateDesc =
new ReducingStateDescriptor<>("fire-time", new Min(), LongSerializer.INSTANCE); private ContinuousProcessingTimeTrigger(long interval) {
this.interval = interval;
} @Override
public TriggerResult onElement(Object element, long timestamp, W window, TriggerContext ctx) throws Exception {
ReducingState<Long> fireTimestamp = ctx.getPartitionedState(stateDesc); timestamp = ctx.getCurrentProcessingTime(); if (fireTimestamp.get() == null) {
long start = timestamp - (timestamp % interval);
long nextFireTimestamp = start + interval; ctx.registerProcessingTimeTimer(nextFireTimestamp); fireTimestamp.add(nextFireTimestamp);
return TriggerResult.CONTINUE;
}
return TriggerResult.CONTINUE;
} @Override
public TriggerResult onEventTime(long time, W window, TriggerContext ctx) throws Exception {
return TriggerResult.CONTINUE;
} @Override
public TriggerResult onProcessingTime(long time, W window, TriggerContext ctx) throws Exception {
ReducingState<Long> fireTimestamp = ctx.getPartitionedState(stateDesc); if (fireTimestamp.get().equals(time)) {
fireTimestamp.clear();
fireTimestamp.add(time + interval);
ctx.registerProcessingTimeTimer(time + interval);
return TriggerResult.FIRE;
}
return TriggerResult.CONTINUE;
} @Override
public void clear(W window, TriggerContext ctx) throws Exception {
ReducingState<Long> fireTimestamp = ctx.getPartitionedState(stateDesc);
long timestamp = fireTimestamp.get();
ctx.deleteProcessingTimeTimer(timestamp);
fireTimestamp.clear();
} @Override
public boolean canMerge() {
return true;
} @Override
public void onMerge(W window,
OnMergeContext ctx) {
ctx.mergePartitionedState(stateDesc);
} @VisibleForTesting
public long getInterval() {
return interval;
} @Override
public String toString() {
return "ContinuousProcessingTimeTrigger(" + interval + ")";
} /**
* Creates a trigger that continuously fires based on the given interval.
*
* @param interval The time interval at which to fire.
* @param <W> The type of {@link Window Windows} on which this trigger can operate.
*/
public static <W extends Window> ContinuousProcessingTimeTrigger<W> of(Time interval) {
return new ContinuousProcessingTimeTrigger<>(interval.toMilliseconds());
} private static class Min implements ReduceFunction<Long> {
private static final long serialVersionUID = 1L; @Override
public Long reduce(Long value1, Long value2) throws Exception {
return Math.min(value1, value2);
}
}
}
源码路径:flink\flink-streaming-java\src\main\java\org\apache\flink\streaming\api\windowing\triggers\ContinuousEventTimeTrigger.java
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/ package org.apache.flink.streaming.api.windowing.triggers; import org.apache.flink.annotation.PublicEvolving;
import org.apache.flink.annotation.VisibleForTesting;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.state.ReducingState;
import org.apache.flink.api.common.state.ReducingStateDescriptor;
import org.apache.flink.api.common.typeutils.base.LongSerializer;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.Window; /**
* A {@link Trigger} that continuously fires based on a given time interval. This fires based
* on {@link org.apache.flink.streaming.api.watermark.Watermark Watermarks}.
*
* @see org.apache.flink.streaming.api.watermark.Watermark
*
* @param <W> The type of {@link Window Windows} on which this trigger can operate.
*/
@PublicEvolving
public class ContinuousEventTimeTrigger<W extends Window> extends Trigger<Object, W> {
private static final long serialVersionUID = 1L; private final long interval; /** When merging we take the lowest of all fire timestamps as the new fire timestamp. */
private final ReducingStateDescriptor<Long> stateDesc =
new ReducingStateDescriptor<>("fire-time", new Min(), LongSerializer.INSTANCE); private ContinuousEventTimeTrigger(long interval) {
this.interval = interval;
} @Override
public TriggerResult onElement(Object element, long timestamp, W window, TriggerContext ctx) throws Exception { if (window.maxTimestamp() <= ctx.getCurrentWatermark()) {
// if the watermark is already past the window fire immediately
return TriggerResult.FIRE;
} else {
ctx.registerEventTimeTimer(window.maxTimestamp());
} ReducingState<Long> fireTimestamp = ctx.getPartitionedState(stateDesc);
if (fireTimestamp.get() == null) {
long start = timestamp - (timestamp % interval);
long nextFireTimestamp = start + interval;
ctx.registerEventTimeTimer(nextFireTimestamp);
fireTimestamp.add(nextFireTimestamp);
} return TriggerResult.CONTINUE;
} @Override
public TriggerResult onEventTime(long time, W window, TriggerContext ctx) throws Exception { if (time == window.maxTimestamp()){
return TriggerResult.FIRE;
} ReducingState<Long> fireTimestampState = ctx.getPartitionedState(stateDesc); Long fireTimestamp = fireTimestampState.get(); if (fireTimestamp != null && fireTimestamp == time) {
fireTimestampState.clear();
fireTimestampState.add(time + interval);
ctx.registerEventTimeTimer(time + interval);
return TriggerResult.FIRE;
} return TriggerResult.CONTINUE;
} @Override
public TriggerResult onProcessingTime(long time, W window, TriggerContext ctx) throws Exception {
return TriggerResult.CONTINUE;
} @Override
public void clear(W window, TriggerContext ctx) throws Exception {
ReducingState<Long> fireTimestamp = ctx.getPartitionedState(stateDesc);
Long timestamp = fireTimestamp.get();
if (timestamp != null) {
ctx.deleteEventTimeTimer(timestamp);
fireTimestamp.clear();
}
} @Override
public boolean canMerge() {
return true;
} @Override
public void onMerge(W window, OnMergeContext ctx) throws Exception {
ctx.mergePartitionedState(stateDesc);
Long nextFireTimestamp = ctx.getPartitionedState(stateDesc).get();
if (nextFireTimestamp != null) {
ctx.registerEventTimeTimer(nextFireTimestamp);
}
} @Override
public String toString() {
return "ContinuousEventTimeTrigger(" + interval + ")";
} @VisibleForTesting
public long getInterval() {
return interval;
} /**
* Creates a trigger that continuously fires based on the given interval.
*
* @param interval The time interval at which to fire.
* @param <W> The type of {@link Window Windows} on which this trigger can operate.
*/
public static <W extends Window> ContinuousEventTimeTrigger<W> of(Time interval) {
return new ContinuousEventTimeTrigger<>(interval.toMilliseconds());
} private static class Min implements ReduceFunction<Long> {
private static final long serialVersionUID = 1L; @Override
public Long reduce(Long value1, Long value2) throws Exception {
return Math.min(value1, value2);
}
}
}
flink window的early计算的更多相关文章
- 一文搞懂Flink Window机制
Windows是处理无线数据流的核心,它将流分割成有限大小的桶(buckets),并在其上执行各种计算. 窗口化的Flink程序的结构通常如下,有分组流(keyed streams)和无分组流(non ...
- Flink – window operator
参考, http://wuchong.me/blog/2016/05/25/flink-internals-window-mechanism/ http://wuchong.me/blog/201 ...
- Flink window机制
此文已由作者岳猛授权网易云社区发布. 欢迎访问网易云社区,了解更多网易技术产品运营经验. 问题 window是解决流计算中的什么问题? 怎么划分window?有哪几种window?window与时间属 ...
- flink window实例分析
window是处理数据的核心.按需选择你需要的窗口类型后,它会将传入的原始数据流切分成多个buckets,所有计算都在window中进行. flink本身提供的实例程序TopSpeedWindowin ...
- 【翻译】Flink window
本文翻译自flink官网:https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/stream/operators/window ...
- Apache Flink - Window
Window: 在Streaming中,数据是无限且连续的,我们不可能等所有数据都到才进行处理,我们可以来一个就处理一下,但是有时我们需要做一些聚合类的处理,例如:在过去的1分钟内有多少用户点击了我们 ...
- Flink Window窗口机制
总览 Window 是flink处理无限流的核心,Windows将流拆分为有限大小的"桶",我们可以在其上应用计算. Flink 认为 Batch 是 Streaming 的一个特 ...
- Flink Window&Time 原理
Flink 中可以使用一套 API 完成对有界数据集以及无界数据的统一处理,而无界数据集的处理一般会伴随着对某些固定时间间隔的数据聚合处理.比如:每五分钟统计一次系统活跃用户.每十秒更新热搜榜单等等 ...
- flink Window的Timestamps/Watermarks和allowedLateness的区别
Watermartks是通过additional的时间戳来控制窗口激活的时间,allowedLateness来控制窗口的销毁时间. 注: 因为此特性包括官方文档在1.3-1.5版本均未做改变,所以 ...
随机推荐
- PerformanceCounter蛋痛的设计
在.NET下对进程的性能计数可以使用PerformanceCounter,通过该对象可以对进程的CPU,内存等信息进行统计.对于正常使用来说这个对象还是很方便,但对于同一名称的多个进程进行性能计数那真 ...
- Python进阶:切片的误区与高级用法
2018-12-31 更新声明:切片系列文章本是分三篇写成,现已合并成一篇.合并后,修正了一些严重的错误(如自定义序列切片的部分),还对行文结构与章节衔接做了大量改动.原系列的单篇就不删除了,毕竟也是 ...
- Chapter 5 Blood Type——4
"Does he mean you?" Jessica asked with insulting astonishment in her voice. “他对你有意思吗?”Jess ...
- 【C#加深理解系列】(一)反射
什么是反射 反射是.NET中的重要机制,通过反射,可以在运行时获得程序或程序集中每一个类型(包括类.结构.委托.接口和枚举等)的成员和成员的信息.有了反射,即可对每一个类型了如指掌.另外我还可以直接创 ...
- Disconf源码分析之启动过程分析下(2)
接上文,下面是第二次扫描的XML配置. <bean id="disconfMgrBean2" class="com.baidu.disconf.client.Dis ...
- springboot情操陶冶-web配置(七)
参数校验通常是OpenApi必做的操作,其会对不合法的输入做统一的校验以防止恶意的请求.本文则对参数校验这方面作下简单的分析 spring.factories 读者应该对此文件加以深刻的印象,很多sp ...
- #5 Python面向对象(四)
前言 本节将是Python面向对象的最后一篇博文了,这节将记录类的特殊方法.特殊成员方法.旧类和新类的不同,以及一些其他知识.Go! 一.类的特殊方法 Python有三种特殊方法:实例方法.静态方法. ...
- Docker镜像构建的两种方式(六)--技术流ken
镜像构建介绍 在什么情况下我们需要自己构建镜像那? (1)当我们找不到现有的镜像,比如自己开发的应用程序 (2)需要在镜像中加入特定的功能 docker构建镜像有两种方式:docker commit命 ...
- rpm和yum软件管理(week2_day5)--技术流ken
rpm简介 这是一个数据库管理工具,可以通过读取数据库,判断软件是否已经安装,如果已经安装可以读取出来所有文件的所在位置等,并可以实现删除这些文件. rpm:RPM is Redhat Package ...
- MEF 基础简介 三
MEF导出类的方法和属性 首先来说导出属性,因为这个比较简单,和导出类差不多,先来看看代码,主要看我加注释的地方,MusicBook.cs中的代码如下: using System; using Sys ...