flink window的early计算
Tumbing Windows:滚动窗口,窗口之间时间点不重叠。它是按照固定的时间,或固定的事件个数划分的,分别可以叫做滚动时间窗口和滚动事件窗口。
Sliding Windows:滑动窗口,窗口之间时间点存在重叠。对于某些应用,它们需要的时间是不间断的,需要平滑的进行窗口聚合。
例如,可以每30s记算一次最近1分钟用户所购买的商品数量的总数,这个就是时间滑动窗口;或者每10个客户点击购买,然后就计算一下最近100个客户购买的商品的总和,这个就是事件滑动窗口。
Session Windows:会话窗口,经过一段设置时间无数据认为窗口完成。
在默认的场景下,所有的窗口都是到达时间语义上的windown end time后触发对整个窗口元素的计算,但是在部分场景的情况下,业务方需要在窗口时间没有结束的情况下也可以获得当前的聚合结果,比如每隔五分钟获取当前小时的sum值,这种情况下,官方提供了对于上述窗口的定制化计算器ContinuousEventTimeTrigger和ContinuousProcessingTimeTrigger
下面是一个使用ContinuousProcessingTimeTrigger的简单例子:
public class ContinueTriggerDemo {
public static void main(String[] args) throws Exception {
// TODO Auto-generated method stub
String hostName = "localhost";
Integer port = Integer.parseInt("");
;
// set up the execution environment
final StreamExecutionEnvironment env = StreamExecutionEnvironment
.getExecutionEnvironment();
// 从指定socket获取输入数据
DataStream<String> text = env.socketTextStream(hostName, port);
text.flatMap(new LineSplitter()) //数据语句分词
.keyBy() // 流按照单词分区
.window(TumblingProcessingTimeWindows.of(Time.seconds()))// 设置一个120s的滚动窗口
.trigger(ContinuousProcessingTimeTrigger.of(Time.seconds()))//窗口每统计一次当前计算结果
.sum()// count求和
.map(new Mapdemo())//输出结果加上时间戳
.print();
env.execute("Java WordCount from SocketTextStream Example");
}
/**
* Implements the string tokenizer that splits sentences into words as a
* user-defined FlatMapFunction. The function takes a line (String) and
* splits it into multiple pairs in the form of "(word,1)" (Tuple2<String,
* Integer>).
*/
public static final class LineSplitter implements
FlatMapFunction<String, Tuple2<String, Integer>> {
@Override
public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
// normalize and split the line
String[] tokens = value.toLowerCase().split("\\W+");
// emit the pairs
for (String token : tokens) {
if (token.length() > ) {
out.collect(new Tuple2<String, Integer>(token, ));
}
}
}
}
public static final class Mapdemo
implements
MapFunction<Tuple2<String, Integer>, Tuple3<String, String, Integer>> {
@Override
public Tuple3<String, String, Integer> map(Tuple2<String, Integer> value)
throws Exception {
// TODO Auto-generated method stub
DateFormat format2 = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
String s = format2.format(new Date());
return new Tuple3<String, String, Integer>(value.f0, s, value.f1);
}
}
}
在本地启动端口 :nc -lk 8001 并启动flink程序
输入数据:
aa
aa
bb
观察程序数据结果日志
5> (aa,2018-07-30 16:08:20,2)
5> (bb,2018-07-30 16:08:20,1)
5> (aa,2018-07-30 16:08:40,2)
5> (bb,2018-07-30 16:08:40,1)
5> (aa,2018-07-30 16:09:00,2)
5> (bb,2018-07-30 16:09:00,1)
5> (aa,2018-07-30 16:09:20,2)
5> (bb,2018-07-30 16:09:20,1)
5> (aa,2018-07-30 16:09:40,2)
5> (bb,2018-07-30 16:09:40,1)
在上述输入后继续输入
aa
日志结果统计为
5> (aa,2018-07-30 16:10:00,3)
5> (bb,2018-07-30 16:10:00,1)
根据日志数据可见,flink轻松实现了一个窗口时间长度为120s并每20s向下游发送一次窗口当前聚合结果的功能。
附源码:
源码路径:flink\flink-streaming-java\src\main\java\org\apache\flink\streaming\api\windowing\triggers\ContinuousProcessingTimeTrigger.java
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/ package org.apache.flink.streaming.api.windowing.triggers; import org.apache.flink.annotation.PublicEvolving;
import org.apache.flink.annotation.VisibleForTesting;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.state.ReducingState;
import org.apache.flink.api.common.state.ReducingStateDescriptor;
import org.apache.flink.api.common.typeutils.base.LongSerializer;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.Window; /**
* A {@link Trigger} that continuously fires based on a given time interval as measured by
* the clock of the machine on which the job is running.
*
* @param <W> The type of {@link Window Windows} on which this trigger can operate.
*/
@PublicEvolving
public class ContinuousProcessingTimeTrigger<W extends Window> extends Trigger<Object, W> {
private static final long serialVersionUID = 1L; private final long interval; /** When merging we take the lowest of all fire timestamps as the new fire timestamp. */
private final ReducingStateDescriptor<Long> stateDesc =
new ReducingStateDescriptor<>("fire-time", new Min(), LongSerializer.INSTANCE); private ContinuousProcessingTimeTrigger(long interval) {
this.interval = interval;
} @Override
public TriggerResult onElement(Object element, long timestamp, W window, TriggerContext ctx) throws Exception {
ReducingState<Long> fireTimestamp = ctx.getPartitionedState(stateDesc); timestamp = ctx.getCurrentProcessingTime(); if (fireTimestamp.get() == null) {
long start = timestamp - (timestamp % interval);
long nextFireTimestamp = start + interval; ctx.registerProcessingTimeTimer(nextFireTimestamp); fireTimestamp.add(nextFireTimestamp);
return TriggerResult.CONTINUE;
}
return TriggerResult.CONTINUE;
} @Override
public TriggerResult onEventTime(long time, W window, TriggerContext ctx) throws Exception {
return TriggerResult.CONTINUE;
} @Override
public TriggerResult onProcessingTime(long time, W window, TriggerContext ctx) throws Exception {
ReducingState<Long> fireTimestamp = ctx.getPartitionedState(stateDesc); if (fireTimestamp.get().equals(time)) {
fireTimestamp.clear();
fireTimestamp.add(time + interval);
ctx.registerProcessingTimeTimer(time + interval);
return TriggerResult.FIRE;
}
return TriggerResult.CONTINUE;
} @Override
public void clear(W window, TriggerContext ctx) throws Exception {
ReducingState<Long> fireTimestamp = ctx.getPartitionedState(stateDesc);
long timestamp = fireTimestamp.get();
ctx.deleteProcessingTimeTimer(timestamp);
fireTimestamp.clear();
} @Override
public boolean canMerge() {
return true;
} @Override
public void onMerge(W window,
OnMergeContext ctx) {
ctx.mergePartitionedState(stateDesc);
} @VisibleForTesting
public long getInterval() {
return interval;
} @Override
public String toString() {
return "ContinuousProcessingTimeTrigger(" + interval + ")";
} /**
* Creates a trigger that continuously fires based on the given interval.
*
* @param interval The time interval at which to fire.
* @param <W> The type of {@link Window Windows} on which this trigger can operate.
*/
public static <W extends Window> ContinuousProcessingTimeTrigger<W> of(Time interval) {
return new ContinuousProcessingTimeTrigger<>(interval.toMilliseconds());
} private static class Min implements ReduceFunction<Long> {
private static final long serialVersionUID = 1L; @Override
public Long reduce(Long value1, Long value2) throws Exception {
return Math.min(value1, value2);
}
}
}
源码路径:flink\flink-streaming-java\src\main\java\org\apache\flink\streaming\api\windowing\triggers\ContinuousEventTimeTrigger.java
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/ package org.apache.flink.streaming.api.windowing.triggers; import org.apache.flink.annotation.PublicEvolving;
import org.apache.flink.annotation.VisibleForTesting;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.state.ReducingState;
import org.apache.flink.api.common.state.ReducingStateDescriptor;
import org.apache.flink.api.common.typeutils.base.LongSerializer;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.Window; /**
* A {@link Trigger} that continuously fires based on a given time interval. This fires based
* on {@link org.apache.flink.streaming.api.watermark.Watermark Watermarks}.
*
* @see org.apache.flink.streaming.api.watermark.Watermark
*
* @param <W> The type of {@link Window Windows} on which this trigger can operate.
*/
@PublicEvolving
public class ContinuousEventTimeTrigger<W extends Window> extends Trigger<Object, W> {
private static final long serialVersionUID = 1L; private final long interval; /** When merging we take the lowest of all fire timestamps as the new fire timestamp. */
private final ReducingStateDescriptor<Long> stateDesc =
new ReducingStateDescriptor<>("fire-time", new Min(), LongSerializer.INSTANCE); private ContinuousEventTimeTrigger(long interval) {
this.interval = interval;
} @Override
public TriggerResult onElement(Object element, long timestamp, W window, TriggerContext ctx) throws Exception { if (window.maxTimestamp() <= ctx.getCurrentWatermark()) {
// if the watermark is already past the window fire immediately
return TriggerResult.FIRE;
} else {
ctx.registerEventTimeTimer(window.maxTimestamp());
} ReducingState<Long> fireTimestamp = ctx.getPartitionedState(stateDesc);
if (fireTimestamp.get() == null) {
long start = timestamp - (timestamp % interval);
long nextFireTimestamp = start + interval;
ctx.registerEventTimeTimer(nextFireTimestamp);
fireTimestamp.add(nextFireTimestamp);
} return TriggerResult.CONTINUE;
} @Override
public TriggerResult onEventTime(long time, W window, TriggerContext ctx) throws Exception { if (time == window.maxTimestamp()){
return TriggerResult.FIRE;
} ReducingState<Long> fireTimestampState = ctx.getPartitionedState(stateDesc); Long fireTimestamp = fireTimestampState.get(); if (fireTimestamp != null && fireTimestamp == time) {
fireTimestampState.clear();
fireTimestampState.add(time + interval);
ctx.registerEventTimeTimer(time + interval);
return TriggerResult.FIRE;
} return TriggerResult.CONTINUE;
} @Override
public TriggerResult onProcessingTime(long time, W window, TriggerContext ctx) throws Exception {
return TriggerResult.CONTINUE;
} @Override
public void clear(W window, TriggerContext ctx) throws Exception {
ReducingState<Long> fireTimestamp = ctx.getPartitionedState(stateDesc);
Long timestamp = fireTimestamp.get();
if (timestamp != null) {
ctx.deleteEventTimeTimer(timestamp);
fireTimestamp.clear();
}
} @Override
public boolean canMerge() {
return true;
} @Override
public void onMerge(W window, OnMergeContext ctx) throws Exception {
ctx.mergePartitionedState(stateDesc);
Long nextFireTimestamp = ctx.getPartitionedState(stateDesc).get();
if (nextFireTimestamp != null) {
ctx.registerEventTimeTimer(nextFireTimestamp);
}
} @Override
public String toString() {
return "ContinuousEventTimeTrigger(" + interval + ")";
} @VisibleForTesting
public long getInterval() {
return interval;
} /**
* Creates a trigger that continuously fires based on the given interval.
*
* @param interval The time interval at which to fire.
* @param <W> The type of {@link Window Windows} on which this trigger can operate.
*/
public static <W extends Window> ContinuousEventTimeTrigger<W> of(Time interval) {
return new ContinuousEventTimeTrigger<>(interval.toMilliseconds());
} private static class Min implements ReduceFunction<Long> {
private static final long serialVersionUID = 1L; @Override
public Long reduce(Long value1, Long value2) throws Exception {
return Math.min(value1, value2);
}
}
}
flink window的early计算的更多相关文章
- 一文搞懂Flink Window机制
Windows是处理无线数据流的核心,它将流分割成有限大小的桶(buckets),并在其上执行各种计算. 窗口化的Flink程序的结构通常如下,有分组流(keyed streams)和无分组流(non ...
- Flink – window operator
参考, http://wuchong.me/blog/2016/05/25/flink-internals-window-mechanism/ http://wuchong.me/blog/201 ...
- Flink window机制
此文已由作者岳猛授权网易云社区发布. 欢迎访问网易云社区,了解更多网易技术产品运营经验. 问题 window是解决流计算中的什么问题? 怎么划分window?有哪几种window?window与时间属 ...
- flink window实例分析
window是处理数据的核心.按需选择你需要的窗口类型后,它会将传入的原始数据流切分成多个buckets,所有计算都在window中进行. flink本身提供的实例程序TopSpeedWindowin ...
- 【翻译】Flink window
本文翻译自flink官网:https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/stream/operators/window ...
- Apache Flink - Window
Window: 在Streaming中,数据是无限且连续的,我们不可能等所有数据都到才进行处理,我们可以来一个就处理一下,但是有时我们需要做一些聚合类的处理,例如:在过去的1分钟内有多少用户点击了我们 ...
- Flink Window窗口机制
总览 Window 是flink处理无限流的核心,Windows将流拆分为有限大小的"桶",我们可以在其上应用计算. Flink 认为 Batch 是 Streaming 的一个特 ...
- Flink Window&Time 原理
Flink 中可以使用一套 API 完成对有界数据集以及无界数据的统一处理,而无界数据集的处理一般会伴随着对某些固定时间间隔的数据聚合处理.比如:每五分钟统计一次系统活跃用户.每十秒更新热搜榜单等等 ...
- flink Window的Timestamps/Watermarks和allowedLateness的区别
Watermartks是通过additional的时间戳来控制窗口激活的时间,allowedLateness来控制窗口的销毁时间. 注: 因为此特性包括官方文档在1.3-1.5版本均未做改变,所以 ...
随机推荐
- AJAX应用【股票案例、验证码校验】
一.股票案例 我们要做的是股票的案例,它能够无刷新地更新股票的数据.当鼠标移动到具体的股票中,它会显示具体的信息. 我们首先来看一下要做出来的效果: 1.1服务器端分析 首先,从效果图我们可以看见很多 ...
- java 虚拟机内存划分,类加载过程以及对象的初始化
涉及关键词: 虚拟机运行时内存 java内存划分 类加载顺序 类加载时机 类加载步骤 对象初始化顺序 构造代码块顺序 构造方法 顺序 内存区域 java内存图 堆 方法区 虚拟机栈 本地 ...
- 微信小程序实现瀑布流布局
前言 最近在做微信小程序,有一个图片列表页面,想通过瀑布流方式来实现,个人比较喜欢这种效果 先看实现效果图 实现原理及代码 将布局分为两列,我们可以使用flex设置 displex:flex 然后每列 ...
- NiftyNet 数据预处理
NiftyNet项目介绍 使用NiftyNet时,我们需要先将图像数据和标签进行一次简单的处理,得到对应的.csv文件. 对应文件格式为: img.csv image path img_name im ...
- 第62章 EntityFramework支持 - Identity Server 4 中文文档(v1.0.0)
为IdentityServer中的配置和操作数据扩展点提供了基于EntityFramework的实现.EntityFramework的使用允许任何EF支持的数据库与此库一起使用. 这个库的仓库位于这里 ...
- Sqlserver UrlEncode
Sqlserver UrlEncode if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[UrlEncode ...
- 【小o地图Excel插件版】不止能做图表,还能抓58、大众点评网页数据...
小o地图Excel插件版:一款基于Excel软件开发的地图软件,提供基于Excel表格进行地理数据挖掘.地理数据分析.地图绘制.地图图表等功能的工具类软件.具有易用.高效.稳定的特点,能够满足地理数据 ...
- spring整合mybatis接口无法注入问题
在学习Spring完之后简单的了解了MyBatis.然后进行简单的整合,遇到MyBatista接口映射的Bean无法自动注入的问题: 代码异常: 线程“main”org.springframe .be ...
- Linux常见命令(一)
三.常见linux命令: 命令提示符的解释: (1).组成(默认):[root@localhost~]# (2)[root@localhost~] root:系统当前登录账户名称,超级管理员为root ...
- 认证与Shiro安全框架
本文内容均来自官网 1.简介 Apache Shiro是Java的一个安全框架.功能强大,使用简单的Java安全框架,它为开发人员提供一个直观而全面的认证,授权,加密及会话管理的解决方案. 实际上,S ...