flink window的early计算
Tumbing Windows:滚动窗口,窗口之间时间点不重叠。它是按照固定的时间,或固定的事件个数划分的,分别可以叫做滚动时间窗口和滚动事件窗口。
Sliding Windows:滑动窗口,窗口之间时间点存在重叠。对于某些应用,它们需要的时间是不间断的,需要平滑的进行窗口聚合。
例如,可以每30s记算一次最近1分钟用户所购买的商品数量的总数,这个就是时间滑动窗口;或者每10个客户点击购买,然后就计算一下最近100个客户购买的商品的总和,这个就是事件滑动窗口。
Session Windows:会话窗口,经过一段设置时间无数据认为窗口完成。
在默认的场景下,所有的窗口都是到达时间语义上的windown end time后触发对整个窗口元素的计算,但是在部分场景的情况下,业务方需要在窗口时间没有结束的情况下也可以获得当前的聚合结果,比如每隔五分钟获取当前小时的sum值,这种情况下,官方提供了对于上述窗口的定制化计算器ContinuousEventTimeTrigger和ContinuousProcessingTimeTrigger
下面是一个使用ContinuousProcessingTimeTrigger的简单例子:
public class ContinueTriggerDemo {
public static void main(String[] args) throws Exception {
// TODO Auto-generated method stub
String hostName = "localhost";
Integer port = Integer.parseInt("");
;
// set up the execution environment
final StreamExecutionEnvironment env = StreamExecutionEnvironment
.getExecutionEnvironment();
// 从指定socket获取输入数据
DataStream<String> text = env.socketTextStream(hostName, port);
text.flatMap(new LineSplitter()) //数据语句分词
.keyBy() // 流按照单词分区
.window(TumblingProcessingTimeWindows.of(Time.seconds()))// 设置一个120s的滚动窗口
.trigger(ContinuousProcessingTimeTrigger.of(Time.seconds()))//窗口每统计一次当前计算结果
.sum()// count求和
.map(new Mapdemo())//输出结果加上时间戳
.print();
env.execute("Java WordCount from SocketTextStream Example");
}
/**
* Implements the string tokenizer that splits sentences into words as a
* user-defined FlatMapFunction. The function takes a line (String) and
* splits it into multiple pairs in the form of "(word,1)" (Tuple2<String,
* Integer>).
*/
public static final class LineSplitter implements
FlatMapFunction<String, Tuple2<String, Integer>> {
@Override
public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
// normalize and split the line
String[] tokens = value.toLowerCase().split("\\W+");
// emit the pairs
for (String token : tokens) {
if (token.length() > ) {
out.collect(new Tuple2<String, Integer>(token, ));
}
}
}
}
public static final class Mapdemo
implements
MapFunction<Tuple2<String, Integer>, Tuple3<String, String, Integer>> {
@Override
public Tuple3<String, String, Integer> map(Tuple2<String, Integer> value)
throws Exception {
// TODO Auto-generated method stub
DateFormat format2 = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
String s = format2.format(new Date());
return new Tuple3<String, String, Integer>(value.f0, s, value.f1);
}
}
}
在本地启动端口 :nc -lk 8001 并启动flink程序
输入数据:
aa
aa
bb
观察程序数据结果日志
5> (aa,2018-07-30 16:08:20,2)
5> (bb,2018-07-30 16:08:20,1)
5> (aa,2018-07-30 16:08:40,2)
5> (bb,2018-07-30 16:08:40,1)
5> (aa,2018-07-30 16:09:00,2)
5> (bb,2018-07-30 16:09:00,1)
5> (aa,2018-07-30 16:09:20,2)
5> (bb,2018-07-30 16:09:20,1)
5> (aa,2018-07-30 16:09:40,2)
5> (bb,2018-07-30 16:09:40,1)
在上述输入后继续输入
aa
日志结果统计为
5> (aa,2018-07-30 16:10:00,3)
5> (bb,2018-07-30 16:10:00,1)
根据日志数据可见,flink轻松实现了一个窗口时间长度为120s并每20s向下游发送一次窗口当前聚合结果的功能。
附源码:
源码路径:flink\flink-streaming-java\src\main\java\org\apache\flink\streaming\api\windowing\triggers\ContinuousProcessingTimeTrigger.java
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/ package org.apache.flink.streaming.api.windowing.triggers; import org.apache.flink.annotation.PublicEvolving;
import org.apache.flink.annotation.VisibleForTesting;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.state.ReducingState;
import org.apache.flink.api.common.state.ReducingStateDescriptor;
import org.apache.flink.api.common.typeutils.base.LongSerializer;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.Window; /**
* A {@link Trigger} that continuously fires based on a given time interval as measured by
* the clock of the machine on which the job is running.
*
* @param <W> The type of {@link Window Windows} on which this trigger can operate.
*/
@PublicEvolving
public class ContinuousProcessingTimeTrigger<W extends Window> extends Trigger<Object, W> {
private static final long serialVersionUID = 1L; private final long interval; /** When merging we take the lowest of all fire timestamps as the new fire timestamp. */
private final ReducingStateDescriptor<Long> stateDesc =
new ReducingStateDescriptor<>("fire-time", new Min(), LongSerializer.INSTANCE); private ContinuousProcessingTimeTrigger(long interval) {
this.interval = interval;
} @Override
public TriggerResult onElement(Object element, long timestamp, W window, TriggerContext ctx) throws Exception {
ReducingState<Long> fireTimestamp = ctx.getPartitionedState(stateDesc); timestamp = ctx.getCurrentProcessingTime(); if (fireTimestamp.get() == null) {
long start = timestamp - (timestamp % interval);
long nextFireTimestamp = start + interval; ctx.registerProcessingTimeTimer(nextFireTimestamp); fireTimestamp.add(nextFireTimestamp);
return TriggerResult.CONTINUE;
}
return TriggerResult.CONTINUE;
} @Override
public TriggerResult onEventTime(long time, W window, TriggerContext ctx) throws Exception {
return TriggerResult.CONTINUE;
} @Override
public TriggerResult onProcessingTime(long time, W window, TriggerContext ctx) throws Exception {
ReducingState<Long> fireTimestamp = ctx.getPartitionedState(stateDesc); if (fireTimestamp.get().equals(time)) {
fireTimestamp.clear();
fireTimestamp.add(time + interval);
ctx.registerProcessingTimeTimer(time + interval);
return TriggerResult.FIRE;
}
return TriggerResult.CONTINUE;
} @Override
public void clear(W window, TriggerContext ctx) throws Exception {
ReducingState<Long> fireTimestamp = ctx.getPartitionedState(stateDesc);
long timestamp = fireTimestamp.get();
ctx.deleteProcessingTimeTimer(timestamp);
fireTimestamp.clear();
} @Override
public boolean canMerge() {
return true;
} @Override
public void onMerge(W window,
OnMergeContext ctx) {
ctx.mergePartitionedState(stateDesc);
} @VisibleForTesting
public long getInterval() {
return interval;
} @Override
public String toString() {
return "ContinuousProcessingTimeTrigger(" + interval + ")";
} /**
* Creates a trigger that continuously fires based on the given interval.
*
* @param interval The time interval at which to fire.
* @param <W> The type of {@link Window Windows} on which this trigger can operate.
*/
public static <W extends Window> ContinuousProcessingTimeTrigger<W> of(Time interval) {
return new ContinuousProcessingTimeTrigger<>(interval.toMilliseconds());
} private static class Min implements ReduceFunction<Long> {
private static final long serialVersionUID = 1L; @Override
public Long reduce(Long value1, Long value2) throws Exception {
return Math.min(value1, value2);
}
}
}
源码路径:flink\flink-streaming-java\src\main\java\org\apache\flink\streaming\api\windowing\triggers\ContinuousEventTimeTrigger.java
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/ package org.apache.flink.streaming.api.windowing.triggers; import org.apache.flink.annotation.PublicEvolving;
import org.apache.flink.annotation.VisibleForTesting;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.state.ReducingState;
import org.apache.flink.api.common.state.ReducingStateDescriptor;
import org.apache.flink.api.common.typeutils.base.LongSerializer;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.Window; /**
* A {@link Trigger} that continuously fires based on a given time interval. This fires based
* on {@link org.apache.flink.streaming.api.watermark.Watermark Watermarks}.
*
* @see org.apache.flink.streaming.api.watermark.Watermark
*
* @param <W> The type of {@link Window Windows} on which this trigger can operate.
*/
@PublicEvolving
public class ContinuousEventTimeTrigger<W extends Window> extends Trigger<Object, W> {
private static final long serialVersionUID = 1L; private final long interval; /** When merging we take the lowest of all fire timestamps as the new fire timestamp. */
private final ReducingStateDescriptor<Long> stateDesc =
new ReducingStateDescriptor<>("fire-time", new Min(), LongSerializer.INSTANCE); private ContinuousEventTimeTrigger(long interval) {
this.interval = interval;
} @Override
public TriggerResult onElement(Object element, long timestamp, W window, TriggerContext ctx) throws Exception { if (window.maxTimestamp() <= ctx.getCurrentWatermark()) {
// if the watermark is already past the window fire immediately
return TriggerResult.FIRE;
} else {
ctx.registerEventTimeTimer(window.maxTimestamp());
} ReducingState<Long> fireTimestamp = ctx.getPartitionedState(stateDesc);
if (fireTimestamp.get() == null) {
long start = timestamp - (timestamp % interval);
long nextFireTimestamp = start + interval;
ctx.registerEventTimeTimer(nextFireTimestamp);
fireTimestamp.add(nextFireTimestamp);
} return TriggerResult.CONTINUE;
} @Override
public TriggerResult onEventTime(long time, W window, TriggerContext ctx) throws Exception { if (time == window.maxTimestamp()){
return TriggerResult.FIRE;
} ReducingState<Long> fireTimestampState = ctx.getPartitionedState(stateDesc); Long fireTimestamp = fireTimestampState.get(); if (fireTimestamp != null && fireTimestamp == time) {
fireTimestampState.clear();
fireTimestampState.add(time + interval);
ctx.registerEventTimeTimer(time + interval);
return TriggerResult.FIRE;
} return TriggerResult.CONTINUE;
} @Override
public TriggerResult onProcessingTime(long time, W window, TriggerContext ctx) throws Exception {
return TriggerResult.CONTINUE;
} @Override
public void clear(W window, TriggerContext ctx) throws Exception {
ReducingState<Long> fireTimestamp = ctx.getPartitionedState(stateDesc);
Long timestamp = fireTimestamp.get();
if (timestamp != null) {
ctx.deleteEventTimeTimer(timestamp);
fireTimestamp.clear();
}
} @Override
public boolean canMerge() {
return true;
} @Override
public void onMerge(W window, OnMergeContext ctx) throws Exception {
ctx.mergePartitionedState(stateDesc);
Long nextFireTimestamp = ctx.getPartitionedState(stateDesc).get();
if (nextFireTimestamp != null) {
ctx.registerEventTimeTimer(nextFireTimestamp);
}
} @Override
public String toString() {
return "ContinuousEventTimeTrigger(" + interval + ")";
} @VisibleForTesting
public long getInterval() {
return interval;
} /**
* Creates a trigger that continuously fires based on the given interval.
*
* @param interval The time interval at which to fire.
* @param <W> The type of {@link Window Windows} on which this trigger can operate.
*/
public static <W extends Window> ContinuousEventTimeTrigger<W> of(Time interval) {
return new ContinuousEventTimeTrigger<>(interval.toMilliseconds());
} private static class Min implements ReduceFunction<Long> {
private static final long serialVersionUID = 1L; @Override
public Long reduce(Long value1, Long value2) throws Exception {
return Math.min(value1, value2);
}
}
}
flink window的early计算的更多相关文章
- 一文搞懂Flink Window机制
Windows是处理无线数据流的核心,它将流分割成有限大小的桶(buckets),并在其上执行各种计算. 窗口化的Flink程序的结构通常如下,有分组流(keyed streams)和无分组流(non ...
- Flink – window operator
参考, http://wuchong.me/blog/2016/05/25/flink-internals-window-mechanism/ http://wuchong.me/blog/201 ...
- Flink window机制
此文已由作者岳猛授权网易云社区发布. 欢迎访问网易云社区,了解更多网易技术产品运营经验. 问题 window是解决流计算中的什么问题? 怎么划分window?有哪几种window?window与时间属 ...
- flink window实例分析
window是处理数据的核心.按需选择你需要的窗口类型后,它会将传入的原始数据流切分成多个buckets,所有计算都在window中进行. flink本身提供的实例程序TopSpeedWindowin ...
- 【翻译】Flink window
本文翻译自flink官网:https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/stream/operators/window ...
- Apache Flink - Window
Window: 在Streaming中,数据是无限且连续的,我们不可能等所有数据都到才进行处理,我们可以来一个就处理一下,但是有时我们需要做一些聚合类的处理,例如:在过去的1分钟内有多少用户点击了我们 ...
- Flink Window窗口机制
总览 Window 是flink处理无限流的核心,Windows将流拆分为有限大小的"桶",我们可以在其上应用计算. Flink 认为 Batch 是 Streaming 的一个特 ...
- Flink Window&Time 原理
Flink 中可以使用一套 API 完成对有界数据集以及无界数据的统一处理,而无界数据集的处理一般会伴随着对某些固定时间间隔的数据聚合处理.比如:每五分钟统计一次系统活跃用户.每十秒更新热搜榜单等等 ...
- flink Window的Timestamps/Watermarks和allowedLateness的区别
Watermartks是通过additional的时间戳来控制窗口激活的时间,allowedLateness来控制窗口的销毁时间. 注: 因为此特性包括官方文档在1.3-1.5版本均未做改变,所以 ...
随机推荐
- C++中的to_string()函数[C++11支持]
C++ -> 字符串库 -> std::basic_string 定义于头文件 std::string to_string(int value); (1) (C++11起) std::st ...
- 在阿里云 ECS 搭建 nginx https nodejs 环境 (一、 nginx)
首先介绍下相关环境.软件的版本 1.阿里云 ECS . ubuntu-14.04.5 LTS 2.nginx 版本 1.9.2 可能会遇到的问题: 一.在 ssh 服务器上的时候,提示 这个时候需要将 ...
- Linux上安装Zookeeper以及一些注意事项
最近打算出一个系列,介绍Dubbo的使用. 分布式应用现在已经越来越广泛,Spring Could也是一个不错的一站式解决方案,不过据我了解国内目前貌似使用阿里Dubbo的公司比较多,一方面这个框架也 ...
- spring boot 统一异常处理
需求源自于任何一个业务的编写总会有各种各样的条件判断,需要时时手动抛出异常,又希望让接口返回友好的错误信息. spring boot提供的帮助是自动将异常重定向到路由为/error的控制器 但是我们又 ...
- Docker安装MySQL并配置my.cnf
1.创建一个临时的mysql,以便复制出my.cnf等数据 $ docker run --restart=always -d -v /opt/data/mysql/:/var/lib/mysql -p ...
- Shell从入门到精通进阶之二:Shell字符串处理之${}
上一章节讲解了为什么用${}引用变量,${}还有一个重要的功能,就是文本处理,单行文本基本上可以满足你所有需求. 2.1 获取字符串长度 # VAR='hello world!' # echo $VA ...
- jumpserver篇--安装(高可用性 mariadb+haproxy)
1. 需求 为了解决目前登陆方式多种多样,防火墙配置复杂,历史操作无记录,用户权限混乱等等 2. Jumpserver测试环境搭建 2.1. 环境 os:CentOS release 6.8 mini ...
- AppBoxFuture(六): 前端组件化开发
前面几篇都是在介绍结构化与非结构化的数据存储,本篇换换口味介绍一下框架是如何实现前端组件化开发的.首先得感谢Vue.ElementUI等优秀的前端开源项目,这些项目帮助作者快速实现了框架的两个前端 ...
- 第54章 身份资源 - Identity Server 4 中文文档(v1.0.0)
此类为身份资源建模. Enabled 指示此资源是否已启用且可以请求.默认为true. Name 标识资源的唯一名称.这是客户端将用于授权请求中的scope参数的值. DisplayName 该值将用 ...
- arcgis10.0的ArcGIS Services Directory显示401,需要身份验证,访问被拒绝,rest/services需要输入用户名和密码
大家好! 这个错误我也不想说什么,主要是应公司开发需求,从自己的arcgis10.2的版本改为arcgis10.0的版本,装完之后遇到一个错误,老是显示访问被拒绝,我也是找了很多的方式,没有在网上找到 ...