Flink应用案例:How Trackunit leverages Flink to process real-time data from industrial IoT devices

Recently there has been significant discussion about edge computing as a major technology trend in 2019. Edge computing brings computing capabilities away from the cloud, and rather close to the field, especially in the Industrial IoT sector (IIoT). In this blog post we describe how Trackunit leverages Apache Flink as the stream processing framework of choice to build data pipelines for fleet management operations in the construction industry.
Trackunit has specialized in the design, development and production of fleet management systems. The company is a world leader in telematics solutions for the construction industry and provides IoT services for a broad portfolio of companies and sectors to optimize the daily operations of its customers. In the following paragraphs, we describe how Trackunit’s data architecture evolved over time to include new features for the company’s data pipeline.
The company’s journey with Flink started in 2016 as part of a new strategy to build its technology powered by distributed, open source data processing technologies for increased scalability and efficient production deployment. The infrastructure was built on AWS, initially using Amazon Kinesis as the messaging queue and Amazon EMR for cluster management, alongside Flink1.2 (which was quickly upgraded to Flink1.3). The following diagram gives an overview of the initial pipeline:

As shown above, in this phase of the architecture the IoT devices send data through telematics to a Kinesis topic that is then passing them on to a single Flink Job for parsing and storage. During this stage, the architecture included an external parsing service that was additionally accessing data from a database asynchronously. The results were then passed back to the single Flink job and then stored in Cassandra.
However, due to location data being important for industrial IoT applications like Trackunit’s, the second iteration of the pipeline includes additional data enrichment. This is achieved using Flink’s Async I/O function that calls two separate external services: one for parsing and a second for enriching the data that is then transferred back to the pipeline as shown in the diagram below.

The third evolution of the pipeline includes separating this single job to multiple ones, each specialized in a specific pipeline task. As illustrated below, this iteration includes a Flink job responsible for parsing the data which is then moved to a Kinesis topic, followed by a second Flink job responsible for data enrichment and a third one storing the enriched data to Cassandra.

By separating the single Flink job to different ones, the team was able to reuse and add functionality to the same pipeline. Additionally, with each Flink job focusing on a single operation, it became easier to debug and fix issues. Finally, Trackunit’s team can re-use different parts of the infrastructure in different applications as required by the business. This proves to be a scalable solution that allows development work to be repeated and shared across use cases. However, with this setup, the team experienced a slowdown in Flink’s throughput that was caused by the external parsing service. As a solution, the team removed the external parsing service and embedded the code to the Flink parsing job for greater efficiency and faster parsing of the data as shown in the diagram below.

To further increase performance and minimize the number of calls to the async enrichment service, the team implemented a cache to enrich the pipeline with location data before writing to a new Kinesis topic pushing the enriched data downstream as illustrated in the diagram below. This addition managed to decrease the Async calls by 33% which was a big achievement for the team.

Trackunit is constantly looking at new upgrades and Flink features that can increase the pipeline's performance even further and make the architecture more scalable and robust. The team is currently using Flink 1.7.1 in testing and production and plans to replace all internal state to Avro to ensure better state migration.
You can find out more about our journey with Apache Flink and some specific DOs and DONTs in my Flink Forward Berlin 2018 talk here.

Lasse Nedergaard is a lead developer and system architect for reactive distributed systems at Trackunit S/A based on Mesos DC/OS, Apache Flink, Apache Akka and Akka streams, Kinesis, Cassandra, and SQL Server 2016 among others.
About Trackunit:
Since 2003, Trackunit has specialized in the design and development of fleet management systems. The company creates both hardware and software solutions within telematics and industrial IoT. Developing
unique solutions to provide suppliers, owners and operators of machines with the most effective telematics solutions. We use case studies and customer feedback to generate valuable insights for developing new products and services.
Trackunit is the leading global supplier of fleet management solutions, operating out of our HQ in Denmark and eight offices worldwide.
About Apache Flink:
Apache Flink is used by developers to analyze and process data streams of very high volume. By adopting Flink and a data streaming architecture, enterprises can get real-time insights from their data in
milliseconds, as well as cover existing historical data processing needs within a single platform.
Flink is developed and supported by a vibrant and growing open source community at the Apache Software Foundation with more than 460 contributors, of which dA engineers are proud participants.
Flink应用案例:How Trackunit leverages Flink to process real-time data from industrial IoT devices的更多相关文章
- Flink 从 0 到 1 学习 —— Flink Data transformation(转换)
toc: true title: Flink 从 0 到 1 学习 -- Flink Data transformation(转换) date: 2018-11-04 tags: Flink 大数据 ...
- Flink 从0到1学习—— Flink 不可以连续 Split(分流)?
前言 今天上午被 Flink 的一个算子困惑了下,具体问题是什么呢? 我有这么个需求:有不同种类型的告警数据流(包含恢复数据),然后我要将这些数据流做一个拆分,拆分后的话,每种告警里面的数据又想将告警 ...
- Flink 从0到1学习 —— Flink 中如何管理配置?
前言 如果你了解 Apache Flink 的话,那么你应该熟悉该如何像 Flink 发送数据或者如何从 Flink 获取数据.但是在某些情况下,我们需要将配置数据发送到 Flink 集群并从中接收一 ...
- Flink 源码解析 —— 深度解析 Flink 是如何管理好内存的?
前言 如今,许多用于分析大型数据集的开源系统都是用 Java 或者是基于 JVM 的编程语言实现的.最着名的例子是 Apache Hadoop,还有较新的框架,如 Apache Spark.Apach ...
- Flink 源码解析 —— 深度解析 Flink 序列化机制
Flink 序列化机制 https://t.zsxq.com/JaQfeMf 博客 1.Flink 从0到1学习 -- Apache Flink 介绍 2.Flink 从0到1学习 -- Mac 上搭 ...
- Flink 从 0 到 1 学习 —— Flink 配置文件详解
前面文章我们已经知道 Flink 是什么东西了,安装好 Flink 后,我们再来看下安装路径下的配置文件吧. 安装目录下主要有 flink-conf.yaml 配置.日志的配置文件.zk 配置.Fli ...
- Flink中案例学习--State与CheckPoint理解
1.State概念理解 在Flink中,按照基本类型,对State做了以下两类的划分:Keyed State, Operator State. Keyed State:和Key有关的状态类型,它只能被 ...
- Apache Flink 进阶(六):Flink 作业执行深度解析
本文根据 Apache Flink 系列直播课程整理而成,由 Apache Flink Contributor.网易云音乐实时计算平台研发工程师岳猛分享.主要分享内容为 Flink Job 执行作业的 ...
- Flink 的Window 操作(基于flink 1.3描述)
Window是无限数据流处理的核心,Window将一个无限的stream拆分成有限大小的”buckets”桶,我们可以在这些桶上做计算操作.本文主要聚焦于在Flink中如何进行窗口操作,以及程序员如何 ...
随机推荐
- SQLServer之创建链接服务器
创建链接服务器注意事项 当我们要跨本地数据库,访问另外一个数据库表中的数据时,本地数据库中就必须要创建远程数据库的DBLINK,通过DBLINNK数据库可以像访问本地数据库一样访问远程数据库表中的数据 ...
- 20190415 - iOS11 无法连接到 App Store 的解决办法
问题:更新 iOS 11 后,打开 App Store 提示: 无法连接至 app store 解决: 进入 iOS 系统[设置][iTunes Store 与 App Store],退出当前登录用户 ...
- vue 对列表数组删除和增加
很重要,一定要好好研究 https://cn.vuejs.org/v2/guide/list.html#%E6%9B%BF%E6%8D%A2%E6%95%B0%E7%BB%84
- 4.29 初始mysql
- pycharm远程同步服务器代码,并设置权限
Pycharm开发工具链接至上面创建的虚拟环境链接 权限问题:此时上传还可能遇到权限问题,即使用的用户名没有权限在给定的目录下写入和修改文件. ubuntu权限管理:移动到指定目标上传文件夹的父文件夹 ...
- C#版 - Leetcode49 - 字母异位词分组 - 题解
C#版 - Leetcode49 - 字母异位词分组 - 题解 Leetcode49.Group Anagrams 在线提交: https://leetcode.com/problems/group- ...
- 【反编译系列】二、反编译代码(jeb)
版权声明:本文为HaiyuKing原创文章,转载请注明出处! 概述 一般情况下我们都是使用dex2jar + jd-gui的方式反编译代码,在实际使用过程中,有时候发现反编译出来的代码阅读效果不是很好 ...
- JavaScript夯实基础系列(一):词法作用域
作用域是一组规则,规定了引擎如何通过标识符名称来查询一个变量.作用域模型有两种:词法作用域和动态作用域.词法作用域是在编写时就已经确定的:通过阅读包含变量定义的数行源码就能知道变量的作用域.Jav ...
- Python:黑板课爬虫闯关第二关
第二关依然是非常的简单 地址:http://www.heibanke.com/lesson/crawler_ex01/ 随便输入昵称呢密码,点击提交,显示如下: 这样看来就很简单了,枚举密码循环 po ...
- Linux权限管理(week1_day5)--技术流ken
权限概述 Linux系统一般将文件可存/取访问的身份分为3个类别:owner(拥有者).group(和所有者同组的用户).others(其他人,除了所有者,除了同组的用户以及除了超级管理员),且3种身 ...