『NiFi 学习之路』把握 —— 架构及主要部件
一、概述
通过前面几篇文章的学习,相信你对 NiFi 有了一个基础性的了解。
数据处理和分发系统 是什么概念?
NiFi 系统中数据的传递方式是怎样的?
NiFi 的重要 Processor 有哪些?
Processor 间是以何种方式进行协作的?
上述问题对于阅读并练习了前几章内容的你来说,应该都不难理解。
那么对于更加深层次的问题诸如:各个 Processor 是如何运行的?ExecuteScript 是如何对脚本初始化的?整个系统是如何实现对数据进行存储、分发和处理的?应该更能勾起你的兴趣。
本文,就尝试从架构的简单剖析给您建立起 NiFi 的进一步的理解。
二、核心概念
http://nifi.apache.org/docs/nifi-docs/html/overview.html#the-core-concepts-of-nifi
NiFi’s fundamental design concepts closely relate to the main ideas of Flow Based Programming [fbp]. Here are some of the main NiFi concepts and how they map to FBP:
NiFi Term FBP Term Description
FlowFile
Information Packet
A FlowFile represents each object moving through the system and for each one, NiFi keeps track of a map of key/value pair attribute strings and its associated content of zero or more bytes.
FlowFile Processor
Black Box
Processors actually perform the work. In [eip] terms a processor is doing some combination of data routing, transformation, or mediation between systems. Processors have access to attributes of a given FlowFile and its content stream. Processors can operate on zero or more FlowFiles in a given unit of work and either commit that work or rollback.
Connection
Bounded Buffer
Connections provide the actual linkage between processors. These act as queues and allow various processes to interact at differing rates. These queues can be prioritized dynamically and can have upper bounds on load, which enable back pressure.
Flow Controller
Scheduler
The Flow Controller maintains the knowledge of how processes connect and manages the threads and allocations thereof which all processes use. The Flow Controller acts as the broker facilitating the exchange of FlowFiles between processors.
Process Group
subnet
A Process Group is a specific set of processes and their connections, which can receive data via input ports and send data out via output ports. In this manner, process groups allow creation of entirely new components simply by composition of other components.
This design model, also similar to [seda], provides many beneficial consequences that help NiFi to be a very effective platform for building powerful and scalable dataflows. A few of these benefits include:
Lends well to visual creation and management of directed graphs of processors
Is inherently asynchronous which allows for very high throughput and natural buffering even as processing and flow rates fluctuate
Provides a highly concurrent model without a developer having to worry about the typical complexities of concurrency
Promotes the development of cohesive and loosely coupled components which can then be reused in other contexts and promotes testable units
The resource constrained connections make critical functions such as back-pressure and pressure release very natural and intuitive
Error handling becomes as natural as the happy-path rather than a coarse grained catch-all
The points at which data enters and exits the system as well as how it flows through are well understood and easily tracked
三、NiFi 架构

图片来源:http://nifi.apache.org/docs.html
根据上面的架构图可知,NiFi 有如下主要部件:
(我就直接粘贴官网的介绍了,因为我觉得官网说得已经很详细了~翻译起来反倒更难理解)
WebServer
The purpose of the web server is to host NiFi’s HTTP-based command and control API.Flow Controller
The flow controller is the brains of the operation. It provides threads for extensions to run on, and manages the schedule of when extensions receive resources to execute.Extensions
There are various types of NiFi extensions which are described in other documents. The key point here is that extensions operate and execute within the JVM.FlowFile Repository
The FlowFile Repository is where NiFi keeps track of the state of what it knows about a given FlowFile that is presently active in the flow. The implementation of the repository is pluggable. The default approach is a persistent Write-Ahead Log located on a specified disk partition.Content Repository
The Content Repository is where the actual content bytes of a given FlowFile live. The implementation of the repository is pluggable. The default approach is a fairly simple mechanism, which stores blocks of data in the file system. More than one file system storage location can be specified so as to get different physical partitions engaged to reduce contention on any single volume.Provenance Repository
The Provenance Repository is where all provenance event data is stored. The repository construct is pluggable with the default implementation being to use one or more physical disk volumes. Within each location event data is indexed and searchable.

Starting with the NiFi 1.0 release, a Zero-Master Clustering paradigm is employed. Each node in a NiFi cluster performs the same tasks on the data, but each operates on a different set of data. Apache ZooKeeper elects a single node as the Cluster Coordinator, and failover is handled automatically by ZooKeeper. All cluster nodes report heartbeat and status information to the Cluster Coordinator. The Cluster Coordinator is responsible for disconnecting and connecting nodes. Additionally, every cluster has one Primary Node, also elected by ZooKeeper. As a DataFlow manager, you can interact with the NiFi cluster through the user interface (UI) of any node. Any change you make is replicated to all nodes in the cluster, allowing for multiple entry points.
此文在我的 Github Pages 上同步发布,地址为:『NiFi学习之路』把握——架构及主要部件
『NiFi 学习之路』把握 —— 架构及主要部件的更多相关文章
- 『NiFi 学习之路』简介
『NiFi 学习之路』简介 『NiFi 学习之路』入门 -- 下载.安装与简单使用 『NiFi 学习之路』资源 -- 资料汇总 『NiFi 学习之路』把握 -- 架构及主要组件 『NiFi 学习之路』 ...
- 『NiFi 学习之路』自定义 —— 组件的自定义及使用
一.概述 许多业务仅仅使用官方提供的组件不能够满足性能上的需求,往往要通过高度可定制的组件来完成特定的业务需求. 而 NiFi 提供了自定义组件的这种方式. 二.自定义 Processor 占坑待续 ...
- 『NiFi 学习之路』使用 —— 主要组件的使用
一.概述 大部分 NiFi 使用者都是通过 NiFi 的 Processor 来实现自己的业务的.因此,我也主要就 NiFi 官方提供的 Porcessor 进行介绍. 二.Processor 如果你 ...
- 『NiFi 学习之路』资源 —— 资料汇总
一.概述 由于 NiFi 是一个比较新的开源项目,国内的相关资料少之又少. 加之,大家都知道,国内的那么些个教程,原创都只是停留在初级使用阶段,没有更多深入的介绍. 再者,其余的文章不是东抄抄就是西抄 ...
- 『NiFi 学习之路』入门 —— 下载、安装与简单使用
一.概述 "光说不练假把式." 官网上的介绍多少让人迷迷糊糊的,各种高大上的词语仿佛让 NiFi 离我们越来越远. 实践是最好的老师.那就让我们试用一下 NiFi 吧! 二.安装 ...
- Android开发学习之路--Android系统架构初探
环境搭建好了,最简单的app也运行过了,那么app到底是怎么运行在手机上的,手机又到底怎么能运行这些应用,一堆的电子元器件最后可以运行这么美妙的界面,在此还是需要好好研究研究.这里从芯片及硬件模块-& ...
- 『王霸之路』从0.1到2.0一文看尽TensorFlow奋斗史
0 序篇 2015年11月,Google正式发布了Tensorflow的白皮书并开源TensorFlow 0.1 版本. 2017年02月,Tensorflow正式发布了1.0.0版本,同时也标志 ...
- 『go成长之路』 defer 作用、典型用法以及多个defer调用顺序,附加defer避坑点,拿来吧你
预习内容 defer 的作用有哪些? 多个 defer 的执行顺序是怎样的? defer,return,函数返回值 三者之间的执行顺序 defer的作用 go中的defer是延迟函数,一般是用于释放资 ...
- linux学习之路4 系统目录架构
linux树状文件系统结构 bin(binary) 保存可执行文件 也就是保存所有命令 boot 引导目录 保存所有跟系统有关的引导程序 其中Vmlinux文件最为重要,是系统内核 dev 保存所有的 ...
随机推荐
- DBA面试题及解答
一:SQL tuning 类 1:列举几种表连接方式答:merge join,hash join,nested loop Hash join散列连接是CBO 做大数据集连接时的常用方式,优化器使用两个 ...
- 在PHP项目中,每个类都要有对应的命名空间,为什么?
语法: namespace Admin\Controller; 功能: 命名空间主要用来区分控制器属于哪个模块下,好区分,更有利于项目的维护:
- Laravel5.1 搭建博客 --文章的增删改查
教程源于:Laravel学院 继文件上传后呢,咱来搞一搞文章的事情. 1 更改数据表 我们需要改改数据表的结构 因为涉及到重命名列名 所以咱需要引入一个包:Doctrine: composer req ...
- STL 源代码剖析 算法 stl_algo.h -- inplace_merge
本文为senlie原创.转载请保留此地址:http://blog.csdn.net/zhengsenlie inplace_merge(应用于有序区间) ----------------------- ...
- IOS 7 自定义的UIAlertView不能在iOS7上正常显示
本文转载至 http://blog.csdn.net/hanbing861210/article/details/13614405 众所周知,当伟大的iOS7系统发布后,表扬的一堆.谩骂的也一片,而对 ...
- Socket通信编程实例(SIB和SS'SOB)
客户端: package socket; import java.io.BufferedReader; import java.io.IOException; import java.io.Input ...
- 最受欢迎的五大BUG管理系统
五大最受欢迎的BUG管理系统 Google在中国大*陆遭遇变故做出暂时性的退出大*陆市场,也使很多忠实的用户受到小小的挫折,以本公司为例,原本的BUG都是记录在google的EXCEL在线文档中 ...
- ios 一个正则表达式测试(只可输入中文、字母和数字)
一个正则表达式测试(只可输入中文.字母和数字) 在项目中碰到了正则表达式的运用,正则还是非常强大的,不管什么编程语言,基本上都可以用到.之前在用java时特别是对用户名或密码使用正则非常爽,写 脚本上 ...
- JSP中的内置对象和Struts中的Web资源的详解
JSP中的内置对象有如下几种: request :继承于HttpServletRequest, HttpServletRequest继承ServletRequest, 获得的Request对象的方法: ...
- ionic 上拉加载问题(分页)
问题描述: 1.第一初始化时执行了上拉加载更多. 2.上拉时存在执行多次加载动作. angularjs的ajax不提供同步机制,是为了防止页面长时间等待,很多时候我们又需要这种同步机制交换状态,比如上 ...