在大数据程序流行的今天,许多程序都面临着共同的难题:程序输入数据趋于无限大,抵达时间又不确定。一般的解决方法是采用回调函数(callback-function)来实现的,但这样的解决方案很容易造成“回调地狱(callback hell)”,即所谓的“goto-hell”:程序控制跳来跳去很难跟踪,特别是一些变量如果在回调函数中更改后产生不可预料的结果。数据流(stream)是一种解决问题的有效编程方式。Stream是一个抽象概念,能把程序数据输入过程和其它细节隐蔽起来,通过申明方式把数据处理过程描述出来,使整体程序逻辑更容易理解跟踪。当然,牺牲的是对一些运算细节的控制能力。我们在前面介绍过scalaz-stream,它与akka-stream的主要区别在于:

1、scalaz-stream是pull模式的,而akka-stream是push模式的。pull模式的缺点是接收数据效率问题,因为在这种模式里程序必须不断重复检测(polling)输入端口是否有数据存在。而push模式则会把数据推到输入端口后直接进入程序,但如果数据源头动作太快程序无法及时处理所有推送的数据时就会造成所谓的数据溢出问题,遗失数据。不过akka-stream实现了reactive-stream的back-pressure规范:数据发送方和接收方之间互动提示,使过快的数据产生能按接收方要求慢下来甚至暂时停下来。

2、scalaz-sstream和akka-stream的数据流都是一种申明式的数据处理流程描述,属于一种运算方案,最终都需要某种运算器来对数据流按运算方案进行具体的运算,得出运算结果和产生副作用。scalaz-stream的运算器是自备的函数式程序,特点是能很好的控制线程使用和进行并行运算。akka-stream的运算器是materializer。materializer在actor系统上运行,具备了actor模式程序的优点包括:消息驱动、集群运算、监管策略(SupervisorStrategy)等等。

akka-stream的数据流是由三类基础组件组合而成,不同的组合方式代表不同的数据处理及表达功能。三类组件分别是:

1、Source:数据源。akka-stream属于push模式,所以Source也就是Publisher(数据发布方),Source的形状SourceShape代表只有一个输出端口的形状。Source可以从单值、集合、某种Publisher或另一个数据流产生数据流的元素(stream-element),包括:

  /**
* Helper to create [[Source]] from `Iterable`.
* Example usage: `Source(Seq(1,2,3))`
*
* Starts a new `Source` from the given `Iterable`. This is like starting from an
* Iterator, but every Subscriber directly attached to the Publisher of this
* stream will see an individual flow of elements (always starting from the
* beginning) regardless of when they subscribed.
*/
def apply[T](iterable: immutable.Iterable[T]): Source[T, NotUsed] =
single(iterable).mapConcat(ConstantFun.scalaIdentityFunction).withAttributes(DefaultAttributes.iterableSource) /**
* Create a `Source` with one element.
* Every connected `Sink` of this stream will see an individual stream consisting of one element.
*/
def single[T](element: T): Source[T, NotUsed] =
fromGraph(new GraphStages.SingleSource(element)) /**
* Helper to create [[Source]] from `Iterator`.
* Example usage: `Source.fromIterator(() => Iterator.from(0))`
*
* Start a new `Source` from the given function that produces anIterator.
* The produced stream of elements will continue until the iterator runs empty
* or fails during evaluation of the `next()` method.
* Elements are pulled out of the iterator in accordance with the demand coming
* from the downstream transformation steps.
*/
def fromIterator[T](f: () ⇒ Iterator[T]): Source[T, NotUsed] =
apply(new immutable.Iterable[T] {
override def iterator: Iterator[T] = f()
override def toString: String = "() => Iterator"
}) /**
* Starts a new `Source` from the given `Future`. The stream will consist of
* one element when the `Future` is completed with a successful value, which
* may happen before or after materializing the `Flow`.
* The stream terminates with a failure if the `Future` is completed with a failure.
*/
def fromFuture[T](future: Future[T]): Source[T, NotUsed] =
fromGraph(new FutureSource(future)) /**
* Helper to create [[Source]] from `Publisher`.
*
* Construct a transformation starting with given publisher. The transformation steps
* are executed by a series of [[org.reactivestreams.Processor]] instances
* that mediate the flow of elements downstream and the propagation of
* back-pressure upstream.
*/
def fromPublisher[T](publisher: Publisher[T]): Source[T, NotUsed] =
fromGraph(new PublisherSource(publisher, DefaultAttributes.publisherSource, shape("PublisherSource"))) /**
* A graph with the shape of a source logically is a source, this method makes
* it so also in type.
*/
def fromGraph[T, M](g: Graph[SourceShape[T], M]): Source[T, M] = g match {
case s: Source[T, M] ⇒ s
case s: javadsl.Source[T, M] ⇒ s.asScala
case other ⇒ new Source(
LinearTraversalBuilder.fromBuilder(other.traversalBuilder, other.shape, Keep.right),
other.shape)
}

下面还有几个特殊的Source:

  /**
* A `Source` with no elements, i.e. an empty stream that is completed immediately for every connected `Sink`.
*/
def empty[T]: Source[T, NotUsed] = _empty
private[this] val _empty: Source[Nothing, NotUsed] =
Source.fromGraph(EmptySource) /**
* Create a `Source` that will continually emit the given element.
*/
def repeat[T](element: T): Source[T, NotUsed] = {
val next = Some((element, element))
unfold(element)(_ ⇒ next).withAttributes(DefaultAttributes.repeat)
} /**
* Creates [[Source]] that will continually produce given elements in specified order.
*
* Starts a new 'cycled' `Source` from the given elements. The producer stream of elements
* will continue infinitely by repeating the sequence of elements provided by function parameter.
*/
def cycle[T](f: () ⇒ Iterator[T]): Source[T, NotUsed] = {
val iterator = Iterator.continually { val i = f(); if (i.isEmpty) throw new IllegalArgumentException("empty iterator") else i }.flatten
fromIterator(() ⇒ iterator).withAttributes(DefaultAttributes.cycledSource)
}

2、Sink:数据终端。属于数据元素的使用方,主要作用是消耗数据流中的元素。SinkShape是有一个输入端的数据流形状。Sink消耗流元素的例子有:

  /**
* A `Sink` that will consume the stream and discard the elements.
*/
def ignore: Sink[Any, Future[Done]] = fromGraph(GraphStages.IgnoreSink) /**
* A `Sink` that will invoke the given procedure for each received element. The sink is materialized
* into a [[scala.concurrent.Future]] will be completed with `Success` when reaching the
* normal end of the stream, or completed with `Failure` if there is a failure signaled in
* the stream..
*/
def foreach[T](f: T ⇒ Unit): Sink[T, Future[Done]] =
Flow[T].map(f).toMat(Sink.ignore)(Keep.right).named("foreachSink")

注意,akka-stream实际是在actor上进行运算的。actor的内部状态最终可以形成运算结果。上面的例子可以得出Sink的运算结果是Future[??]类型的。

3、Flow:数据处理节点。对通过输入端口输入数据流的元素进行转变处理(transform)后经过输出端口输出。FlowShape有一个输入端和一个输出端。

在akka-stream里数据流组件一般被称为数据流图(graph)。我们可以用许多数据流图组成更大的stream-graph。

akka-stream最简单的完整(或者闭合)线性数据流(linear-stream)就是直接把一个Source和一个Sink相接。这种方式代表一种对数据流所有元素的直接表现,如:source.runWith(Sink.foreach(println))。我们可以用Source.via来连接Flow,用Source.to连接Sink:

  override def via[T, Mat2](flow: Graph[FlowShape[Out, T], Mat2]): Repr[T] = viaMat(flow)(Keep.left)

  override def viaMat[T, Mat2, Mat3](flow: Graph[FlowShape[Out, T], Mat2])(combine: (Mat, Mat2) ⇒ Mat3): Source[T, Mat3] = {
val toAppend =
if (flow.traversalBuilder eq Flow.identityTraversalBuilder)
LinearTraversalBuilder.empty()
else
flow.traversalBuilder new Source[T, Mat3](
traversalBuilder.append(toAppend, flow.shape, combine),
SourceShape(flow.shape.out))
} /**
* Connect this [[akka.stream.scaladsl.Source]] to a [[akka.stream.scaladsl.Sink]],
* concatenating the processing steps of both.
*/
def to[Mat2](sink: Graph[SinkShape[Out], Mat2]): RunnableGraph[Mat] = toMat(sink)(Keep.left) /**
* Connect this [[akka.stream.scaladsl.Source]] to a [[akka.stream.scaladsl.Sink]],
* concatenating the processing steps of both.
*/
def toMat[Mat2, Mat3](sink: Graph[SinkShape[Out], Mat2])(combine: (Mat, Mat2) ⇒ Mat3): RunnableGraph[Mat3] = {
RunnableGraph(traversalBuilder.append(sink.traversalBuilder, sink.shape, combine))
}

可以发现via和to分别是viaMat和toMat的简写,分别固定了Keep.left。意思是选择左边数据流图的运算结果。我们上面提过akka-stream是在actor系统里处理数据流元素的。在这个过程中同时可以用actor内部状态来产生运算结果。via和to连接了左右两个graph,并且选择了左边graph的运算结果。我们可以用viaMat和toMat来选择右边graph运算结果。这是通过combine: (Mat,Mat2)=>Mat3这个函数实现的。akka-stream提供了一个Keep对象来表达这种选择:

/**
* Convenience functions for often-encountered purposes like keeping only the
* left (first) or only the right (second) of two input values.
*/
object Keep {
private val _left = (l: Any, r: Any) ⇒ l
private val _right = (l: Any, r: Any) ⇒ r
private val _both = (l: Any, r: Any) ⇒ (l, r)
private val _none = (l: Any, r: Any) ⇒ NotUsed def left[L, R]: (L, R) ⇒ L = _left.asInstanceOf[(L, R) ⇒ L]
def right[L, R]: (L, R) ⇒ R = _right.asInstanceOf[(L, R) ⇒ R]
def both[L, R]: (L, R) ⇒ (L, R) = _both.asInstanceOf[(L, R) ⇒ (L, R)]
def none[L, R]: (L, R) ⇒ NotUsed = _none.asInstanceOf[(L, R) ⇒ NotUsed]
}

既然提到运算结果的处理方式,我们就来看看Source,Flow,Sink的类型参数:

Source[+Out, +Mat]       //Out代表元素类型,Mat为运算结果类型
Flow[-In, +Out, +Mat] //In,Out为数据流元素类型,Mat是运算结果类型
Sink[-In, +Mat] //In是数据元素类型,Mat是运算结果类型

Keep对象提供的是对Mat的选择。上面源代码中to,toMat函数的返回结果都是RunnableGraph[Mat3],也就是说只有连接了Sink的数据流才能进行运算。RunnableGraph提供一个run()函数来运算数据流:

/**
* Flow with attached input and output, can be executed.
*/
final case class RunnableGraph[+Mat](override val traversalBuilder: TraversalBuilder) extends Graph[ClosedShape, Mat] {
override def shape = ClosedShape /**
* Transform only the materialized value of this RunnableGraph, leaving all other properties as they were.
*/
def mapMaterializedValue[Mat2](f: Mat ⇒ Mat2): RunnableGraph[Mat2] =
copy(traversalBuilder.transformMat(f.asInstanceOf[Any ⇒ Any])) /**
* Run this flow and return the materialized instance from the flow.
*/
def run()(implicit materializer: Materializer): Mat = materializer.materialize(this)
...

上面shape = ClosedShape代表RunnableGraph的形状是闭合的(ClosedShape),意思是说:一个可运行的graph所有输人输出端口都必须是连接的。

下面我们就用一个最简单的线性数据流来做些详细解释:

import akka.actor._
import akka.stream._
import akka.stream.scaladsl._
import akka._
import scala.concurrent._ object SourceDemo extends App {
implicit val sys=ActorSystem("demo")
implicit val mat=ActorMaterializer()
implicit val ec=sys.dispatcher val s1: Source[Int,NotUsed] = Source( to )
val sink: Sink[Any,Future[Done]] = Sink.foreach(println)
val rg1: RunnableGraph[NotUsed] = s1.to(sink)
val rg2: RunnableGraph[Future[Done]] = s1.toMat(sink)(Keep.right)
val res1: NotUsed = rg1.run() Thread.sleep() val res2: Future[Done] = rg2.run()
res2.andThen {
case _ => sys.terminate()
} }

我们把焦点放在特别注明的结果类型上面:Source的运算结果Mat类型是NotUsed,Sink的运算结果Mat类型是Future[Done]。从上面这段代码我们看到用toMat选择返回Sink的运算结果Future[Done]才能捕捉到运算终止节点。下面的另一个例子包括了一些组合动作:

  val seq = Seq[Int](,,)
def toIterator() = seq.iterator
val flow1: Flow[Int,Int,NotUsed] = Flow[Int].map(_ + )
val flow2: Flow[Int,Int,NotUsed] = Flow[Int].map(_ * )
val s2 = Source.fromIterator(toIterator)
val s3 = s1 ++ s2 val s4: Source[Int,NotUsed] = s3.viaMat(flow1)(Keep.right)
val s5: Source[Int,NotUsed] = s3.via(flow1).async.viaMat(flow2)(Keep.right)
val s6: Source[Int,NotUsed] = s4.async.viaMat(flow2)(Keep.right)
(s5.toMat(sink)(Keep.right).run()).andThen {case _ => sys.terminate()}

一般来讲,数据流元素的所有处理过程都合并在一个actor上进行(steps-fusing),这样可以免去actor之间的消息传递,但同时也会限制数据元素的并行处理。aync的作用是指定左边的graph在一个独立的actor上运行。注意:s6=s5。

从上面例子里的组合结果类型我们发现:把一个Flow连接到一个Source上形成了一个新的Source。

实际上我们可以用akka-stream Source提供的方法糖来直接运算数据流,如下:

  s1.runForeach(println)
val fres = s6.runFold()(_ + _)
fres.onSuccess{case a => println(a)}
fres.andThen{case _ => sys.terminate()}

下面是Source中的一些runner:

 /**
* Shortcut for running this `Source` with a fold function.
* The given function is invoked for every received element, giving it its previous
* output (or the given `zero` value) and the element as input.
* The returned [[scala.concurrent.Future]] will be completed with value of the final
* function evaluation when the input stream ends, or completed with `Failure`
* if there is a failure signaled in the stream.
*/
def runFold[U](zero: U)(f: (U, Out) ⇒ U)(implicit materializer: Materializer): Future[U] = runWith(Sink.fold(zero)(f)) /**
* Shortcut for running this `Source` with a foldAsync function.
* The given function is invoked for every received element, giving it its previous
* output (or the given `zero` value) and the element as input.
* The returned [[scala.concurrent.Future]] will be completed with value of the final
* function evaluation when the input stream ends, or completed with `Failure`
* if there is a failure signaled in the stream.
*/
def runFoldAsync[U](zero: U)(f: (U, Out) ⇒ Future[U])(implicit materializer: Materializer): Future[U] = runWith(Sink.foldAsync(zero)(f)) /**
* Shortcut for running this `Source` with a reduce function.
* The given function is invoked for every received element, giving it its previous
* output (from the second element) and the element as input.
* The returned [[scala.concurrent.Future]] will be completed with value of the final
* function evaluation when the input stream ends, or completed with `Failure`
* if there is a failure signaled in the stream.
*
* If the stream is empty (i.e. completes before signalling any elements),
* the reduce stage will fail its downstream with a [[NoSuchElementException]],
* which is semantically in-line with that Scala's standard library collections
* do in such situations.
*/
def runReduce[U >: Out](f: (U, U) ⇒ U)(implicit materializer: Materializer): Future[U] =
runWith(Sink.reduce(f)) /**
* Shortcut for running this `Source` with a foreach procedure. The given procedure is invoked
* for each received element.
* The returned [[scala.concurrent.Future]] will be completed with `Success` when reaching the
* normal end of the stream, or completed with `Failure` if there is a failure signaled in
* the stream.
*/
// FIXME: Out => Unit should stay, right??
def runForeach(f: Out ⇒ Unit)(implicit materializer: Materializer): Future[Done] = runWith(Sink.foreach(f))

它们的功能都是通过runWith实现的:

 /**
* Connect this `Source` to a `Sink` and run it. The returned value is the materialized value
* of the `Sink`, e.g. the `Publisher` of a [[akka.stream.scaladsl.Sink#publisher]].
*/
def runWith[Mat2](sink: Graph[SinkShape[Out], Mat2])(implicit materializer: Materializer): Mat2 = toMat(sink)(Keep.right).run()

实际上是使用了Sink类里的对应方法Sink.???。

下面是本次的示范源代码:

import akka.actor._
import akka.stream._
import akka.stream.scaladsl._
import akka._
import scala.concurrent._ object SourceDemo extends App {
implicit val sys=ActorSystem("demo")
implicit val mat=ActorMaterializer()
implicit val ec=sys.dispatcher val s1: Source[Int,NotUsed] = Source( to )
val sink: Sink[Any,Future[Done]] = Sink.foreach(println)
val rg1: RunnableGraph[NotUsed] = s1.to(sink)
val rg2: RunnableGraph[Future[Done]] = s1.toMat(sink)(Keep.right)
val res1: NotUsed = rg1.run() Thread.sleep() val res2: Future[Done] = rg2.run()
res2.andThen {
case _ => //sys.terminate()
} val seq = Seq[Int](,,)
def toIterator() = seq.iterator
val flow1: Flow[Int,Int,NotUsed] = Flow[Int].map(_ + )
val flow2: Flow[Int,Int,NotUsed] = Flow[Int].map(_ * )
val s2 = Source.fromIterator(toIterator)
val s3 = s1 ++ s2 val s4: Source[Int,NotUsed] = s3.viaMat(flow1)(Keep.right)
val s5: Source[Int,NotUsed] = s3.via(flow1).async.viaMat(flow2)(Keep.right)
val s6: Source[Int,NotUsed] = s4.async.viaMat(flow2)(Keep.right)
(s5.toMat(sink)(Keep.right).run()).andThen {case _ => } //sys.terminate()} s1.runForeach(println)
val fres = s6.runFold()(_ + _)
fres.onSuccess{case a => println(a)}
fres.andThen{case _ => sys.terminate()} }

Akka(17): Stream:数据流基础组件-Source,Flow,Sink简介的更多相关文章

  1. Akka(18): Stream:组合数据流,组件-Graph components

    akka-stream的数据流可以由一些组件组合而成.这些组件统称数据流图Graph,它描述了数据流向和处理环节.Source,Flow,Sink是最基础的Graph.用基础Graph又可以组合更复杂 ...

  2. spring cloud 2.x版本 Spring Cloud Stream消息驱动组件基础教程(kafaka篇)

    本文采用Spring cloud本文为2.1.8RELEASE,version=Greenwich.SR3 本文基于前两篇文章eureka-server.eureka-client.eureka-ri ...

  3. App架构师实践指南三之基础组件

    App架构师实践指南三之基础组件 1.基础组件库随着时间的增长,代码量的逐渐积累,新旧项目之间有太多可以服用的代码.下面是整理的公共代码库. 2.关于加密密钥的保护以及网络传输安全是移动应用安全最关键 ...

  4. JMeter各个基础组件简介

    刚从LoadRunner转到JMeter,对JMeter的各种概念比较懵.在这里记录下.欢迎大家关注我的个人微信号:测试杂货铺. JMeter的各个功能都是它的组件来完成或实现的,下面来对JMeter ...

  5. Ext JS 6学习文档-第3章-基础组件

    Ext JS 6学习文档-第3章-基础组件 基础组件 在本章中,你将学习到一些 Ext JS 基础组件的使用.同时我们会结合所学创建一个小项目.这一章我们将学习以下知识点: 熟悉基本的组件 – 按钮, ...

  6. video基础介绍&封装react-video基础组件,ES6

    好几个月没有写博客了,人都赖了,今天抽了一点时间把最近项目react中video整理了一下(感觉这个以后用的活比较多) 1.前三部部分详细归纳了video的基础知识,属性和功能: 2.第四部分是封装了 ...

  7. 国产CPU 申威1621 异数OS基础组件理论性能测试报告

    国产CPU 申威1621 异数OS基础组件理论性能测试报告 文章目录 国产CPU 申威1621 异数OS基础组件理论性能测试报告 前言 测试平台 测试项目 SW1621 异数OS 容器虚拟交换机模拟性 ...

  8. React数据流和组件间的沟通总结

    今天来给大家总结下React的单向数据流与组件间的沟通. 首先,我认为使用React的最大好处在于:功能组件化,遵守前端可维护的原则. 先介绍单向数据流吧. React单向数据流: React是单向数 ...

  9. winform快速开发平台 -> 基础组件之分页控件

    一个项目控件主要由及部分的常用组件,当然本次介绍的是通用分页控件. 处理思想:我们在处理分页过程中主要是针对数据库操作. 一般情况主要是传递一些开始位置,当前页数,和数据总页数以及相关关联的业务逻辑. ...

随机推荐

  1. [leetcode-485-Max Consecutive Ones]

    Given a binary array, find the maximum number of consecutive 1s in this array. Example 1: Input: [1, ...

  2. Redux-Saga学习心得

    # Redux Saga ## 简述- Reducers负责处理action的state更新:- Sagas负责协调那些复杂或异步的操作. ## 安装 npm install --save redux ...

  3. selenium 环境搭建

    使用selenium + python来搭建环境的步骤: 1. 下载 python 的版本,常用到的有 2.7 和 3.6 2. 下载 selenium 的版本,通过命令进行下载. pip insta ...

  4. css3学习系列之移动(一)

    transform功能 放缩 使用sacle方法实现文字或图像的放缩处理,在参数中指定缩放倍率,比如sacle(0.5)表示缩小50%,例子如下: <!DOCTYPE html> < ...

  5. Photoshop制作雪碧图技巧

    雪碧图,就是将网页制作中使用的多个小图片合并成一个图片,使用css技术将这张合成的图片应用在网页不同的地方. 雪碧图可以减少网页加载时的http请求数,优化网页性能. 步骤: a.使用Photosho ...

  6. Universal asynchronous receiver transmitter (UART)

    UART基本介绍: 通用异步收发器UART他的功能非常强大 我们只使用UART的全双工异步通信功能,使用中断接收数据. UART_RX:串行数据输入. UART_TX:串行数据输出. 硬件支持: 连接 ...

  7. Java异常体系简析

    最近在阅读<Java编程思想>的时候看到了书中对异常的描述,结合自己阅读源码经历,谈谈自己对异常的理解.首先记住下面两句话: 除非你能解决这个异常,否则不要捕获它,如果打算记录错误消息,那 ...

  8. 谈谈ES6箭头操作符

    如果你会C#或者Java,你肯定知道lambda表达式,ES6中新增的箭头操作符=>便有异曲同工之妙.它简化了函数的书写.操作符左边为输入的参数,而右边则是进行的操作以及返回的值Inputs=& ...

  9. 快速查询List中指定的数据

    时间:2017/5/15 作者:李国君 题目:快速查询List中指定的数据 背景:当List中保存了大量的数据时,用传统的方法去遍历指定的数据肯定会效率低下,有一个方法就是类似于数据库查询那样,根据索 ...

  10. for(int a:i)在java 编程中的使用

    这种有冒号的for循环叫做foreach循环,foreach语句是java5的新特征之一,在遍历数组.集合方面,foreach为开发人员提供了极大的方便. foreach语句是for语句的特殊简化版本 ...