Akka (20): Stream: Batching backpressure and buffering

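akka-stream is, in principle, a push-model data stream. The push model and the pull model differ in which side they favor: the push model suits a fast downstream subscriber, while the pull model suits a fast upstream publisher. In practice, upstream and downstream rarely run at the same speed, and a mismatch eventually causes problems. If the downstream subscriber cannot keep up with everything the publisher pushes at it, then however large the buffer is, it will eventually overflow and data will be lost. If the upstream publisher cannot satisfy the subscriber's read requests in time, the subscriber waits longer, which can lead to timeouts or even lost downstream requests.

For a push-model stream like akka-stream, pushing too fast loses data, so the rate at which the publisher produces data has to be controlled. Because akka-stream implements the Reactive Streams specification at every stage, upstream and downstream can interact: the downstream tells the upstream how much data it can currently accept, and this controls the upstream flow rate. This is what is called backpressure. akka-stream's backpressure uses a buffer to prefetch and replenish data in batches, which improves transfer efficiency. In addition, when async is used to run parts of the stream in parallel, the upstream does not have to wait for the downstream to react; it can push an element into the buffer and immediately move on to the next one. Buffering is therefore indispensable in async mode. akka-stream offers the following ways to set the buffer size used at asynchronous boundaries: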

1. Set the default buffer in the configuration file:

akka.stream.materializer.max-input-buffer-size = 16

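2. Set it globally through ActorMaterializerSettings:

val materializer = ActorMaterializer(
  ActorMaterializerSettings(system)
    .withInputBuffer(
      initialSize = 64, // illustrative value; the source elided the number
      maxSize = 64))    // illustrative value; the source elided the number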

3. Set it through Attributes. Because Attributes are hierarchical, an inputBuffer set through an Attribute is also inherited:

import Attributes._

// Numeric literals were elided in the source; the values below are illustrative.
val nestedSource =
  Source.single(0)
    .map(_ + 1)
    .named("nestedSource") // Wrap, no inputBuffer set

val nestedFlow =
  Flow[Int].filter(_ != 0)
    .via(Flow[Int].map(_ - 2).withAttributes(inputBuffer(4, 4))) // override
    .named("nestedFlow") // Wrap, no inputBuffer set

val nestedSink =
  nestedFlow.to(Sink.fold(0)(_ + _)) // wire an atomic sink to the nestedFlow
    .withAttributes(name("nestedSink") and inputBuffer(3, 3)) // override

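In the example above, nestedSource inherits the Materializer's global inputBuffer setting, and nestedSink overrides it. Inside nestedFlow, stages inherit nestedSink's setting by default, while the map stage overrides it with its own inputBuffer. We can also use addAttributes to add an Attribute: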

  // Numeric literals were elided in the source; the values below are illustrative.
  val flow = Flow[Int].map(_ * 2).async.addAttributes(Attributes.inputBuffer(1, 1))
  val (_, fut) = flow.runWith(Source(1 to 10), Sink.foreach(println))
  fut.andThen { case _ => sys.terminate() }

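The inputBuffer settings above consist of an initial size and a maximum size, and are used mainly for backpressure. In theory the inputBuffer could be set to a single element (initial = 1, max = 1): with backpressure there is no risk of overflow, but transfer efficiency would suffer. That is why akka-stream's default buffer size is 16 elements, and why akka-stream's backpressure is called batching backpressure.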
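The cost of a one-element buffer can be illustrated with a toy model. This is a minimal sketch in plain Scala, not Akka's internal algorithm: it assumes (hypothetically) that the downstream sends one demand signal per batch and asks for a full batch each time. The name `BatchedDemandSketch` and its method are made up for illustration.

```scala
object BatchedDemandSketch {
  /** Number of demand signals needed to transfer `total` elements
    * when each signal requests up to `batchSize` elements. */
  def demandSignals(total: Int, batchSize: Int): Int = {
    require(total >= 0 && batchSize > 0)
    var remaining = total
    var signals = 0
    while (remaining > 0) {
      signals += 1                                // one round trip downstream -> upstream
      remaining -= math.min(batchSize, remaining) // the batch that round trip pays for
    }
    signals
  }

  def main(args: Array[String]): Unit = {
    println(demandSignals(total = 1000, batchSize = 1))  // 1000
    println(demandSignals(total = 1000, batchSize = 16)) // 63
  }
}
```

Under this assumption a buffer of 16 needs roughly one sixteenth of the demand signals, which is the efficiency argument behind batching.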

Because akka-stream is push-based, we can also use buffer to control the data pushed by upstream stages such as Source and Flow:

  // Numeric literals were elided in the source; the values below are illustrative.
  val source = Source(1 to 10).buffer(5, OverflowStrategy.dropTail)
  val sum = source.runFold(0)((acc, i) => i + acc)
  sum.map(println)  //.andThen{case _ => sys.terminate()}

  val flow = Flow[Int].map(_ * 2).buffer(5, OverflowStrategy.dropNew)
  val (_, fut) = flow.runWith(Source(1 to 10), Sink.fold(0) { (acc, a) => acc + a })
  fut.map(println).andThen { case _ => sys.terminate() }

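A buffer placed upstream can apply an OverflowStrategy to data that the publisher produces too quickly. The inputBuffer added through Attributes above always backpressures (the equivalent of OverflowStrategy.backpressure). The full set of OverflowStrategy options is: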

object OverflowStrategy {
  /**
   * If the buffer is full when a new element arrives, drops the oldest element from the buffer to make space for
   * the new element.
   */
  def dropHead: OverflowStrategy = DropHead

  /**
   * If the buffer is full when a new element arrives, drops the youngest element from the buffer to make space for
   * the new element.
   */
  def dropTail: OverflowStrategy = DropTail

  /**
   * If the buffer is full when a new element arrives, drops all the buffered elements to make space for the new element.
   */
  def dropBuffer: OverflowStrategy = DropBuffer

  /**
   * If the buffer is full when a new element arrives, drops the new element.
   */
  def dropNew: OverflowStrategy = DropNew

  /**
   * If the buffer is full when a new element is available this strategy backpressures the upstream publisher until
   * space becomes available in the buffer.
   */
  def backpressure: OverflowStrategy = Backpressure

  /**
   * If the buffer is full when a new element is available this strategy completes the stream with failure.
   */
  def fail: OverflowStrategy = Fail
}
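To see how the four dropping strategies differ, the following plain-Scala sketch replays their documented behavior on an ordinary Vector. It does not use Akka; `OverflowSketch`, its `Strategy` type and `fill` are hypothetical names for illustration, and the buffer is assumed never to be drained by a consumer. backpressure and fail are left out because they do not drop elements.

```scala
object OverflowSketch {
  sealed trait Strategy
  case object DropHead extends Strategy
  case object DropTail extends Strategy
  case object DropBuffer extends Strategy
  case object DropNew extends Strategy

  /** Offers `elems` one by one to a buffer of `capacity` that nobody drains. */
  def fill(elems: Seq[Int], capacity: Int, strategy: Strategy): Vector[Int] = {
    require(capacity > 0)
    elems.foldLeft(Vector.empty[Int]) { (buf, e) =>
      if (buf.size < capacity) buf :+ e
      else strategy match {
        case DropHead   => buf.tail :+ e // drop the oldest buffered element
        case DropTail   => buf.init :+ e // drop the youngest buffered element
        case DropBuffer => Vector(e)     // drop everything buffered so far
        case DropNew    => buf           // drop the incoming element
      }
    }
  }

  def main(args: Array[String]): Unit = {
    println(fill(1 to 5, 3, DropHead))   // Vector(3, 4, 5)
    println(fill(1 to 5, 3, DropTail))   // Vector(1, 2, 5)
    println(fill(1 to 5, 3, DropBuffer)) // Vector(4, 5)
    println(fill(1 to 5, 3, DropNew))    // Vector(1, 2, 3)
  }
}
```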

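When akka-stream exchanges data with an external system, a rate mismatch between upstream and downstream cannot be avoided. If the external system does not support the Reactive Streams standard, data can be lost. akka-stream provides concrete remedies for this. If an external upstream produces data too fast, the conflate family of functions can pass the data downstream in a collection such as a Seq. If the downstream keeps up, each Seq(item) holds exactly the one element the upstream pushed; otherwise Seq(i1, i2, i3, ...) holds everything the upstream produced while the downstream was not reading. Since a Seq has no fixed size limit (it is bounded only by memory), data loss can in theory be avoided. The function is defined as follows: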

  /**
   * Allows a faster upstream to progress independently of a slower subscriber by conflating elements into a summary
   * until the subscriber is ready to accept them. For example a conflate step might average incoming numbers if the
   * upstream publisher is faster.
   *
   * This version of conflate allows to derive a seed from the first element and change the aggregated type to be
   * different than the input type. See [[FlowOps.conflate]] for a simpler version that does not change types.
   *
   * This element only rolls up elements if the upstream is faster, but if the downstream is faster it will not
   * duplicate elements.
   *
   * Adheres to the [[ActorAttributes.SupervisionStrategy]] attribute.
   *
   * '''Emits when''' downstream stops backpressuring and there is a conflated element available
   *
   * '''Backpressures when''' never
   *
   * '''Completes when''' upstream completes
   *
   * '''Cancels when''' downstream cancels
   *
   * @param seed Provides the first state for a conflated value using the first unconsumed element as a start
   * @param aggregate Takes the currently aggregated value and the current pending element to produce a new aggregate
   *
   * See also [[FlowOps.conflate]], [[FlowOps.limit]], [[FlowOps.limitWeighted]] [[FlowOps.batch]] [[FlowOps.batchWeighted]]
   */
  def conflateWithSeed[S](seed: Out ⇒ S)(aggregate: (S, Out) ⇒ S): Repr[S] =
    via(Batch(1L, ConstantFun.zeroLong, seed, aggregate).withAttributes(DefaultAttributes.conflate))

An example of conflateWithSeed:

import akka.actor._
import akka.stream._
import akka.stream.scaladsl._
import scala.concurrent.duration._

object StreamDemo1 extends App {
  implicit val sys = ActorSystem("streamSys")
  implicit val ec = sys.dispatcher
  implicit val mat = ActorMaterializer(
    ActorMaterializerSettings(sys)
      .withInputBuffer(1, 1) // initial = 1, max = 1, as discussed in the text
  )

  case class Tick()

  RunnableGraph.fromGraph(GraphDSL.create() { implicit b =>
    import GraphDSL.Implicits._

    // this is the asynchronous stage in this graph
    val zipper = b.add(ZipWith[Tick, Seq[String], Seq[String]]((tick, count) => count).async)

    // this slows down the pipeline by 3 seconds
    Source.tick(initialDelay = 3.seconds, interval = 3.seconds, Tick()) ~> zipper.in0

    // faster producer with all elements passed inside a Seq
    Source.tick(initialDelay = 1.second, interval = 1.second, "item")
      .conflateWithSeed(Seq(_)) { (acc, elem) => acc :+ elem } ~> zipper.in1

    zipper.out ~> Sink.foreach(println)
    ClosedShape
  }).run()
}

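In this example one slow input of ZipWith controls the rate of the whole pipeline. The length of each output Seq then reflects how long ZipWith waited before consuming again. Note that the first 3 outputs appear to have no delay; this is caused by akka-stream's prefetching. Because we set the input buffer to (initial = 1, max = 1), the first elements were prefetched and counted as consumed in time.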
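The aggregation rule behind conflateWithSeed can be traced without timers. The sketch below is a simplified plain-Scala model, not Akka's implementation: `ConflateSketch`, `Push` and `Pull` are hypothetical names, and the interleaving of upstream pushes and downstream pulls is supplied as an explicit event list instead of real time.

```scala
object ConflateSketch {
  sealed trait Event[+A]
  final case class Push[A](elem: A) extends Event[A] // upstream produced an element
  case object Pull extends Event[Nothing]            // downstream asked for one element

  def run[A, S](events: Seq[Event[A]])(seed: A => S)(aggregate: (S, A) => S): Vector[S] = {
    var pending: Option[S] = None // summary of elements not yet taken downstream
    var demand = false            // downstream pulled while nothing was pending
    val out = Vector.newBuilder[S]
    events.foreach {
      case Push(e) =>
        val agg = pending.fold(seed(e))(aggregate(_, e))
        if (demand) { out += agg; pending = None; demand = false }
        else pending = Some(agg)
      case Pull =>
        pending match {
          case Some(agg) => out += agg; pending = None
          case None      => demand = true
        }
    }
    out.result()
  }

  def main(args: Array[String]): Unit = {
    val events: Seq[Event[String]] =
      Seq(Push("a"), Pull, Push("b"), Push("c"), Push("d"), Pull, Pull, Push("e"))
    // prints Vector(List(a), List(b, c, d), List(e))
    println(run(events)(Seq(_))(_ :+ _))
  }
}
```

While the downstream is idle, three pushes collapse into one Seq of length 3; when the downstream is waiting, a push passes straight through as a Seq of length 1.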

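If an external upstream producer that does not implement the Reactive Streams standard is too slow, the downstream may time out. akka-stream provides the expand function for this case: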

  /**
   * Allows a faster downstream to progress independently of a slower publisher by extrapolating elements from an older
   * element until new element comes from the upstream. For example an expand step might repeat the last element for
   * the subscriber until it receives an update from upstream.
   *
   * This element will never "drop" upstream elements as all elements go through at least one extrapolation step.
   * This means that if the upstream is actually faster than the upstream it will be backpressured by the downstream
   * subscriber.
   *
   * Expand does not support [[akka.stream.Supervision.Restart]] and [[akka.stream.Supervision.Resume]].
   * Exceptions from the `seed` or `extrapolate` functions will complete the stream with failure.
   *
   * '''Emits when''' downstream stops backpressuring
   *
   * '''Backpressures when''' downstream backpressures or iterator runs empty
   *
   * '''Completes when''' upstream completes
   *
   * '''Cancels when''' downstream cancels
   *
   * @param seed Provides the first state for extrapolation using the first unconsumed element
   * @param extrapolate Takes the current extrapolation state to produce an output element and the next extrapolation
   *                    state.
   */
  def expand[U](extrapolate: Out ⇒ Iterator[U]): Repr[U] = via(new Expand(extrapolate))

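When the upstream cannot deliver the data the downstream requests in time, expand can keep pushing a fixed element to satisfy the downstream temporarily: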

  val lastFlow = Flow[Double]
    .expand(Iterator.continually(_))
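What the downstream observes with an extrapolation such as Iterator.continually can be traced with a small plain-Scala model. This is a simplified sketch, not Akka's Expand stage: `ExpandSketch` and `arrivals` are hypothetical names, each entry of `arrivals` stands for one downstream pull, and a pull made before any element has arrived is simply skipped here, whereas a real stream would hold that demand until data shows up.

```scala
object ExpandSketch {
  /** `arrivals(i)` is the upstream element that arrived just before the i-th
    * downstream pull, or None if the upstream produced nothing in that interval. */
  def run[A, U](arrivals: Seq[Option[A]])(extrapolate: A => Iterator[U]): Vector[U] = {
    var current: Iterator[U] = Iterator.empty
    val out = Vector.newBuilder[U]
    arrivals.foreach { arrival =>
      arrival.foreach(e => current = extrapolate(e)) // a fresh element replaces the old iterator
      if (current.hasNext) out += current.next()     // otherwise nothing can be emitted yet
    }
    out.result()
  }

  def main(args: Array[String]): Unit = {
    val arrivals = Seq(Some(1.0), None, None, Some(2.5), None)
    // prints Vector(1.0, 1.0, 1.0, 2.5, 2.5)
    println(run(arrivals)(Iterator.continually(_)))
  }
}
```

Every upstream element is emitted at least once, and the most recent one is repeated for as long as the upstream stays silent.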
