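FunDA (11) - Parallel execution of database operations: Parallel data processing
One of FunDA's most important design goals is to support parallel execution of database operations. Let's first review how fs2 implements parallel computation. We can use interleave, merge, and either to process the elements of two Streams at the same time. interleave keeps a fixed alternating order, while merge and either produce a nondeterministic order, as the following example shows: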
import fs2._
import scala.concurrent.duration._
implicit val strategy = Strategy.fromFixedDaemonPool()
implicit val scheduler = Scheduler.fromFixedDaemonPool()
//trace and print the current element
def log[A](pre: String): Pipe[Task,A,A] = _.evalMap { row =>
  Task.delay {println(s"${pre}>${row}");row}
}
//emit each element after a random delay of less than max
def randomDelay[A](max: FiniteDuration): Pipe[Task,A,A] = _.evalMap { a => {
  val delay: Task[Int] = Task.delay {scala.util.Random.nextInt(max.toMillis.toInt)}
  delay.flatMap {d => Task.now(a).schedule(d.millis)}
 }
}
val s1: Stream[Task,Int] = Stream(1,2,3,4,5).through(randomDelay(.millis))
val s2 = Stream(11,22,33,44,55,66).through(randomDelay(.millis))
val s3: Stream[Task,String] = Stream("a","b","c").through(randomDelay(.millis))
(s1 interleave s2).through(log("")).run.unsafeRun //> >1
//| >11
//| >2
//| >22
//| >3
//| >33
//| >4
//| >44
//| >5
//| >55
(s1 merge s2).through(log("")).run.unsafeRun //> >11
//| >1
//| >22
//| >2
//| >33
//| >44
//| >3
//| >55
//| >4
//| >5
//| >66
(s1 either s3).through(log("")).run.unsafeRun //> >Left(1)
//| >Left(2)
//| >Right(a)
//| >Right(b)
//| >Left(3)
//| >Left(4)
//| >Left(5)
//| >Right(c)
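The merge output above shows the irregular order it produces. fs2's nondeterministic merging emits an element from whichever stream has one ready first, so neither stream is held up waiting for the other. To process more than two streams in parallel, fs2 provides the join (mergeN) function: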
def join[F[_],O](maxOpen: Int)(outer: Stream[F,Stream[F,O]])(implicit F: Async[F]): Stream[F,O] = {...}
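From the signature we can see that the parameter outer has type Stream[F,Stream[F,O]], a stream nested two levels deep. In a real-world scenario the outer level could be multiple database connections and the inner level multiple clients. In FunDA's terms, the outer level is multiple data sources and the inner level multiple reader functions, or the outer level is multiple data rows (elements) and the inner level is the row-processing function. Let's first see how to produce multiple data sources in parallel: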
val ss: Stream[Task,Stream[Task,Int]] = Stream(s1,s2,s1,s2)
//> ss : fs2.Stream[fs2.Task,fs2.Stream[fs2.Task,Int]] = Segment(Emit(Chunk(Seg
Judging from the type of ss, we can use the Stream constructor directly to build a value of type Stream[Task,Stream[Task,A]]. Earlier we learned how to produce a Stream[Task,FDAROW] with Slick, for example:
val albumStream1 = streamLoader.fda_typedStream(albumsInfo.result)(db)(.minutes, , )()()
albumStream1 is a Reactive-Streams data source. We can therefore add a parallel source constructor to FunDA:
def fda_par_load(sources: FDAPipeLine[FDAROW]*)(maxOpen: Int) = {
concurrent.join(maxOpen)(Stream(sources: _*))
}
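maxOpen is the maximum number of computations that may run at the same time; it is best set to a number smaller than the machine's core count. We use this function to build data sources in parallel: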
package com.bayakala.funda.fdapars.examples
import slick.driver.H2Driver.api._
import com.bayakala.funda.samples._
import com.bayakala.funda.fdarows.FDAROW
import com.bayakala.funda.fdasources.FDADataStream._
import scala.concurrent.duration._
import com.bayakala.funda.fdapipes._
import FDAValves._
import com.bayakala.funda.fdapars.FDAPars._
object Example1 extends App {
  val albums = SlickModels.albums
  val companies = SlickModels.companies
  //source query
  val albumsInfo = for {
    (a,c) <- albums join companies on (_.company === _.id)
  } yield (a.title,a.artist,a.year,c.name)
  //strongly typed query result (user-supplied)
  case class Album(title: String, artist: String, year: Int, publisher: String) extends FDAROW
  //conversion to the strongly typed row (user-supplied)
  def toTypedRow(row: (String, String, Option[Int], String)): Album =
    Album(row._1, row._2, row._3.getOrElse(), row._4)
  val db = Database.forConfig("h2db")
  val streamLoader = FDAStreamLoader(slick.driver.H2Driver, toTypedRow _)
  val albumStream1 = streamLoader.fda_typedStream(albumsInfo.result)(db)(.minutes, , )()()
  val albumStream2 = streamLoader.fda_typedStream(albumsInfo.result)(db)(.minutes, , )()()
  val albumStream3 = streamLoader.fda_typedStream(albumsInfo.result)(db)(.minutes, , )()()
  //print each Album row; the labels are title, artist, year, publisher
  def printAlbums: FDATask[FDAROW] = row => {
    row match {
      case album: Album =>
        println("____________________")
        println(s"品名:${album.title}")
        println(s"演唱:${album.artist}")
        println(s"年份:${album.year}")
        println(s"发行:${album.publisher}")
        fda_skip
      // fda_next(album)
      case r@_ => fda_next(r)
    }
  }
  fda_par_load(albumStream1,albumStream1,albumStream1)(3).appendTask(printAlbums).startRun
}
Output after startRun:
*** (c.z.hikari.HikariDataSource) HikariCP pool h2db is starting.
*** (s.jdbc.JdbcBackend.statement) Preparing statement: select x2."TITLE", x2."ARTIST", x2."YEAR", x3."NAME" from "ALBUMS" x2, "COMPANY" x3 where x2."COMPANY" = x3."ID"
*** (s.jdbc.JdbcBackend.statement) Preparing statement: select x2."TITLE", x2."ARTIST", x2."YEAR", x3."NAME" from "ALBUMS" x2, "COMPANY" x3 where x2."COMPANY" = x3."ID"
*** (s.jdbc.JdbcBackend.statement) Preparing statement: select x2."TITLE", x2."ARTIST", x2."YEAR", x3."NAME" from "ALBUMS" x2, "COMPANY" x3 where x2."COMPANY" = x3."ID"
____________________
品名:Keyboard Cat's Greatest Hits
演唱:Keyboard Cat
年份:
发行:Sony Music Inc
____________________
品名:Keyboard Cat's Greatest Hits
演唱:Keyboard Cat
年份:
发行:Sony Music Inc
____________________
品名:Keyboard Cat's Greatest Hits
演唱:Keyboard Cat
年份:
发行:Sony Music Inc
____________________
品名:Spice
演唱:Spice Girls
年份:
发行:Columbia Records
____________________
品名:Spice
演唱:Spice Girls
年份:
发行:Columbia Records
____________________
品名:Spice
演唱:Spice Girls
年份:
发行:Columbia Records
____________________
品名:Whenever You Need Somebody
演唱:Rick Astley
年份:
发行:Sony Music Inc
____________________
品名:Whenever You Need Somebody
演唱:Rick Astley
年份:
发行:Sony Music Inc
____________________
品名:Whenever You Need Somebody
演唱:Rick Astley
年份:
发行:Sony Music Inc
____________________
品名:The Triumph of Steel
演唱:Manowar
年份:
发行:The K-Pops Singers
____________________
品名:The Triumph of Steel
演唱:Manowar
年份:
发行:The K-Pops Singers
____________________
品名:The Triumph of Steel
演唱:Manowar
年份:
发行:The K-Pops Singers
____________________
品名:Believe
演唱:Justin Bieber
年份:
发行:Columbia Records
____________________
品名:Believe
演唱:Justin Bieber
年份:
发行:Columbia Records
____________________
品名:Believe
演唱:Justin Bieber
年份:
发行:Columbia Records
Process finished with exit code
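To make the role of maxOpen in fda_par_load more concrete, here is a minimal sketch in plain Scala that uses only the standard library, not fs2 or FunDA. The names MaxOpenSketch and parLoad are hypothetical, and each source is modelled as a function that returns a list of rows. Unlike fs2's join, which emits elements as they arrive, this sketch waits for every source and concatenates the results in source order; it only illustrates that no more than maxOpen sources run at once.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object MaxOpenSketch {
  // Runs every source on a pool of maxOpen threads and returns
  // (all rows in source order, peak number of sources running at once).
  def parLoad[A](sources: Seq[() => List[A]])(maxOpen: Int): (List[A], Int) = {
    val pool = Executors.newFixedThreadPool(maxOpen)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    val lock = new Object
    var running = 0 // sources currently executing, guarded by lock
    var peak = 0    // highest value running has reached, guarded by lock
    try {
      val loads = sources.map { src =>
        Future {
          lock.synchronized { running += 1; peak = math.max(peak, running) }
          try src() finally lock.synchronized { running -= 1 }
        }
      }
      // Future.sequence keeps the order of the sources, not arrival order.
      val rows = Await.result(Future.sequence(loads), 10.seconds).flatten.toList
      (rows, lock.synchronized(peak))
    } finally pool.shutdown()
  }
}
```

Calling MaxOpenSketch.parLoad(Seq(() => List(1, 2), () => List(3), () => List(4, 5)))(2) returns the five rows together with a peak that never exceeds 2, because the thread pool itself is what caps the concurrency.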
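FunDA's other parallel-processing requirement is to apply one function in parallel to a long sequence of data elements. First look at the type of this function:
//task type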
type FDATask[ROW] = ROW => Option[List[ROW]]
This is the user-supplied task function type we used earlier. But the fda_runPar function can only run the following type in parallel:
def fda_runPar(parTask: FDAParTask)(maxOpen: Int) =
  concurrent.join(maxOpen)(parTask).through(fda_afterPar)
//parallel task type
type FDAParTask = Stream[Task,Stream[Task,Option[List[FDAROW]]]]
So we must first convert the Stream[Task,A] into a Stream[Task,Stream[Task,A]]:
implicit class toFDAOps(fs2Stream: FDAPipeLine[FDAROW]) {
  def appendTask(t: FDATask[FDAROW]) = fs2Stream.through(fda_execUserTask(t))
  def startRun = fs2Stream.run.unsafeRun
  def startFuture = fs2Stream.run.unsafeRunAsyncFuture
  def toPar(st: FDATask[FDAROW]): Stream[Task, Stream[Task, Option[List[FDAROW]]]] =
    fs2Stream.map { row =>
      Stream.eval(Task {
        st(row)
      })
    }
}
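The two steps, wrapping each row in a deferred computation and then running those computations with bounded parallelism, can be sketched in plain Scala with only the standard library. This is not FunDA's implementation: ToParSketch, RowTask, toPar and runPar are hypothetical names, and treating None as "no rows" when flattening is an assumption, since the text does not show what fda_afterPar does. The result here also keeps the input order, which fs2's join does not promise.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ToParSketch {
  // Same shape as FDATask in the text: a row yields a list of rows, or None.
  type RowTask[ROW] = ROW => Option[List[ROW]]

  // Step 1 (the toPar idea): pair every row with the task without running it.
  def toPar[ROW](rows: List[ROW])(task: RowTask[ROW]): List[() => Option[List[ROW]]] =
    rows.map(row => () => task(row))

  // Step 2 (the fda_runPar idea): run the deferred steps on maxOpen threads,
  // then flatten. Dropping None results is an assumption of this sketch.
  def runPar[ROW](steps: List[() => Option[List[ROW]]])(maxOpen: Int): List[ROW] = {
    val pool = Executors.newFixedThreadPool(maxOpen)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      val results = Future.sequence(steps.map(step => Future(step())))
      Await.result(results, 10.seconds).flatMap(_.toList.flatten)
    } finally pool.shutdown()
  }
}
```

For example, with a task that returns Some(List(n, n * 10)) for even n and Some(Nil) otherwise, ToParSketch.runPar(ToParSketch.toPar(List(1, 2, 3, 4))(task))(2) evaluates the four rows on two threads and returns only the rows produced for 2 and 4.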
We can use toPar to perform the conversion to the parallel task type. Here is a usage example:
//task function to run in parallel
def updateYear: FDATask[FDAROW] = row => {
  row match {
    case album: Album =>
      val action = albums.filter{r => r.title === album.title}.map(_.year).update(Some())
      //pass the original row downstream together with the newly built action
      fda_next(List(album,FDAActionRow(action)))
    case others@ _ => fda_next(others)
  }
}
//load in parallel
val s1 = fda_par_load(albumStream1,albumStream1,albumStream1)()
//build the actions in parallel
val s2 = fda_runPar(s1.toPar(updateYear))()
s1 is the data source built in parallel, and s2 applies the function updateYear in parallel to the elements that source produces. We can likewise run the resulting ActionRows in parallel:
val runner = FDAActionRunner(slick.driver.H2Driver)
//function that runs the actions in parallel
def runActions: FDATask[FDAROW] = row => {
  row match {
    case FDAActionRow(action) =>
      runner.fda_execAction(action)(db)
      fda_skip
    case others@ _ => fda_next(others)
  }
}
//run the actions in parallel
val s3 = fda_runPar(s2.toPar(runActions))()
//start the run
s3.appendTask(printAlbums).startRun
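The example above should convey the flexibility of functional programming: before startRun we can compose functions freely, and the static type system checks that the types of the components match. Here is the actual output: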
*** (c.z.hikari.HikariDataSource) HikariCP pool h2db is starting.
*** (s.jdbc.JdbcBackend.statement) Preparing statement: select x2."TITLE", x2."ARTIST", x2."YEAR", x3."NAME" from "ALBUMS" x2, "COMPANY" x3 where x2."COMPANY" = x3."ID"
*** (s.jdbc.JdbcBackend.statement) Preparing statement: select x2."TITLE", x2."ARTIST", x2."YEAR", x3."NAME" from "ALBUMS" x2, "COMPANY" x3 where x2."COMPANY" = x3."ID"
*** (s.jdbc.JdbcBackend.statement) Preparing statement: select x2."TITLE", x2."ARTIST", x2."YEAR", x3."NAME" from "ALBUMS" x2, "COMPANY" x3 where x2."COMPANY" = x3."ID"
*** (s.jdbc.JdbcBackend.statement) Preparing statement: update "ALBUMS" set "YEAR" = ? where "ALBUMS"."TITLE" = 'Keyboard Cat''s Greatest Hits'
____________________
品名:Keyboard Cat's Greatest Hits
演唱:Keyboard Cat
年份:
发行:Sony Music Inc
*** (s.jdbc.JdbcBackend.statement) Preparing statement: update "ALBUMS" set "YEAR" = ? where "ALBUMS"."TITLE" = 'Keyboard Cat''s Greatest Hits'
____________________
品名:Keyboard Cat's Greatest Hits
演唱:Keyboard Cat
年份:
发行:Sony Music Inc
*** (s.jdbc.JdbcBackend.statement) Preparing statement: update "ALBUMS" set "YEAR" = ? where "ALBUMS"."TITLE" = 'Keyboard Cat''s Greatest Hits'
____________________
品名:Keyboard Cat's Greatest Hits
演唱:Keyboard Cat
年份:
发行:Sony Music Inc
*** (s.jdbc.JdbcBackend.statement) Preparing statement: update "ALBUMS" set "YEAR" = ? where "ALBUMS"."TITLE" = 'Spice'
____________________
品名:Spice
演唱:Spice Girls
年份:
发行:Columbia Records
*** (s.jdbc.JdbcBackend.statement) Preparing statement: update "ALBUMS" set "YEAR" = ? where "ALBUMS"."TITLE" = 'Spice'
____________________
品名:Spice
演唱:Spice Girls
年份:
发行:Columbia Records
*** (s.jdbc.JdbcBackend.statement) Preparing statement: update "ALBUMS" set "YEAR" = ? where "ALBUMS"."TITLE" = 'Spice'
____________________
品名:Spice
演唱:Spice Girls
年份:
发行:Columbia Records
*** (s.jdbc.JdbcBackend.statement) Preparing statement: update "ALBUMS" set "YEAR" = ? where "ALBUMS"."TITLE" = 'Whenever You Need Somebody'
____________________
品名:Whenever You Need Somebody
演唱:Rick Astley
年份:
发行:Sony Music Inc
*** (s.jdbc.JdbcBackend.statement) Preparing statement: update "ALBUMS" set "YEAR" = ? where "ALBUMS"."TITLE" = 'Whenever You Need Somebody'
____________________
品名:Whenever You Need Somebody
演唱:Rick Astley
年份:
发行:Sony Music Inc
*** (s.jdbc.JdbcBackend.statement) Preparing statement: update "ALBUMS" set "YEAR" = ? where "ALBUMS"."TITLE" = 'Whenever You Need Somebody'
____________________
品名:Whenever You Need Somebody
演唱:Rick Astley
年份:
发行:Sony Music Inc
*** (s.jdbc.JdbcBackend.statement) Preparing statement: update "ALBUMS" set "YEAR" = ? where "ALBUMS"."TITLE" = 'The Triumph of Steel'
____________________
品名:The Triumph of Steel
演唱:Manowar
年份:
发行:The K-Pops Singers
*** (s.jdbc.JdbcBackend.statement) Preparing statement: update "ALBUMS" set "YEAR" = ? where "ALBUMS"."TITLE" = 'The Triumph of Steel'
____________________
品名:The Triumph of Steel
演唱:Manowar
年份:
发行:The K-Pops Singers
*** (s.jdbc.JdbcBackend.statement) Preparing statement: update "ALBUMS" set "YEAR" = ? where "ALBUMS"."TITLE" = 'The Triumph of Steel'
____________________
品名:The Triumph of Steel
演唱:Manowar
年份:
发行:The K-Pops Singers
*** (s.jdbc.JdbcBackend.statement) Preparing statement: update "ALBUMS" set "YEAR" = ? where "ALBUMS"."TITLE" = 'Believe'
____________________
品名:Believe
演唱:Justin Bieber
年份:
发行:Columbia Records
*** (s.jdbc.JdbcBackend.statement) Preparing statement: update "ALBUMS" set "YEAR" = ? where "ALBUMS"."TITLE" = 'Believe'
____________________
品名:Believe
演唱:Justin Bieber
年份:
发行:Columbia Records
*** (s.jdbc.JdbcBackend.statement) Preparing statement: update "ALBUMS" set "YEAR" = ? where "ALBUMS"."TITLE" = 'Believe'
____________________
品名:Believe
演唱:Justin Bieber
年份:
发行:Columbia Records
Process finished with exit code
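Note: the example above was written purely to demonstrate the function calls, with no regard for business logic or practical use. Below is the demonstration source code for this article: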
package com.bayakala.funda.fdapars.examples
import slick.driver.H2Driver.api._
import com.bayakala.funda.samples._
import com.bayakala.funda.fdarows.FDARowTypes._
import com.bayakala.funda.fdarows.FDAROW
import com.bayakala.funda.fdasources.FDADataStream._
import scala.concurrent.duration._
import com.bayakala.funda.fdapipes._
import FDAValves._
import com.bayakala.funda.fdapars.FDAPars._
import com.bayakala.funda.fdarows.FDARowTypes.FDAActionRow
object Example1 extends App {
  val albums = SlickModels.albums
  val companies = SlickModels.companies
  //source query
  val albumsInfo = for {
    (a,c) <- albums join companies on (_.company === _.id)
  } yield (a.title,a.artist,a.year,c.name)
  //strongly typed query result (user-supplied)
  case class Album(title: String, artist: String, year: Int, publisher: String) extends FDAROW
  //conversion function (user-supplied)
  def toTypedRow(row: (String, String, Option[Int], String)): Album =
    Album(row._1, row._2, row._3.getOrElse(), row._4)
  val db = Database.forConfig("h2db")
  val streamLoader = FDAStreamLoader(slick.driver.H2Driver, toTypedRow _)
  val albumStream1 = streamLoader.fda_typedStream(albumsInfo.result)(db)(.minutes, , )()()
  val albumStream2 = streamLoader.fda_typedStream(albumsInfo.result)(db)(.minutes, , )()()
  val albumStream3 = streamLoader.fda_typedStream(albumsInfo.result)(db)(.minutes, , )()()
  //print each Album row; the labels are title, artist, year, publisher
  def printAlbums: FDATask[FDAROW] = row => {
    row match {
      case album: Album =>
        println("____________________")
        println(s"品名:${album.title}")
        println(s"演唱:${album.artist}")
        println(s"年份:${album.year}")
        println(s"发行:${album.publisher}")
        fda_skip
      // fda_next(album)
      case r@_ => fda_next(r)
    }
  }
  // fda_par_load(albumStream1,albumStream1,albumStream1)(3).appendTask(printAlbums).startRun
  //task function to run in parallel
  def updateYear: FDATask[FDAROW] = row => {
    row match {
      case album: Album =>
        val action = albums.filter{r => r.title === album.title}.map(_.year).update(Some())
        //pass the original row downstream together with the newly built action
        fda_next(List(album,FDAActionRow(action)))
      case others@ _ => fda_next(others)
    }
  }
  val runner = FDAActionRunner(slick.driver.H2Driver)
  //function that runs the actions in parallel
  def runActions: FDATask[FDAROW] = row => {
    row match {
      case FDAActionRow(action) =>
        runner.fda_execAction(action)(db)
        fda_skip
      case others@ _ => fda_next(others)
    }
  }
  //load in parallel
  val s1 = fda_par_load(albumStream1,albumStream1,albumStream1)()
  //build the actions in parallel
  val s2 = fda_runPar(s1.toPar(updateYear))()
  //run the actions in parallel
  val s3 = fda_runPar(s2.toPar(runActions))()
  //start the run
  s3.appendTask(printAlbums).startRun
}