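FunDA(15) - Demo: user task parallel execution
Parallel computation in FunDA means running a user-defined function in parallel. In principle, an input stream is split into several input streams that are fed in parallel to several running instances of the same user-defined function. These instances run concurrently, each in its own thread, until all input is exhausted. The actual number of function instances is decided by the fs2-nondeterminism algorithm from the number of CPU cores, the thread-pool configuration, and the maximum number of open computations specified by the user. In this demo we can compare the efficiency of parallel and serial execution of the same workload.
In an earlier demo we obtained an AQMRPT table. That table is not yet normalized: state and county are still stored as names instead of codes linked to the STATES and COUNTIES tables. In this demo we create a new table, NORMAQM, and move all the data from AQMRPT into it. Along the way we convert the STATENAME and COUNTYNAME fields into the id fields of the STATES and COUNTIES tables. Here is the structure of the NORMAQM table: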
case class NORMAQMModel(rid: Long
                        , mid: Int
                        , state: Int
                        , county: Int
                        , year: Int
                        , value: Int
                        , average: Int
                       ) extends FDAROW

class NORMAQMTable(tag: Tag) extends Table[NORMAQMModel](tag, "NORMAQM") {
  def rid = column[Long]("ROWID",O.AutoInc,O.PrimaryKey)
  def mid = column[Int]("MEASUREID")
  def state = column[Int]("STATID")
  def county = column[Int]("COUNTYID")
  def year = column[Int]("REPORTYEAR")
  def value = column[Int]("VALUE")
  def average = column[Int]("AVG")

  def * = (rid,mid,state,county,year,value,average) <> (NORMAQMModel.tupled, NORMAQMModel.unapply)
}

val NORMAQMQuery = TableQuery[NORMAQMTable]
Below is the setup code that initializes this table:
val db = Database.forConfig("h2db")
//drop original table schema
val futVectorTables = db.run(MTable.getTables)
val futDropTable = futVectorTables.flatMap{ tables => {
val tableNames = tables.map(t => t.name.name)
if (tableNames.contains(NORMAQMQuery.baseTableRow.tableName))
db.run(NORMAQMQuery.schema.drop)
else Future()
}
}.andThen {
case Success(_) => println(s"Table ${NORMAQMQuery.baseTableRow.tableName} dropped successfully! ")
case Failure(e) => println(s"Failed to drop Table ${NORMAQMQuery.baseTableRow.tableName}, it may not exist! Error: ${e.getMessage}")
}
Await.ready(futDropTable,Duration.Inf)
//create new table to refine AQMRawTable
val actionCreateTable = Models.NORMAQMQuery.schema.create
val futCreateTable = db.run(actionCreateTable).andThen {
case Success(_) => println("Table created successfully!")
case Failure(e) => println(s"Table may exist already! Error: ${e.getMessage}")
}
//carry on even if table creation fails
Await.ready(futCreateTable,Duration.Inf)
//truncate data, only available in slick 3.2.1
val futTruncateTable = futVectorTables.flatMap{ tables => {
val tableNames = tables.map(t => t.name.name)
if (tableNames.contains(NORMAQMQuery.baseTableRow.tableName))
db.run(NORMAQMQuery.schema.truncate)
else Future()
}
}.andThen {
case Success(_) => println(s"Table ${NORMAQMQuery.baseTableRow.tableName} truncated successfully!")
case Failure(e) => println(s"Failed to truncate Table ${NORMAQMQuery.baseTableRow.tableName}! Error: ${e.getMessage}")
}
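//wait for the truncate step (the original awaited futDropTable here, which had already completed)
Await.ready(futTruncateTable,Duration.Inf)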
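The setup code above relies on two behaviours of the Scala standard library: andThen only attaches a side effect and leaves the outcome of the Future unchanged, and Await.ready blocks until completion without rethrowing a failure, which is why the program carries on after a failed step. The following is a minimal sketch of that behaviour using only scala.concurrent; failingStep and outcome are hypothetical names of my own, and no Slick call is involved.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success, Try}

object AwaitReadySketch {
  // Hypothetical stand-in for a schema action that fails,
  // for example dropping a table that does not exist.
  def failingStep(): Future[Unit] =
    Future[Unit](throw new IllegalStateException("table not found"))
      .andThen {
        // andThen runs a side effect and passes the original outcome through
        case Success(_) => println("step succeeded")
        case Failure(e) => println(s"step failed: ${e.getMessage}")
      }

  // Await.ready blocks until the Future completes but does not rethrow;
  // the failure stays inside the completed Future's value.
  def outcome(): Option[Try[Unit]] =
    Await.ready(failingStep(), 10.seconds).value
}
```

Had the code used Await.result instead, the exception would be rethrown on the calling thread and the remaining setup steps would not run.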
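We need a function that looks up the ID in the STATES table using the STATENAME from the AQMRPT table. I deliberately designed this function as a complete FunDA program, so that it simulates a standalone process that is fairly heavy on I/O and computation (never mind whether this is sensible; the goal is to increase I/O and compute cost):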
//a conceived task for the purpose of resource consumption
//getting id with corresponding name from STATES table
def getStateID(state: String): Int = {
//create a stream for state id with state name
implicit def toState(row: StateTable#TableElementType) = StateModel(row.id,row.name)
val stateLoader = FDAViewLoader(slick.jdbc.H2Profile)(toState _)
val stateSeq = stateLoader.fda_typedRows(StateQuery.result)(db).toSeq
//constructed a Stream[Task,String]
val stateStream = fda_staticSource(stateSeq)()
var id = -
def getid: FDAUserTask[FDAROW] = row => {
row match {
case StateModel(stid,stname) => //target row type
if (stname.contains(state)) {
id = stid
fda_break //exit
}
else fda_skip //take next row
case _ => fda_skip
}
}
stateStream.appendTask(getid).startRun
id
}
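As you can see, getStateID rebuilds stateStream on every call, which serves the purpose of adding I/O operations.
Likewise, we need another function that fetches the id field from the COUNTIES table: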
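Stripped of the FunDA plumbing, the getid task is a scan that skips rows until the first match and then stops. A minimal sketch of that same control flow in plain Scala follows; StateRow and findId are hypothetical names of my own, and the sketch returns an Option rather than mutating a var, so it is an analogy and not a replacement for the FunDA version.

```scala
object FirstMatchSketch {
  // Hypothetical row type standing in for StateModel(id, name).
  final case class StateRow(id: Int, name: String)

  // Walk the rows lazily; collectFirst stops at the first row whose name
  // contains the search string (the role fda_break plays above), and
  // non-matching rows are simply passed over (the role of fda_skip).
  def findId(rows: Iterable[StateRow], state: String): Option[Int] =
    rows.iterator.collectFirst {
      case StateRow(id, name) if name.contains(state) => id
    }
}
```

For example, with rows Vector(StateRow(1, "Alabama"), StateRow(2, "Alaska")), findId(rows, "Alaska") yields Some(2) and findId(rows, "Texas") yields None.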
//another conceived task for the purpose of resource consumption
//getting id with corresponding names from COUNTIES table
def getCountyID(state: String, county: String): Int = {
//create a stream for county id with state name and county name
implicit def toCounty(row: CountyTable#TableElementType) = CountyModel(row.id,row.name)
val countyLoader = FDAViewLoader(slick.jdbc.H2Profile)(toCounty _)
val countySeq = countyLoader.fda_typedRows(CountyQuery.result)(db).toSeq
//constructed a Stream[Task,String]
val countyStream = fda_staticSource(countySeq)()
var id = -
def getid: FDAUserTask[FDAROW] = row => {
row match {
case CountyModel(cid,cname) => //target row type
if (cname.contains(state) && cname.contains(county)) {
id = cid
fda_break //exit
}
else fda_skip //take next row
case _ => fda_skip
}
}
countyStream.appendTask(getid).startRun
id
}
We can obtain the data source for this program as follows:
//original table listing
implicit def toAQMRPT(row: AQMRPTTable#TableElementType) =
AQMRPTModel(row.rid,row.mid,row.state,row.county,row.year,row.value,row.total,row.valid)
val AQMRPTLoader = FDAStreamLoader(slick.jdbc.H2Profile)(toAQMRPT _)
val AQMRPTStream = AQMRPTLoader.fda_typedStream(AQMRPTQuery.result)(db)(,)()
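Following the normal FunDA workflow, we design two user-defined functions. The first one calls getStateID and getCountyID with the state and county fields of the data row to get the corresponding ids, builds a new insert action row for the NORMAQM table, and passes it on to the next user-defined function. The next function simply runs the action row it receives: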
def getIdsThenInsertAction: FDAUserTask[FDAROW] = row => {
row match {
case aqm: AQMRPTModel =>
if (aqm.valid) {
val stateId = getStateID(aqm.state)
val countyId = getCountyID(aqm.state,aqm.county)
val action = NORMAQMQuery += NORMAQMModel(,aqm.mid, stateId, countyId, aqm.year,aqm.value,aqm.total)
fda_next(FDAActionRow(action))
}
else fda_skip
case _ => fda_skip
}
}
val runner = FDAActionRunner(slick.jdbc.H2Profile)
def runInsertAction: FDAUserTask[FDAROW] = row =>
row match {
case FDAActionRow(action) =>
runner.fda_execAction(action)(db)
fda_skip
case _ => fda_skip
}
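The division of labour between the two tasks above (one turns a data row into a deferred action, the next one executes it) can be sketched without FunDA or Slick. Everything in this sketch is an assumption of mine for illustration: Row, DataRow, ActionRow, Task and pipeline are hypothetical names, the action is a plain thunk instead of a database action, and the stages run over a whole Vector at a time rather than streaming row by row.

```scala
import scala.collection.mutable.ArrayBuffer

object TwoStageSketch {
  sealed trait Row
  final case class DataRow(valid: Boolean, value: Int) extends Row
  // A deferred side effect, standing in for FDAActionRow(action).
  final case class ActionRow(run: () => Unit) extends Row

  // A task maps one incoming row to zero or more outgoing rows.
  type Task = Row => Vector[Row]

  // Apply the tasks in order, feeding each stage's output into the next.
  def pipeline(source: Vector[Row], tasks: Task*): Vector[Row] =
    tasks.foldLeft(source)((rows, task) => rows.flatMap(task))

  def demo(input: Vector[Row]): Vector[Int] = {
    val sink = ArrayBuffer.empty[Int] // stands in for the target table
    // Stage 1: valid data rows become action rows, everything else is dropped.
    val buildAction: Task = {
      case DataRow(true, v) => Vector(ActionRow(() => { sink += v; () }))
      case _                => Vector.empty
    }
    // Stage 2: run each action and emit nothing downstream.
    val runAction: Task = {
      case ActionRow(run) => run(); Vector.empty
      case _              => Vector.empty
    }
    pipeline(input, buildAction, runAction)
    sink.toVector
  }
}
```

Keeping the action as data until the second stage is what later allows the first stage alone to be parallelized while the inserts are still executed by a single downstream task.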
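As in the previous demos, we combine these two user-defined functions with the data source into a complete FunDA program; calling startRun then produces the actual effect: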
AQMRPTStream.take()
.appendTask(getIdsThenInsertAction)
.appendTask(runInsertAction)
.startRun
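This program took 579 seconds to run, but it ran on a single thread. We want to know how a parallel run compares. To find out, we first have to convert getIdsThenInsertAction into a parallel task function, FDAParTask: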
AQMRPTStream.toPar(getIdsThenInsertAction)
FunDA provides a parallel runner, fda_runPar:
implicit val strategy = Strategy.fromCachedDaemonPool("cachedPool")
fda_runPar(AQMRPTStream.take().toPar(getIdsThenInsertAction))() //max 8 open computations
.appendTask(runInsertAction)
.startRun
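The idea behind the parallel run, splitting the input across a bounded number of concurrently running instances of one function, can be sketched with the standard library alone. This is not how FunDA or fs2 implements it: runPar, slowTask and maxOpen are hypothetical names of my own, the sketch uses a fixed thread pool and Futures, and unlike a nondeterministic merge it returns results in input order.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ParallelChunksSketch {
  // Hypothetical stand-in for an expensive user-defined function.
  def slowTask(n: Int): Int = {
    Thread.sleep(1) // simulate I/O latency
    n * 2
  }

  // Split `input` into at most `maxOpen` chunks and run `task` over each
  // chunk in its own Future on a pool of `maxOpen` threads.
  def runPar[A, B](input: Vector[A], maxOpen: Int)(task: A => B): Vector[B] = {
    require(maxOpen > 0, "maxOpen must be positive")
    val pool = Executors.newFixedThreadPool(maxOpen)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val chunkSize = math.max(1, math.ceil(input.size.toDouble / maxOpen).toInt)
      val chunks = input.grouped(chunkSize).toVector
      val futures = chunks.map(chunk => Future(chunk.map(task)))
      // Future.sequence keeps chunk order, so the output matches input order.
      Await.result(Future.sequence(futures), 1.minute).flatten
    } finally pool.shutdown()
  }
}
```

A call such as ParallelChunksSketch.runPar((1 to 20).toVector, 4)(ParallelChunksSketch.slowTask) returns the same values as the serial map, only computed on up to four threads.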
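We can supply our own thread pool. fda_runPar returns a standard FunDA FDAPipeLine, so we can attach the runInsertAction function after it. Below are the running times measured for different row counts: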
//processing 10000 rows in a single thread in 570 seconds
//processing 10000 rows parallelly in 316 seconds
//processing 20000 rows in a single thread in 1090 seconds
//processing 20000 rows parallelly in 614 seconds
//processing 100000 rows in a single thread in 2+ hrs
//processing 100000 rows parallelly in 3885 seconds
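The timings above can be turned into speedup ratios (single-thread time divided by parallel time). The small sketch below does that arithmetic; SpeedupSketch and its members are names of my own, and because the 100000-row single-thread run is only reported as "2+ hrs", I assume 7200 seconds for it, which makes that ratio a lower bound.

```scala
object SpeedupSketch {
  // (rows, single-thread seconds, parallel seconds) as reported above.
  // 7200.0 is an assumed lower bound for the "2+ hrs" single-thread run.
  val timings: Vector[(Int, Double, Double)] = Vector(
    (10000, 570.0, 316.0),
    (20000, 1090.0, 614.0),
    (100000, 7200.0, 3885.0)
  )

  // Speedup is how many times faster the parallel run finished.
  def speedup(serial: Double, parallel: Double): Double = serial / parallel

  def report(): Vector[(Int, Double)] =
    timings.map { case (rows, s, p) => (rows, speedup(s, p)) }
}
```

The ratios come out at about 1.80 for 10000 rows, about 1.78 for 20000 rows, and at least about 1.85 for 100000 rows.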
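These results suggest that parallel execution pays off more on larger datasets: the parallel run was roughly 1.8 times faster at 10000 and 20000 rows, and at least about 1.85 times faster at 100000 rows, given that the single-thread run took more than two hours. Below is the complete source code of this demo: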
import slick.jdbc.meta._
import com.bayakala.funda._
import api._
import scala.language.implicitConversions
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.concurrent.{Await, Future}
import scala.util.{Failure, Success}
import slick.jdbc.H2Profile.api._
import Models._
import fs2.Strategy

object ParallelTasks extends App {

val db = Database.forConfig("h2db")

//drop original table schema
val futVectorTables = db.run(MTable.getTables)

val futDropTable = futVectorTables.flatMap{ tables => {
val tableNames = tables.map(t => t.name.name)
if (tableNames.contains(NORMAQMQuery.baseTableRow.tableName))
db.run(NORMAQMQuery.schema.drop)
else Future()
}
}.andThen {
case Success(_) => println(s"Table ${NORMAQMQuery.baseTableRow.tableName} dropped successfully! ")
case Failure(e) => println(s"Failed to drop Table ${NORMAQMQuery.baseTableRow.tableName}, it may not exist! Error: ${e.getMessage}")
}
Await.ready(futDropTable,Duration.Inf)

//create new table to refine AQMRawTable
val actionCreateTable = Models.NORMAQMQuery.schema.create
val futCreateTable = db.run(actionCreateTable).andThen {
case Success(_) => println("Table created successfully!")
case Failure(e) => println(s"Table may exist already! Error: ${e.getMessage}")
}
//carry on even if table creation fails
Await.ready(futCreateTable,Duration.Inf)

//truncate data, only available in slick 3.2.1
val futTruncateTable = futVectorTables.flatMap{ tables => {
val tableNames = tables.map(t => t.name.name)
if (tableNames.contains(NORMAQMQuery.baseTableRow.tableName))
db.run(NORMAQMQuery.schema.truncate)
else Future()
}
}.andThen {
case Success(_) => println(s"Table ${NORMAQMQuery.baseTableRow.tableName} truncated successfully!")
case Failure(e) => println(s"Failed to truncate Table ${NORMAQMQuery.baseTableRow.tableName}! Error: ${e.getMessage}")
}
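//wait for the truncate step (the original awaited futDropTable here, which had already completed)
Await.ready(futTruncateTable,Duration.Inf)

//a conceived task for the purpose of resource consumption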
//getting id with corresponding name from STATES table
def getStateID(state: String): Int = {
//create a stream for state id with state name
implicit def toState(row: StateTable#TableElementType) = StateModel(row.id,row.name)
val stateLoader = FDAViewLoader(slick.jdbc.H2Profile)(toState _)
val stateSeq = stateLoader.fda_typedRows(StateQuery.result)(db).toSeq
//constructed a Stream[Task,String]
val stateStream = fda_staticSource(stateSeq)()
var id = -
def getid: FDAUserTask[FDAROW] = row => {
row match {
case StateModel(stid,stname) => //target row type
if (stname.contains(state)) {
id = stid
fda_break //exit
}
else fda_skip //take next row
case _ => fda_skip
}
}
stateStream.appendTask(getid).startRun
id
}
//another conceived task for the purpose of resource consumption
//getting id with corresponding names from COUNTIES table
def getCountyID(state: String, county: String): Int = {
//create a stream for county id with state name and county name
implicit def toCounty(row: CountyTable#TableElementType) = CountyModel(row.id,row.name)
val countyLoader = FDAViewLoader(slick.jdbc.H2Profile)(toCounty _)
val countySeq = countyLoader.fda_typedRows(CountyQuery.result)(db).toSeq
//constructed a Stream[Task,String]
val countyStream = fda_staticSource(countySeq)()
var id = -
def getid: FDAUserTask[FDAROW] = row => {
row match {
case CountyModel(cid,cname) => //target row type
if (cname.contains(state) && cname.contains(county)) {
id = cid
fda_break //exit
}
else fda_skip //take next row
case _ => fda_skip
}
}
countyStream.appendTask(getid).startRun
id
}

//original table listing
implicit def toAQMRPT(row: AQMRPTTable#TableElementType) =
AQMRPTModel(row.rid,row.mid,row.state,row.county,row.year,row.value,row.total,row.valid)
val AQMRPTLoader = FDAStreamLoader(slick.jdbc.H2Profile)(toAQMRPT _)
val AQMRPTStream = AQMRPTLoader.fda_typedStream(AQMRPTQuery.result)(db)(,)()

def getIdsThenInsertAction: FDAUserTask[FDAROW] = row => {
row match {
case aqm: AQMRPTModel =>
if (aqm.valid) {
val stateId = getStateID(aqm.state)
val countyId = getCountyID(aqm.state,aqm.county)
val action = NORMAQMQuery += NORMAQMModel(,aqm.mid, stateId, countyId, aqm.year,aqm.value,aqm.total)
fda_next(FDAActionRow(action))
}
else fda_skip
case _ => fda_skip
}
}
val runner = FDAActionRunner(slick.jdbc.H2Profile)
def runInsertAction: FDAUserTask[FDAROW] = row =>
row match {
case FDAActionRow(action) =>
runner.fda_execAction(action)(db)
fda_skip
case _ => fda_skip
}

val cnt_start = System.currentTimeMillis()
/*
AQMRPTStream.take()
.appendTask(getIdsThenInsertAction)
.appendTask(runInsertAction)
.startRun
//println(s"processing 10000 rows in a single thread in ${(System.currentTimeMillis - cnt_start)/1000} seconds")
//processing 10000 rows in a single thread in 570 seconds
//println(s"processing 20000 rows in a single thread in ${(System.currentTimeMillis - cnt_start)/1000} seconds")
//processing 20000 rows in a single thread in 1090 seconds
//println(s"processing 100000 rows in a single thread in ${(System.currentTimeMillis - cnt_start)/1000} seconds")
//processing 100000 rows in a single thread in 2+ hrs
*/
implicit val strategy = Strategy.fromCachedDaemonPool("cachedPool")
fda_runPar(AQMRPTStream.take().toPar(getIdsThenInsertAction))()
.appendTask(runInsertAction)
.startRun

//println(s"processing 10000 rows parallelly in ${(System.currentTimeMillis - cnt_start)/1000} seconds")
// processing 10000 rows parallelly in 316 seconds
//println(s"processing 20000 rows parallelly in ${(System.currentTimeMillis - cnt_start)/1000} seconds")
//processing 20000 rows parallelly in 614 seconds
println(s"processing 100000 rows parallelly in ${(System.currentTimeMillis - cnt_start)/1000} seconds")
//processing 100000 rows parallelly in 3885 seconds

}