FunDA(15) - Demo: parallel execution of user tasks (user task parallel execution)
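In FunDA, parallel execution means running user-defined functions in parallel. The idea is to split one input stream into several input streams and feed them, in parallel, into several running instances of the same user-defined function. These instances run concurrently, each on its own thread, until all input is exhausted. The actual number of concurrently running instances is decided by the fs2 nondeterminism algorithm, based on the number of CPU cores, the thread-pool configuration and the maximum number of instances specified by the user. In this demo we compare the efficiency of parallel and serial execution of exactly the same workload.

In an earlier demo we produced an AQMRPT table. That table is not properly normalized: its state and county columns still hold names instead of codes linked to the STATES and COUNTIES tables. In this demo we create a new table, NORMAQM, and move all the data from AQMRPT into it, converting the STATENAME and COUNTYNAME fields into the id fields of the STATES and COUNTIES tables along the way. Here is the structure of the NORMAQM table: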
case class NORMAQMModel(rid: Long
, mid: Int
, state: Int
, county: Int
, year: Int
, value: Int
, average: Int
) extends FDAROW

class NORMAQMTable(tag: Tag) extends Table[NORMAQMModel](tag, "NORMAQM") {
def rid = column[Long]("ROWID",O.AutoInc,O.PrimaryKey)
def mid = column[Int]("MEASUREID")
def state = column[Int]("STATID")
def county = column[Int]("COUNTYID")
def year = column[Int]("REPORTYEAR")
def value = column[Int]("VALUE")
def average = column[Int]("AVG")

def * = (rid,mid,state,county,year,value,average) <> (NORMAQMModel.tupled, NORMAQMModel.unapply)
}

val NORMAQMQuery = TableQuery[NORMAQMTable]
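Stripped of Slick and FunDA, the move from AQMRPT to NORMAQM is a per-row substitution of names by ids. The plain-Scala sketch below shows only that idea. RawRow, NormRow, Normalize and the in-memory lookup maps are hypothetical stand-ins (not the demo's models, and with trimmed field lists), and the sketch assumes that rows whose names cannot be resolved should be dropped rather than stored with a placeholder id.

```scala
// Hypothetical, trimmed-down row types: names in, ids out
final case class RawRow(mid: Int, state: String, county: String, year: Int, value: Int)
final case class NormRow(mid: Int, state: Int, county: Int, year: Int, value: Int)

object Normalize {
  // Replace state/county names with ids from in-memory lookup tables.
  // A row is dropped when either name is missing from its table.
  def normalize(rows: Seq[RawRow],
                stateIds: Map[String, Int],
                countyIds: Map[(String, String), Int]): Seq[NormRow] =
    rows.flatMap { r =>
      for {
        sid <- stateIds.get(r.state)
        cid <- countyIds.get((r.state, r.county))
      } yield NormRow(r.mid, sid, cid, r.year, r.value)
    }
}
```

In the demo itself the lookups are deliberately performed against the database for every row, which is what makes the workload heavy enough to compare serial and parallel runs.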
Below is the setup code that initializes this table:
val db = Database.forConfig("h2db")
//drop original table schema
val futVectorTables = db.run(MTable.getTables)
val futDropTable = futVectorTables.flatMap{ tables => {
val tableNames = tables.map(t => t.name.name)
if (tableNames.contains(NORMAQMQuery.baseTableRow.tableName))
db.run(NORMAQMQuery.schema.drop)
else Future.successful(())
}
}.andThen {
case Success(_) => println(s"Table ${NORMAQMQuery.baseTableRow.tableName} dropped successfully! ")
case Failure(e) => println(s"Failed to drop Table ${NORMAQMQuery.baseTableRow.tableName}, it may not exist! Error: ${e.getMessage}")
}
Await.ready(futDropTable,Duration.Inf)
//create new table to refine AQMRawTable
val actionCreateTable = Models.NORMAQMQuery.schema.create
val futCreateTable = db.run(actionCreateTable).andThen {
case Success(_) => println("Table created successfully!")
case Failure(e) => println(s"Table may exist already! Error: ${e.getMessage}")
}
//carry on even if the table could not be created
Await.ready(futCreateTable,Duration.Inf)
//truncate data, only available in slick 3.2.1
val futTruncateTable = futVectorTables.flatMap{ tables => {
val tableNames = tables.map(t => t.name.name)
if (tableNames.contains(NORMAQMQuery.baseTableRow.tableName))
db.run(NORMAQMQuery.schema.truncate)
else Future.successful(())
}
}.andThen {
case Success(_) => println(s"Table ${NORMAQMQuery.baseTableRow.tableName} truncated successfully!")
case Failure(e) => println(s"Failed to truncate Table ${NORMAQMQuery.baseTableRow.tableName}! Error: ${e.getMessage}")
}
Await.ready(futTruncateTable,Duration.Inf)
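Each setup step above is awaited on its own. The drop-if-exists-then-create sequence can also be written as one chained Future, so that each step starts only after the previous one has succeeded and a single await covers them all. The sketch uses a hypothetical in-memory FakeCatalog instead of Slick; its methods are invented for illustration and only imitate the shape of db.run returning a Future.

```scala
import scala.collection.mutable
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical in-memory stand-in for a database catalog (NOT a Slick API);
// every operation returns a Future, the way db.run does.
final class FakeCatalog(implicit ec: ExecutionContext) {
  private val tables = mutable.Map.empty[String, Vector[Int]]
  def listTables: Future[Set[String]] = Future(synchronized(tables.keySet.toSet))
  def drop(name: String): Future[Unit] = Future(synchronized { tables -= name; () })
  def create(name: String): Future[Unit] = Future(synchronized {
    if (tables.contains(name)) throw new IllegalStateException(s"$name already exists")
    tables(name) = Vector.empty
  })
  def insert(name: String, v: Int): Unit = synchronized { tables(name) = tables(name) :+ v }
  def rows(name: String): Option[Vector[Int]] = synchronized(tables.get(name))
}

object Setup {
  // Drop the table if it exists, then create it. Each step starts only after
  // the previous Future succeeded; any failure fails the whole chain.
  def recreate(cat: FakeCatalog, name: String)(implicit ec: ExecutionContext): Future[Unit] =
    for {
      existing <- cat.listTables
      _ <- if (existing.contains(name)) cat.drop(name) else Future.successful(())
      _ <- cat.create(name)
    } yield ()
}
```

Unlike the demo code, which deliberately carries on after a failed create, this chain stops at the first failure.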
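We need a function that looks up the ID in the STATES table using the STATENAME from the AQMRPT table. I deliberately wrote this function as a complete FunDA program, to simulate an independent process that consumes a fair amount of IO and computing resources (never mind whether this is sensible; the only goal is to add IO and computation):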
//a conceived task for the purpose of resource consumption
//getting id with corresponding name from STATES table
def getStateID(state: String): Int = {
//create a stream for state id with state name
implicit def toState(row: StateTable#TableElementType) = StateModel(row.id,row.name)
val stateLoader = FDAViewLoader(slick.jdbc.H2Profile)(toState _)
val stateSeq = stateLoader.fda_typedRows(StateQuery.result)(db).toSeq
//construct a stream of StateModel rows
val stateStream = fda_staticSource(stateSeq)()
var id = -1 //stays -1 if no state name matches
def getid: FDAUserTask[FDAROW] = row => {
row match {
case StateModel(stid,stname) => //target row type
if (stname.contains(state)) {
id = stid
fda_break //exit
}
else fda_skip //take next row
case _ => fda_skip
}
}
stateStream.appendTask(getid).startRun
id
}
As you can see, getStateID rebuilds stateStream on every call, which serves the purpose of adding IO operations.
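To make the cost of rebuilding the stream on every call visible, the sketch below counts how often a hypothetical StateLookup loads its table: idUncached reloads on every call, as getStateID does, while idCached loads once and then answers from memory. StateLookup and loadStates are invented for illustration; note that the cached variant matches names exactly instead of using contains.

```scala
// Hypothetical lookup: `loadStates` stands in for running StateQuery.result
// against the database, and counts how often that happens.
final class StateLookup(data: Seq[(Int, String)]) {
  var loads = 0
  private def loadStates(): Seq[(Int, String)] = { loads += 1; data }

  // Like getStateID: reload the whole table on every call (deliberately wasteful)
  def idUncached(state: String): Int =
    loadStates().collectFirst { case (id, name) if name.contains(state) => id }.getOrElse(-1)

  // Common alternative: load once into a map, then answer from memory (exact match only)
  private lazy val byName: Map[String, Int] =
    loadStates().map { case (id, name) => name -> id }.toMap
  def idCached(state: String): Int = byName.getOrElse(state, -1)
}
```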
Likewise, we need another function to get the id field from the COUNTIES table:
//another conceived task for the purpose of resource consumption
//getting id with corresponding names from COUNTIES table
def getCountyID(state: String, county: String): Int = {
//create a stream for county id with state name and county name
implicit def toCounty(row: CountyTable#TableElementType) = CountyModel(row.id,row.name)
val countyLoader = FDAViewLoader(slick.jdbc.H2Profile)(toCounty _)
val countySeq = countyLoader.fda_typedRows(CountyQuery.result)(db).toSeq
//construct a stream of CountyModel rows
val countyStream = fda_staticSource(countySeq)()
var id = -1 //stays -1 if no county name matches
def getid: FDAUserTask[FDAROW] = row => {
row match {
case CountyModel(cid,cname) => //target row type
if (cname.contains(state) && cname.contains(county)) {
id = cid
fda_break //exit
}
else fda_skip //take next row
case _ => fda_skip
}
}
countyStream.appendTask(getid).startRun
id
}
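Both lookups stop at the first row whose name contains the search text, so the result depends on row order and on overlapping names. A plain-Scala sketch with made-up rows shows the effect: a substring search for "Virginia" returns the "West Virginia" row when that row comes first. NameMatch and its two methods are hypothetical helpers, not FunDA functions.

```scala
object NameMatch {
  // Mirrors the getid tasks: scan rows in order, stop at the first name that
  // contains the search text, and return -1 when nothing matches.
  def firstContaining(rows: Seq[(Int, String)], text: String): Int =
    rows.collectFirst { case (id, name) if name.contains(text) => id }.getOrElse(-1)

  // Hypothetical stricter variant: compare whole names instead of substrings
  def exact(rows: Seq[(Int, String)], text: String): Int =
    rows.collectFirst { case (id, name) if name == text => id }.getOrElse(-1)
}
```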
The data source of this program can be obtained as follows:
//original table listing
implicit def toAQMRPT(row: AQMRPTTable#TableElementType) =
AQMRPTModel(row.rid,row.mid,row.state,row.county,row.year,row.value,row.total,row.valid)
val AQMRPTLoader = FDAStreamLoader(slick.jdbc.H2Profile)(toAQMRPT _)
val AQMRPTStream = AQMRPTLoader.fda_typedStream(AQMRPTQuery.result)(db)(,)()
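Following the usual FunDA flow, we design two user-defined functions. The first one takes the state and county fields of each row, calls getStateID and getCountyID to get the corresponding ids, builds an insert action for a new NORMAQM row, and passes it on to the next user-defined function. The second one simply runs the action rows it receives: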
def getIdsThenInsertAction: FDAUserTask[FDAROW] = row => {
row match {
case aqm: AQMRPTModel =>
if (aqm.valid) {
val stateId = getStateID(aqm.state)
val countyId = getCountyID(aqm.state,aqm.county)
val action = NORMAQMQuery += NORMAQMModel(0,aqm.mid, stateId, countyId, aqm.year,aqm.value,aqm.total) //rid is AutoInc, 0 is only a placeholder
fda_next(FDAActionRow(action))
}
else fda_skip
case _ => fda_skip
}
}
val runner = FDAActionRunner(slick.jdbc.H2Profile)
def runInsertAction: FDAUserTask[FDAROW] = row =>
row match {
case FDAActionRow(action) =>
runner.fda_execAction(action)(db)
fda_skip
case _ => fda_skip
}
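The split of work between the two user tasks, one producing action rows and one executing them, can be mimicked without FunDA. In the sketch below Step, Next and Skip are hypothetical stand-ins for what fda_next and fda_skip express, and "executing" an action just appends it to a buffer; it illustrates the flow, not FunDA's API.

```scala
// Hypothetical miniature of the two-task pipeline (not FunDA types)
sealed trait Step[+A]
final case class Next[A](row: A) extends Step[A]
case object Skip extends Step[Nothing]

final case class Aqm(mid: Int, state: String, valid: Boolean)
final case class InsertAction(mid: Int, stateId: Int)

object Pipeline {
  // Task 1: turn each valid source row into an insert action, skip the rest
  def toAction(ids: Map[String, Int])(row: Aqm): Step[InsertAction] =
    if (row.valid) Next(InsertAction(row.mid, ids.getOrElse(row.state, -1))) else Skip

  // Task 2: execute every action it receives (here: record it) and emit nothing further
  def run(rows: Seq[Aqm], ids: Map[String, Int]): Vector[InsertAction] = {
    val executed = Vector.newBuilder[InsertAction]
    rows.map(r => toAction(ids)(r)).foreach {
      case Next(a) => executed += a
      case Skip    => ()
    }
    executed.result()
  }
}
```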
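As in the previous demos, we combine these two user-defined functions with the data source into a complete FunDA program; calling startRun then performs the actual work: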
AQMRPTStream.take()
.appendTask(getIdsThenInsertAction)
.appendTask(runInsertAction)
.startRun
This program ran for 579 seconds, but in a single thread. We want to see how parallel execution does, so first we turn getIdsThenInsertAction into a parallel task of type FDAParTask:
AQMRPTStream.toPar(getIdsThenInsertAction)
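Conceptually, running a task in parallel means splitting the input into several sub-streams and running one instance of the task on each, with the number of instances capped. The following sketch does this with a fixed JDK thread pool and Scala Futures. ParSketch.runPar is a hypothetical helper, and its simple rule for the number of instances (the smaller of maxOpen and the CPU count) is an assumption, not the fs2 algorithm that FunDA relies on.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// Hypothetical helper, not part of FunDA: applies `task` to every row using at
// most `maxOpen` concurrent instances (also capped by the number of CPU cores).
object ParSketch {
  def runPar[A, B](rows: Seq[A], maxOpen: Int)(task: A => B): Seq[B] = {
    val width = math.max(1, math.min(maxOpen, Runtime.getRuntime.availableProcessors))
    val pool = Executors.newFixedThreadPool(width)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // Split the input into `width` interleaved sub-streams, one per task instance
      val lanes: Seq[Seq[A]] = (0 until width).map { i =>
        rows.zipWithIndex.collect { case (row, j) if j % width == i => row }
      }
      // Each instance runs on a pool thread until its sub-stream is exhausted
      val running: Seq[Future[Seq[B]]] = lanes.map(lane => Future(lane.map(task)))
      // Results are gathered lane by lane, so the original row order is NOT kept
      Await.result(Future.sequence(running), Duration.Inf).flatten
    } finally pool.shutdown()
  }
}
```

For example, ParSketch.runPar(rows, 8)(task) processes rows with at most 8 instances of task running at once.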
FunDA provides the parallel runner fda_runPar:
implicit val strategy = Strategy.fromCachedDaemonPool("cachedPool")
fda_runPar(AQMRPTStream.take().toPar(getIdsThenInsertAction))() //max 8 open computations
.appendTask(runInsertAction)
.startRun
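Strategy.fromCachedDaemonPool("cachedPool") in the code above supplies the threads that fda_runPar runs on. To show what such a pool amounts to in plain JDK terms, here is a rough sketch: a cached thread pool whose threads are daemons named cachedPool-1, cachedPool-2, and so on. Pools.cachedDaemonPool is a hypothetical helper and is not taken from fs2's implementation.

```scala
import java.util.concurrent.{ExecutorService, Executors, ThreadFactory}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical helper: a cached pool of daemon threads named "<prefix>-N"
object Pools {
  def cachedDaemonPool(prefix: String): ExecutorService = {
    val counter = new AtomicInteger(0)
    val factory = new ThreadFactory {
      def newThread(r: Runnable): Thread = {
        val t = new Thread(r, s"$prefix-${counter.incrementAndGet()}")
        t.setDaemon(true) // daemon threads do not keep the JVM alive on exit
        t
      }
    }
    // Threads are created on demand and reused while idle ones are available
    Executors.newCachedThreadPool(factory)
  }
}
```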
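We can supply our own thread pool. fda_runPar returns a standard FunDA FDAPipeLine, so we can append the runInsertAction function to it. Here are the run times for different numbers of rows: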
//processing 10000 rows in a single thread in 570 seconds
//processing 10000 rows parallelly in 316 seconds
//processing 20000 rows in a single thread in 1090 seconds
//processing 20000 rows parallelly in 614 seconds
//processing 100000 rows in a single thread in 2+ hrs
//processing 100000 rows parallelly in 3885 seconds
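The speedup is simply the single-thread time divided by the parallel time. This small sketch recomputes it, together with the seconds saved, from the timings listed above. Timings is a hypothetical helper object, and treating "2+ hrs" as exactly 7,200 seconds is an assumption that only yields a lower bound for the 100,000-row case.

```scala
// Timings reported above: (rows, single-thread seconds, parallel seconds).
// The 100000-row single-thread run is only reported as "2+ hrs", so 7200 s is a lower bound.
object Timings {
  val runs: Seq[(Int, Double, Double)] =
    Seq((10000, 570.0, 316.0), (20000, 1090.0, 614.0), (100000, 7200.0, 3885.0))

  def speedup(serial: Double, parallel: Double): Double = serial / parallel
  def secondsSaved(serial: Double, parallel: Double): Double = serial - parallel

  // One human-readable line per run
  def report(): Seq[String] = runs.map { case (n, s, p) =>
    f"$n%6d rows: speedup ${speedup(s, p)}%.2fx, saved ${secondsSaved(s, p)}%.0f s"
  }
}
```

Calling Timings.report().foreach(println) prints one summary line per row count.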
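These figures show that the parallel run was roughly 1.8 times as fast in every case, and the time it saved grew with the data set: 254 seconds at 10,000 rows, 476 seconds at 20,000 rows and at least 3,315 seconds at 100,000 rows. In short, the larger the data set, the bigger the gain from parallel execution. The complete source code of this demo follows: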
import slick.jdbc.meta._
import com.bayakala.funda._
import api._
import scala.language.implicitConversions
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.concurrent.{Await, Future}
import scala.util.{Failure, Success}
import slick.jdbc.H2Profile.api._
import Models._
import fs2.Strategy

object ParallelTasks extends App {

val db = Database.forConfig("h2db")

//drop original table schema
val futVectorTables = db.run(MTable.getTables)

val futDropTable = futVectorTables.flatMap{ tables => {
val tableNames = tables.map(t => t.name.name)
if (tableNames.contains(NORMAQMQuery.baseTableRow.tableName))
db.run(NORMAQMQuery.schema.drop)
else Future.successful(())
}
}.andThen {
case Success(_) => println(s"Table ${NORMAQMQuery.baseTableRow.tableName} dropped successfully! ")
case Failure(e) => println(s"Failed to drop Table ${NORMAQMQuery.baseTableRow.tableName}, it may not exist! Error: ${e.getMessage}")
}
Await.ready(futDropTable,Duration.Inf)

//create new table to refine AQMRawTable
val actionCreateTable = Models.NORMAQMQuery.schema.create
val futCreateTable = db.run(actionCreateTable).andThen {
case Success(_) => println("Table created successfully!")
case Failure(e) => println(s"Table may exist already! Error: ${e.getMessage}")
}
//carry on even if the table could not be created
Await.ready(futCreateTable,Duration.Inf)

//truncate data, only available in slick 3.2.1
val futTruncateTable = futVectorTables.flatMap{ tables => {
val tableNames = tables.map(t => t.name.name)
if (tableNames.contains(NORMAQMQuery.baseTableRow.tableName))
db.run(NORMAQMQuery.schema.truncate)
else Future.successful(())
}
}.andThen {
case Success(_) => println(s"Table ${NORMAQMQuery.baseTableRow.tableName} truncated successfully!")
case Failure(e) => println(s"Failed to truncate Table ${NORMAQMQuery.baseTableRow.tableName}! Error: ${e.getMessage}")
}
Await.ready(futTruncateTable,Duration.Inf)

//a conceived task for the purpose of resource consumption
//getting id with corresponding name from STATES table
def getStateID(state: String): Int = {
//create a stream for state id with state name
implicit def toState(row: StateTable#TableElementType) = StateModel(row.id,row.name)
val stateLoader = FDAViewLoader(slick.jdbc.H2Profile)(toState _)
val stateSeq = stateLoader.fda_typedRows(StateQuery.result)(db).toSeq
//construct a stream of StateModel rows
val stateStream = fda_staticSource(stateSeq)()
var id = -1 //stays -1 if no state name matches
def getid: FDAUserTask[FDAROW] = row => {
row match {
case StateModel(stid,stname) => //target row type
if (stname.contains(state)) {
id = stid
fda_break //exit
}
else fda_skip //take next row
case _ => fda_skip
}
}
stateStream.appendTask(getid).startRun
id
}
//another conceived task for the purpose of resource consumption
//getting id with corresponding names from COUNTIES table
def getCountyID(state: String, county: String): Int = {
//create a stream for county id with state name and county name
implicit def toCounty(row: CountyTable#TableElementType) = CountyModel(row.id,row.name)
val countyLoader = FDAViewLoader(slick.jdbc.H2Profile)(toCounty _)
val countySeq = countyLoader.fda_typedRows(CountyQuery.result)(db).toSeq
//construct a stream of CountyModel rows
val countyStream = fda_staticSource(countySeq)()
var id = -1 //stays -1 if no county name matches
def getid: FDAUserTask[FDAROW] = row => {
row match {
case CountyModel(cid,cname) => //target row type
if (cname.contains(state) && cname.contains(county)) {
id = cid
fda_break //exit
}
else fda_skip //take next row
case _ => fda_skip
}
}
countyStream.appendTask(getid).startRun
id
}

//original table listing
implicit def toAQMRPT(row: AQMRPTTable#TableElementType) =
AQMRPTModel(row.rid,row.mid,row.state,row.county,row.year,row.value,row.total,row.valid)
val AQMRPTLoader = FDAStreamLoader(slick.jdbc.H2Profile)(toAQMRPT _)
val AQMRPTStream = AQMRPTLoader.fda_typedStream(AQMRPTQuery.result)(db)(,)()

def getIdsThenInsertAction: FDAUserTask[FDAROW] = row => {
row match {
case aqm: AQMRPTModel =>
if (aqm.valid) {
val stateId = getStateID(aqm.state)
val countyId = getCountyID(aqm.state,aqm.county)
val action = NORMAQMQuery += NORMAQMModel(0,aqm.mid, stateId, countyId, aqm.year,aqm.value,aqm.total) //rid is AutoInc, 0 is only a placeholder
fda_next(FDAActionRow(action))
}
else fda_skip
case _ => fda_skip
}
}
val runner = FDAActionRunner(slick.jdbc.H2Profile)
def runInsertAction: FDAUserTask[FDAROW] = row =>
row match {
case FDAActionRow(action) =>
runner.fda_execAction(action)(db)
fda_skip
case _ => fda_skip
}

val cnt_start = System.currentTimeMillis()
/*
AQMRPTStream.take()
.appendTask(getIdsThenInsertAction)
.appendTask(runInsertAction)
.startRun
//println(s"processing 10000 rows in a single thread in ${(System.currentTimeMillis - cnt_start)/1000} seconds")
//processing 10000 rows in a single thread in 570 seconds
//println(s"processing 20000 rows in a single thread in ${(System.currentTimeMillis - cnt_start)/1000} seconds")
//processing 20000 rows in a single thread in 1090 seconds
//println(s"processing 100000 rows in a single thread in ${(System.currentTimeMillis - cnt_start)/1000} seconds")
//processing 100000 rows in a single thread in 2+ hrs
*/

implicit val strategy = Strategy.fromCachedDaemonPool("cachedPool")
fda_runPar(AQMRPTStream.take(100000).toPar(getIdsThenInsertAction))()
.appendTask(runInsertAction)
.startRun

//println(s"processing 10000 rows parallelly in ${(System.currentTimeMillis - cnt_start)/1000} seconds")
//processing 10000 rows parallelly in 316 seconds
//println(s"processing 20000 rows parallelly in ${(System.currentTimeMillis - cnt_start)/1000} seconds")
//processing 20000 rows parallelly in 614 seconds
println(s"processing 100000 rows parallelly in ${(System.currentTimeMillis - cnt_start)/1000} seconds")
//processing 100000 rows parallelly in 3885 seconds

}