Notes of Principles of Parallel Programming - TODO
0.1 Topic
Notes on: Lin C., Snyder L. Principles of Parallel Programming. Beijing: China Machine Press, 2008.
(1) Parallel Computer Architecture - done 2015/5/24
(2) Parallel Abstraction - done 2015/5/28
(3) Scalable Algorithmic Techniques - done 2015/5/30
(4) PP Languages: Java(Thread), MPI(local view), ZPL(global view)
0.2 Audience
Novice parallel programmers who want to learn fundamental PP concepts
0.3 Related Topics
Computer Architecture, Sequential Algorithms,
Parallel Programming Languages
--------------------------------------------------------------------
- ### 1 introduction
real world cases:
house construction, manufacturing pipeline, call center
ILP (Instruction Level Parallelism)
(a+b) * (c+d): the two additions are independent, so they can execute at the same time
Parallel Computing vs. Distributed Computing
the goal of PC is to provide performance, either in terms of
processor power or memory, that a single processor cannot provide;
the goal of DC is to provide convenience, including availability,
reliability and physical distribution.
Concurrency vs. Parallelism
CONCURRENCY is widely used in the OS and DB communities to describe
executions that are LOGICALLY simultaneous;
PARALLELISM is typically used by the architecture and supercomputing
communities to describe executions that are PHYSICALLY simultaneous.
In either case, the codes that execute simultaneously exhibit unknown
timing characteristics.
iterative sum/pair-wise summation
parallel prefix sum
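The difference between the iterative sum and the pair-wise sum is the shape of the dependences: a chain versus a tree. The sketch below runs sequentially and only illustrates the structure. The function names are hypothetical, and the prefix sum uses a simple log-step formulation, which is an assumption and not necessarily the formulation used in the book.

```python
def iterative_sum(values):
    """Left-to-right sum: a chain of dependent additions."""
    total = 0
    for v in values:
        total += v
    return total


def pairwise_sum(values):
    """Tree-shaped sum: additions on the same level are independent of each other."""
    level = list(values)
    if not level:
        return 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # the odd element is carried to the next level
        level = nxt
    return level[0]


def prefix_sum(values):
    """Inclusive prefix sum in about log2(n) steps; each step's updates are independent."""
    out = list(values)
    step = 1
    while step < len(out):
        prev = out[:]  # every update in this step reads the previous step's values
        for i in range(step, len(out)):
            out[i] = prev[i] + prev[i - step]
        step *= 2
    return out
```

For example, `prefix_sum([1, 2, 3, 4])` gives `[1, 3, 6, 10]`, and both sum functions return 15 for `[1, 2, 3, 4, 5]`.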
Parallelism using multiple instruction streams: threads
multithreaded solutions for counting the number of 3s in an array
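A minimal sketch of one multithreaded count-3s solution, assuming Python's standard `threading` module. The name `count3s_threaded` and the default of 4 threads are hypothetical. Each thread counts into a private variable and takes the lock only once, which avoids both the race on the shared counter and heavy lock contention. In standard CPython the global interpreter lock keeps these threads from running the loop truly in parallel, so treat this as an illustration of the structure, not of the speedup.

```python
import threading


def count3s_threaded(array, num_threads=4):
    """Count the 3s in array using num_threads threads and one shared total."""
    n = len(array)
    total = 0
    lock = threading.Lock()
    chunk = (n + num_threads - 1) // num_threads  # ceiling of n / num_threads

    def worker(t):
        nonlocal total
        lo = t * chunk
        hi = min(n, lo + chunk)
        private = 0  # thread-private count, no lock needed here
        for i in range(lo, hi):
            if array[i] == 3:
                private += 1
        with lock:  # one protected update per thread
            total += private

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return total
```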
characteristics of good parallel programs:
(1) correct
(2) good performance
(3) scalable to a large number of processors
(4) portable across a wide variety of parallel platforms
- ### 2 parallel computers
6 parallel computers
(1) Chip multiprocessors *
Intel Core Duo
AMD Dual Core Opteron
(2) Symmetric Multiprocessor Architecture
Sun Fire E25K
(3) Heterogeneous Chip Design
Cell
(4) Clusters
(5) Supercomputers
BlueGene/L
sequential computer abstraction
Random Access Machine (RAM) model, i.e. the von Neumann model
abstracts a sequential computer as a device with an instruction
execution unit and an unbounded memory.
2 abstract models of parallel computers:
(1) PRAM: parallel random access machine model
the PRAM consists of an unspecified number of instruction execution units,
connected to a single unbounded shared memory that contains both
programs and data.
(2) CTA: candidate type architecture
the CTA consists of P standard sequential computers (processors, or processor elements),
connected by an interconnection network (communication network);
it separates 2 types of memory references: inexpensive local references
and expensive non-local references;
Locality Rule:
Fast programs tend to maximize the number of local memory references, and
minimize the number of non-local memory references.
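The Locality Rule can be made concrete with a rough cost model. This is only a sketch: the function name, the parameter `lam` (the cost of one non-local reference relative to a local one) and the sample numbers are assumptions chosen for illustration, not measurements.

```python
def estimated_time(local_refs, nonlocal_refs, lam, local_cost=1.0):
    """Rough memory time: each local reference costs local_cost, each non-local one costs lam."""
    return local_refs * local_cost + nonlocal_refs * lam


# Same total number of references, different locality (lam = 100 is a made-up value).
mostly_local = estimated_time(local_refs=990, nonlocal_refs=10, lam=100)
less_local = estimated_time(local_refs=900, nonlocal_refs=100, lam=100)
```

With these made-up numbers, turning 90 local references into non-local ones raises the estimate from 1990 to 10900 time units.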
3 major communication(memory reference) mechanisms:
(1) shared memory
a natural extension of the flat memory of sequential computers.
(2) one-sided communication
a relaxation of the shared memory concept: it supports a single shared address space
in which all threads can reference all memory locations, but it doesn't attempt to keep the
memory coherent.
(3) message passing
memory references are used to access local memory,
message passing is used to access non-local memory.
- ### 3 reasoning about parallel performance
thread: thread-based/shared memory parallel programming
process: message passing/non-shared memory parallel programming
latency: the amount of TIME it takes to complete a given unit of work
throughput: the amount of WORK that can be completed per unit time
## source of performance loss
(1) overhead
communication
synchronization
computation
memory
(2) non-parallelizable computation
Amdahl's Law: portions of a computation that are sequential will,
as parallelism is applied, dominate the execution time.
(3) idle processors
idle time is often a consequence of synchronization and communication
load imbalance: uneven distribution of work to processors
memory-bound computation: bandwidth, latency
(4) contention for resources
spin lock, false sharing
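Amdahl's Law from item (2) is usually written as speedup = 1 / (s + (1 - s) / P), where s is the sequential fraction of the work and P is the number of processors. A minimal sketch follows; the function name is hypothetical.

```python
def amdahl_speedup(serial_fraction, processors):
    """Upper bound on speedup when serial_fraction of the work cannot be parallelized."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be >= 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)
```

With a 10% sequential part, the bound stays below 10 however many processors are added, which is the sense in which the sequential portion comes to dominate the execution time.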
## parallel structure
(1) dependences
an ordering relationship between two computations
(2) granularity
the frequency of interactions among threads or processes
(3) locality
temporal locality: memory references are clustered in TIME
spatial locality: memory references are clustered by ADDRESS
## performance trade-off
sequential computation: 90/10 rule
communication vs. computation
memory vs. parallelism
overhead vs. parallelism
## measuring performance
(1) execution time/latency
(2) speedup/efficiency
(3) superlinear speedup
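The three measures relate to each other through standard definitions: speedup is sequential time divided by parallel time, efficiency is speedup divided by the number of processors, and speedup is superlinear when it exceeds the number of processors. A minimal sketch follows; the function names are hypothetical.

```python
def speedup(t_seq, t_par):
    """Speedup = sequential execution time / parallel execution time."""
    return t_seq / t_par


def efficiency(t_seq, t_par, processors):
    """Efficiency = speedup / number of processors (1.0 means perfect use)."""
    return speedup(t_seq, t_par) / processors


def is_superlinear(t_seq, t_par, processors):
    """True when the speedup exceeds the number of processors."""
    return speedup(t_seq, t_par) > processors
```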
## scalable performance *
is difficult to achieve
- ### 4 first step toward parallel programming
## data and task parallelism
(1) data parallel computation
parallelism is applied by performing the SAME operation to different items of data at the same time
(2) task parallel computation
parallelism is applied by performing DISTINCT computations/tasks at the same time
an example: the job of preparing a banquet/dinner
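The two kinds of parallelism can be contrasted in a short sketch, assuming Python's standard `concurrent.futures` module. The function names and the banquet task names are hypothetical. The data parallel version applies one operation to every item; the task parallel version runs different jobs side by side.

```python
from concurrent.futures import ThreadPoolExecutor


def data_parallel_squares(items):
    """Data parallelism: the SAME operation applied to different items."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(lambda x: x * x, items))


def task_parallel_banquet():
    """Task parallelism: DISTINCT tasks running at the same time."""
    tasks = {
        "appetizer": lambda: "salad tossed",
        "main": lambda: "roast carved",
        "dessert": lambda: "cake iced",
    }
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = {name: pool.submit(fn) for name, fn in tasks.items()}
        return {name: fut.result() for name, fut in futures.items()}
```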
## Peril-L Notation
see handwritten notes
## formulating parallelism
(1) fixed parallelism
k processors, a k-way parallel algorithm
drawback: running on 2k processors yields no improvement
(2) unlimited parallelism
spawn a thread for each single data element:
// background: count the number of 3s in array[n]
int _count_ = 0;
forall (i in (0..n-1)) // n is the array size
{
  _count_ = +/ (array[i] == 3 ? 1 : 0);
}
drawback: n threads are spawned for P processors, where P is the number of
processors and P << n, so each processor pays the setup overhead of about n/P threads.
(3) scalable parallelism
formulate a set of substantial subproblems; natural units of the solution are assigned to each subproblem, and each subproblem is solved as independently as possible.
implications:
substantial: sufficient local work to cover parallel overheads
natural unit: computation is not always smoothly partitionable
independently: reduce parallel communication overheads
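A scalable formulation of count-3s can be sketched as follows: the array is split into P contiguous blocks whose sizes differ by at most one, each block is counted independently, and only the P local counts are combined. The function names are hypothetical, and the blocks are processed sequentially here to keep the sketch short; in a real program each block would go to its own thread or process.

```python
def block_ranges(n, p):
    """Split indices 0..n-1 into p contiguous half-open ranges of nearly equal size."""
    base, extra = divmod(n, p)
    ranges = []
    start = 0
    for k in range(p):
        size = base + (1 if k < extra else 0)  # the first `extra` blocks get one more item
        ranges.append((start, start + size))
        start += size
    return ranges


def count3s_scalable(array, p):
    """Count 3s block by block, then combine the p local counts."""
    local_counts = [
        sum(1 for i in range(lo, hi) if array[i] == 3)
        for lo, hi in block_ranges(len(array), p)
    ]
    return sum(local_counts)
```

The same code works for any P, which is the point of this formulation: changing the number of processors changes only the block size, not the algorithm.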
- ### 5 scalable algorithmic techniques
focus on data parallel computations
## ideal parallel computation
composed of large blocks of independent computation with no interactions among blocks.
principle:
  Parallel programs are more scalable when they emphasize blocks of computation, typically
  the larger the block the better, that minimize the inter-thread dependences.
## Schwartz's algorithm
goal: +-reduce
condition: P is the number of processors, n is the number of values
2 approaches:
(1) use n/2 logical concurrency - unlimited parallelism
(2) each process handles n/P items locally, then the P partial results are combined using a P-leaf tree - better
notation: _total_ = +/ _data_;
where _total_ is a global number and _data_ is a global array;
the compiler emits code that uses Schwartz's local/global approach.
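Approach (2) can be sketched in two phases: a local phase in which each of the P processes sums its own n/P items, and a global phase that combines the P partial sums pair-wise in about log2(P) levels. The function name is hypothetical and the code simulates the P processes sequentially; it shows the structure only and is not the code a Peril-L compiler would emit.

```python
def schwartz_sum(data, p):
    """+-reduce: local sums of about n/p items each, then a p-leaf combining tree."""
    n = len(data)
    # Phase 1 (local): process k sums its own contiguous block.
    partial = [sum(data[k * n // p:(k + 1) * n // p]) for k in range(p)]
    # Phase 2 (global): combine partial sums up a tree; each pass is one tree level.
    stride = 1
    while stride < p:
        for k in range(0, p - stride, 2 * stride):
            partial[k] += partial[k + stride]  # additions within a level are independent
        stride *= 2
    return partial[0]
```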
## reduce and scan abstractions
generalized reduce and scan functions
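A generalized reduce can be sketched as four user-supplied pieces: create an empty local tally, accumulate one element into a tally, combine two tallies, and generate the final answer from the global tally. The names `init`, `accum`, `combine` and `gen` are assumptions made for this sketch, as are the helper names. The blocks are processed sequentially here, and the scan example simply uses `itertools.accumulate` with a user-chosen operator.

```python
from functools import reduce
from itertools import accumulate


def generalized_reduce(blocks, init, accum, combine, gen):
    """Reduce each block into a local tally, combine the tallies, then generate the result."""
    tallies = []
    for block in blocks:  # each block would be handled by one process
        tally = init()
        for x in block:
            tally = accum(tally, x)
        tallies.append(tally)
    return gen(reduce(combine, tallies, init()))


def mean_by_blocks(blocks):
    """Example: the mean is not a plain +-reduce, but fits the four-piece scheme."""
    return generalized_reduce(
        blocks,
        init=lambda: (0, 0),                             # tally = (sum, count)
        accum=lambda t, x: (t[0] + x, t[1] + 1),
        combine=lambda a, b: (a[0] + b[0], a[1] + b[1]),
        gen=lambda t: t[0] / t[1],
    )


def running_max(values):
    """Example scan with a user-chosen operator: max-scan."""
    return list(accumulate(values, max))
```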
## assign work to processes statically
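Two common static assignments are block allocation (each process gets one contiguous range) and cyclic allocation (items are dealt out round-robin). The sketch below shows both under assumed, hypothetical function names; the assignment depends only on the item index and the process count, so it can be fixed before the computation starts.

```python
def block_allocation(num_items, p):
    """Process k gets one contiguous range of item indices."""
    return [list(range(k * num_items // p, (k + 1) * num_items // p)) for k in range(p)]


def cyclic_allocation(num_items, p):
    """Process k gets items k, k + p, k + 2p, ... (round-robin)."""
    return [list(range(k, num_items, p)) for k in range(p)]
```

Block allocation tends to favor locality, while cyclic allocation tends to balance load when the cost per item varies with its index.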
## assign work to processes dynamically
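One common way to assign work dynamically is a shared work queue from which idle workers pull the next task. A minimal sketch, assuming Python's standard `queue` and `threading` modules; the function name and the squaring "work" are placeholders.

```python
import queue
import threading


def dynamic_squares(tasks, p):
    """p workers repeatedly pull the next task from a shared queue until it is empty."""
    work = queue.Queue()
    for t in tasks:
        work.put(t)
    done = [[] for _ in range(p)]  # done[k] is written only by worker k

    def worker(k):
        while True:
            try:
                item = work.get_nowait()
            except queue.Empty:
                return  # no work left for this worker
            done[k].append(item * item)  # placeholder for the real computation

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(p)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return done
```

Which worker handles which task is decided at run time, so the split between workers can differ from run to run even though the combined results are the same.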
## trees
