General mistakes in parallel computing
This is an old article written in 2013, originally posted on gegahost.net: http://raison.gegahost.net/?p=97
March 11, 2013
(Original Work by Peixu Zhu)
In a parallel computing environment, some general mistakes occur frequently and are difficult to troubleshoot, because the order in which the CPU runs instructions from different thread contexts is unpredictable. Most of them are atomicity violations, order violations, and deadlocks. Studies show that well-known software such as MySQL, Apache, Mozilla, and OpenOffice also contains such mistakes.
1. Atomicity violation
In sequential programming we seldom care about atomicity. In parallel programming, however, atomicity is the first thing we must keep in mind. For example:
[Thread 1]
if (_ptr) // A
*_ptr = 0; // B
[Thread 2]
_ptr = NULL; // C
In the code above, thread 1 and thread 2 each execute a single statement,
so it seems that the two statements must run one after the other and can
never interleave. In fact, the statement in thread 1 is not atomic: it can
be divided into at least two steps, A and B. If the steps are executed in
the order A-B-C, everything is fine. However, they may also be scheduled
as A-C-B, in which case B dereferences a null pointer and causes an
unexpected memory access error.
We assumed that the statement region in thread 1 is atomic, but that is
not true. This false assumption is the root of atomicity violations. In
many cases the problem is introduced by code modification. In the example
above, the statement in thread 1 may at first have been a simple assignment:
_ptr = &_val;
Later, the code was modified, and the implicit atomicity was broken.
On systems with multiple cores, the problem becomes more complicated,
since each core may cache a block of memory separately. For example,
suppose core 1 runs thread 1 and core 2 runs thread 2:
[Thread 1]
_ptr = &_val;
[Thread 2]
_ptr = NULL;
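Are these assignments atomic? In fact, they are not guaranteed to be. Because
`_ptr` is a plain variable, the compiler may optimize it into a register
local to one core, or its value may be cached by a different core. In C and
C++, unsynchronized concurrent writes to a plain variable are a data race,
whose behavior is undefined. Thus, we cannot determine the value of `_ptr`.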
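A minimal sketch of the explicit-atomic approach for the two assignments above, assuming C++11 `std::atomic`. The namespace `demo`, the initial value of `_val`, and the helper `run()` are hypothetical additions for illustration. Declaring `_ptr` as `std::atomic<int*>` makes each store indivisible and removes the data race, but which store lands last is still decided by the scheduler.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

namespace demo {
int _val = 0;                     // hypothetical target object
std::atomic<int*> _ptr{nullptr};  // atomic pointer instead of a plain int*

// Thread 1: atomic store, replaces `_ptr = &_val;`
void thread1() { _ptr.store(&_val); }

// Thread 2: atomic store, replaces `_ptr = NULL;`
void thread2() { _ptr.store(nullptr); }

// Runs both threads once and returns the value left in _ptr.
int* run() {
    _ptr.store(nullptr);  // reset before any thread starts
    std::thread t1(thread1);
    std::thread t2(thread2);
    t1.join();
    t2.join();
    int* result = _ptr.load();
    // Which store wins is still up to the scheduler, but the result is
    // always exactly one of the two stored values, never a torn value.
    assert(result == &_val || result == nullptr);
    return result;
}
}  // namespace demo
```

Calling `demo::run()` repeatedly may return either pointer: atomic operations make each access well defined, while the order of the two stores still has to be enforced separately if it matters.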
To avoid atomicity violations, we must make the code region atomic, either
with locks or with atomic operations. Using explicit atomic operations on
shared variables is a good habit: whoever modifies the code later is
reminded by the statement itself that atomicity is required.
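One way to make the A/B region of the first example atomic is to hold a lock around both steps. The following is a sketch assuming C++11 `std::mutex` and `std::lock_guard`; `ptr_mutex`, the namespace `demo`, and `run()` are hypothetical names, and the initial value 42 is arbitrary.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

namespace demo {
int _val = 42;         // object that _ptr points to at the start
int* _ptr = &_val;     // plain pointer, protected by ptr_mutex
std::mutex ptr_mutex;  // hypothetical lock guarding _ptr and *_ptr

// Thread 1: the check (A) and the write (B) form one critical section,
// so the A-C-B interleaving described above can no longer happen.
void thread1() {
    std::lock_guard<std::mutex> guard(ptr_mutex);
    if (_ptr)       // A
        *_ptr = 0;  // B
}

// Thread 2: C now runs either entirely before A or entirely after B.
void thread2() {
    std::lock_guard<std::mutex> guard(ptr_mutex);
    _ptr = nullptr;  // C
}

// Resets the shared state, runs both threads once, and returns _val.
int run() {
    _val = 42;
    _ptr = &_val;
    std::thread t1(thread1);
    std::thread t2(thread2);
    t1.join();
    t2.join();
    assert(_ptr == nullptr);  // C has run in every interleaving
    return _val;              // 0 for A-B-C, 42 for C-A (B skipped)
}
}  // namespace demo
```

The lock does not fix the order of the two threads; it only guarantees that thread 2 can never run between the check and the write.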
2. Order violation
Consider the example below:
[Thread 1]
_ptr = allocate_memory(); // A
[Thread 2]
_ptr[1] = "right"; // B
If the code is not synchronized, both execution orders, A-B and B-A, are
possible; in the B-A case, B writes through `_ptr` before it points to
allocated memory. In such cases, we must synchronize the code blocks to
enforce the intended order of execution.
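A sketch of enforcing "A before B" with a condition variable, assuming C++11 threads. The text does not say what `allocate_memory()` returns, so here it is a hypothetical helper returning a `std::vector<std::string>` with two elements; `demo`, `m`, `allocated`, and `run()` are likewise illustrative names.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

namespace demo {
std::vector<std::string>* _ptr = nullptr;  // shared buffer, as in the text
std::mutex m;                               // protects _ptr
std::condition_variable allocated;          // signalled once A is done

// Hypothetical stand-in for allocate_memory(): two empty strings.
std::vector<std::string>* allocate_memory() {
    return new std::vector<std::string>(2);
}

void thread1() {
    std::vector<std::string>* p = allocate_memory();
    {
        std::lock_guard<std::mutex> lock(m);
        _ptr = p;  // A: publish the buffer under the lock
    }
    allocated.notify_one();
}

void thread2() {
    std::unique_lock<std::mutex> lock(m);
    // Block until A has published the buffer; this enforces A before B.
    allocated.wait(lock, [] { return _ptr != nullptr; });
    (*_ptr)[1] = "right";  // B
}

// Starts thread 2 first on purpose, then returns the slot B wrote.
std::string run() {
    std::thread t2(thread2);
    std::thread t1(thread1);
    t2.join();
    t1.join();
    assert(_ptr != nullptr);
    std::string result = (*_ptr)[1];
    delete _ptr;  // release the buffer allocated by thread 1
    _ptr = nullptr;
    return result;
}
}  // namespace demo
```

Thread 2 is started first on purpose; it simply waits until `_ptr` is published, so B can only run after A. A `std::promise`/`std::future` pair would be another common way to express the same one-shot handoff.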
3. Deadlock
Locking is fundamental in concurrent programming. If more than one
thread works with more than one shared resource, such as a memory block,
it is possible that each thread holds one resource while waiting for a
resource held by another.
[Thread 1]
lock_a.lock();
a = 0; // A
lock_b.lock();
b = 0; // B
lock_b.unlock();
lock_a.unlock();
[Thread 2]
lock_b.lock();
b = 1; // C
lock_a.lock();
a = 1; // D
lock_a.unlock();
lock_b.unlock();
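If the code runs as A-B-C-D, there is no problem. However, if it is
scheduled as A-C-B-D, thread 1 holds lock_a and thread 2 holds lock_b after
A and C; thread 1 then blocks on lock_b before B, and thread 2 blocks on
lock_a before D, so neither can proceed. This is a deadlock. A deadlock can
occur only if all four of the following conditions hold: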
a. mutual exclusion
b. hold and wait
c. no preemption
d. circular waiting
Breaking at least one of the four conditions above prevents deadlock.
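A sketch that breaks condition d (circular waiting) for the example above by making both threads acquire `lock_a` before `lock_b`, assuming C++11 `std::mutex`. `std::lock_guard` replaces the manual lock()/unlock() calls; the namespace `demo`, the initial values of `a` and `b`, and `run()` are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

namespace demo {
int a = -1;
int b = -1;
std::mutex lock_a;  // guards a
std::mutex lock_b;  // guards b

// Both threads take lock_a before lock_b. With a single global lock order,
// no thread can hold lock_b while waiting for lock_a, so circular waiting
// (condition d) can never arise.
void thread1() {
    std::lock_guard<std::mutex> ga(lock_a);
    a = 0;  // A
    std::lock_guard<std::mutex> gb(lock_b);
    b = 0;  // B
}  // guards release lock_b first, then lock_a

void thread2() {
    std::lock_guard<std::mutex> ga(lock_a);  // taken second in the original
    std::lock_guard<std::mutex> gb(lock_b);
    b = 1;  // C
    a = 1;  // D
}

// Runs both threads once and returns the value written by the last thread.
int run() {
    std::thread t1(thread1);
    std::thread t2(thread2);
    t1.join();
    t2.join();
    assert(a == b);  // the two critical sections ran one after the other
    return a;        // 0 if thread 1 ran last, 1 if thread 2 did
}
}  // namespace demo
```

Alternatively, C++17's `std::scoped_lock guard(lock_a, lock_b);` acquires both mutexes using a deadlock-avoidance algorithm, so the threads would not need to agree on an order.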