General mistakes in parallel computing
This is an old article written in 2013, originally hosted on gegahost.net: http://raison.gegahost.net/?p=97
March 11, 2013
(Original Work by Peixu Zhu)
In a parallel computing environment, some general mistakes occur frequently and are difficult to troubleshoot, because instructions from different threads can interleave unpredictably. Most of them are atomicity violations, order violations, and deadlocks. Studies show that well-known software such as MySQL, Apache, Mozilla, and OpenOffice also contains such mistakes.
1. Atomicity violation
In sequential programming we seldom care about atomicity. In parallel programming, however, we must keep atomic operations in mind from the start. For example:
[Thread 1]
if (_ptr) // A
*_ptr = 0; // B
[Thread 2]
_ptr = NULL; // C
In the code above, each thread executes a single statement, so it may
seem that either thread 1's statement or thread 2's statement runs, and
that the two cannot interleave. In fact, the statement in thread 1 is
not atomic: it consists of at least two steps, A and B. If the steps
execute in the order A-B-C, all is well. However, they may also be
scheduled as A-C-B, in which case B dereferences a null pointer and
causes an unexpected memory access error.
We assume that the statement in thread 1 is atomic, but that is not
true. This false assumption is the root of the atomicity violation. In
many cases the problem is introduced by code modification. In the
example above, the statement in thread 1 may originally have been a
simple assignment:
_ptr = &_val;
Later the code was modified, and the implicit atomicity was broken.
On systems with multiple cores the problem is more complicated, since
each core may cache a block of memory separately. For example, core 1
runs thread 1, and core 2 runs thread 2:
[Thread 1]
_ptr = &_val;
[Thread 2]
_ptr = NULL;
Are they atomic? Not necessarily. The compiler may keep `_ptr` in a
register local to one core, or each core may hold its own cached copy.
Thus, without synchronization, we cannot determine which value of
`_ptr` a given thread sees.
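As a minimal sketch of what a well-defined concurrent assignment can look like in C++11, the pointer below is declared as `std::atomic<int*>`, so each store and load is indivisible and cannot be silently kept in a register. The names `g_ptr`, `g_val`, `publish`, `retract`, and `run_once` are hypothetical and exist only for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-ins for _val and _ptr from the text.
static int g_val = 7;
static std::atomic<int*> g_ptr{nullptr};

// Thread 1: atomic version of `_ptr = &_val;`
void publish() { g_ptr.store(&g_val); }

// Thread 2: atomic version of `_ptr = NULL;`
void retract() { g_ptr.store(nullptr); }

// Runs both threads once. The final value depends on scheduling,
// but it is always exactly one of the two stored values.
bool run_once() {
    std::thread t1(publish);
    std::thread t2(retract);
    t1.join();
    t2.join();
    int* seen = g_ptr.load();
    assert(seen == nullptr || seen == &g_val);
    return seen == nullptr || seen == &g_val;
}
```

The outcome is still order-dependent, which is expected: atomics remove the undefined behavior of the data race, not the race between the two writers.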
To avoid atomicity violations, we must make the code region atomic, by
locking or by atomic operations. Using explicit atomic operations on a
shared variable is a good habit, because the statement itself reminds
us that atomicity is required when we later modify the code.
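One way to make the check-then-write region from the first example atomic is to guard every access to the pointer with the same mutex. The following is a sketch under that assumption, not the only possible fix. `g_ptr`, `g_val`, `g_mutex`, and the function names are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for _val and _ptr from the text.
static int g_val = 42;
static int* g_ptr = &g_val;
static std::mutex g_mutex;  // guards every access to g_ptr

// Thread 1: the check (A) and the write (B) run under one lock,
// so step C cannot be scheduled between them.
void clear_target() {
    std::lock_guard<std::mutex> guard(g_mutex);
    if (g_ptr)       // A
        *g_ptr = 0;  // B
}

// Thread 2: step C takes the same lock.
void reset_pointer() {
    std::lock_guard<std::mutex> guard(g_mutex);
    g_ptr = nullptr;  // C
}

// Runs both threads once. Possible orders are A-B-C or C-A,
// never A-C-B, so no null pointer is dereferenced.
bool run_once() {
    g_val = 42;
    g_ptr = &g_val;
    std::thread t1(clear_target);
    std::thread t2(reset_pointer);
    t1.join();
    t2.join();
    assert(g_ptr == nullptr);
    return g_val == 0 || g_val == 42;
}
```

The lock only helps if every thread that touches the pointer uses it. A single unguarded access elsewhere brings the A-C-B interleaving back.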
2. Order violation
Consider the example below:
[Thread 1]
_ptr = allocate_memory(); // A
[Thread 2]
_ptr[1] = "right"; // B
If the code is not synchronized, both execution orders, A-B and B-A,
are possible, and B-A writes through a pointer that has not been set
yet. In such cases we must synchronize the code blocks to enforce the
intended order of execution.
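A condition variable is one common way to enforce that A happens before B. The sketch below assumes a `std::vector<std::string>` as the allocated block. `producer`, `consumer`, `run_once`, and the `g_` names are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical shared state standing in for _ptr from the text.
static std::vector<std::string>* g_ptr = nullptr;
static std::mutex g_mutex;
static std::condition_variable g_ready;

// Thread 1: allocate, then publish the pointer under the mutex (A).
void producer() {
    auto* block = new std::vector<std::string>(2);  // stands in for allocate_memory()
    {
        std::lock_guard<std::mutex> guard(g_mutex);
        g_ptr = block;  // A
    }
    g_ready.notify_one();
}

// Thread 2: block until A has happened, then write (B).
void consumer() {
    std::unique_lock<std::mutex> lock(g_mutex);
    g_ready.wait(lock, [] { return g_ptr != nullptr; });
    assert(g_ptr != nullptr);  // guaranteed by the wait predicate
    (*g_ptr)[1] = "right";     // B
}

// Starts the consumer first on purpose: B still runs after A.
std::string run_once() {
    std::thread t2(consumer);
    std::thread t1(producer);
    t1.join();
    t2.join();
    std::string result = (*g_ptr)[1];
    delete g_ptr;
    g_ptr = nullptr;
    return result;
}
```

The predicate passed to `wait` matters: it handles both spurious wakeups and the case where the producer finishes before the consumer starts waiting.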
3. Deadlock
Locking is fundamental in concurrent programming. If more than one
thread works with more than one shared resource, such as a memory
block, it is possible that each thread holds one resource while waiting
for a resource held by another thread.
[Thread 1]
lock_a.lock();
a = 0; // A
lock_b.lock();
b = 0; // B
lock_b.unlock();
lock_a.unlock();
[Thread 2]
lock_b.lock();
b = 1; // C
lock_a.lock();
a = 1; // D
lock_a.unlock();
lock_b.unlock();
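If the code runs as A-B-C-D, there is no problem. However, if it starts
as A-C, thread 1 holds lock_a and waits for lock_b while thread 2 holds
lock_b and waits for lock_a, so neither B nor D can ever run: this is a
deadlock. A deadlock requires four conditions:
a. mutual exclusion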
b. hold and wait
c. no preemption
d. circular waiting
Breaking at least one of the four conditions above prevents the deadlock.
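As a sketch of how one of these conditions can be removed in C++11, both threads below acquire the two mutexes through `std::lock`, which uses a deadlock-avoidance algorithm, so neither thread is left holding one lock while waiting indefinitely for the other. Acquiring the locks in the same fixed order in every thread would be an alternative that breaks circular waiting. The function names and `run_many` are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

static std::mutex lock_a, lock_b;
static int a = -1, b = -1;

// Thread 1: take both locks together, then update a and b.
void thread1_body() {
    std::lock(lock_a, lock_b);  // acquires both or keeps retrying; no deadlock
    std::lock_guard<std::mutex> ga(lock_a, std::adopt_lock);
    std::lock_guard<std::mutex> gb(lock_b, std::adopt_lock);
    a = 0;
    b = 0;
}

// Thread 2: the argument order differs, which is safe with std::lock.
void thread2_body() {
    std::lock(lock_b, lock_a);
    std::lock_guard<std::mutex> gb(lock_b, std::adopt_lock);
    std::lock_guard<std::mutex> ga(lock_a, std::adopt_lock);
    b = 1;
    a = 1;
}

// Repeats the race many times. Every round must finish, and a and b
// must match, because each thread updates both under both locks.
bool run_many(int rounds) {
    for (int i = 0; i < rounds; ++i) {
        std::thread t1(thread1_body);
        std::thread t2(thread2_body);
        t1.join();
        t2.join();
        if (a != b) return false;
    }
    return true;
}
```

With C++17 available, `std::scoped_lock` expresses the same idea in a single line per thread.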