General mistakes in parallel computing
This is an old article written in 2013, originally hosted on gegahost.net: http://raison.gegahost.net/?p=97
March 11, 2013
(Original Work by Peixu Zhu)
In a parallel computing environment, some general mistakes are frequent and difficult to troubleshoot, because the CPU may interleave the instructions of different threads in an unpredictable order. Most of them are atomicity violations, order violations, and deadlocks. Studies show that well-known software such as MySQL, Apache, Mozilla, and OpenOffice has contained such mistakes.
1. Atomic violation
In sequential programming, we seldom care about atomicity. In parallel programming, however, we must keep atomic operations in mind from the start. For example:
[Thread 1]
if (_ptr)        // A
    *_ptr = 0;   // B
[Thread 2]
_ptr = NULL; // C
In the code above, thread 1 and thread 2 each execute a single statement,
so it seems that one statement must run entirely before the other and that
they cannot be interleaved. In fact, the statement in thread 1 is not
atomic: it can be divided into at least two steps, A and B. If the steps
run in the order A-B-C, everything is fine. However, they may also be
scheduled as A-C-B, in which case step B dereferences a null pointer and
causes an unexpected memory access error.
We assumed that the code region in thread 1 is atomic, but it is not.
This is the root of the atomicity violation. In many cases, the problem is
introduced by a later code modification. In the example above, the
statement in thread 1 may originally have been a simple assignment:
_ptr = &_val;
Later, the code was modified, and the implicit atomicity was broken.
On systems with multiple cores, the problem is more complicated, since
each core may cache a block of memory separately. For example, core 1
runs thread 1, and core 2 runs thread 2:
[Thread 1]
_ptr = &_val;
[Thread 2]
_ptr = NULL;
Are they atomic? No, in fact they are not. The compiler may optimize
`_ptr` into a register that is local to one core, or each core may hold
its own cached copy. Thus, without synchronization, we cannot determine
which value of `_ptr` a thread will observe.
To avoid atomicity violations, we must make the code region atomic, either
by locking or by atomic operations. Using explicit atomic operations on a
shared variable is a good habit, because the statement itself reminds us
that atomicity is required whenever we try to modify the code.
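As a minimal sketch of both remedies in C++11, the code below guards the check-and-write region with a mutex and, separately, declares a lone shared pointer as `std::atomic`. The names `g_val`, `g_ptr`, `g_ptr_mutex`, `g_atomic_ptr`, and all function names are hypothetical and are not part of the original example.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical shared state; g_ptr is guarded by g_ptr_mutex.
int g_val = 42;
int* g_ptr = &g_val;
std::mutex g_ptr_mutex;

// Thread 1: steps A (check) and B (write) run under one lock,
// so step C can no longer slip in between them.
void clear_target_if_set() {
    std::lock_guard<std::mutex> guard(g_ptr_mutex);
    if (g_ptr)       // A
        *g_ptr = 0;  // B
}

// Thread 2: step C takes the same lock before resetting the pointer.
void reset_ptr() {
    std::lock_guard<std::mutex> guard(g_ptr_mutex);
    g_ptr = nullptr;  // C
}

// Runs both threads once and returns true if the pointer ended up null.
// The possible orders are A-B-C or C-A, never A-C-B.
bool run_locked_demo() {
    g_val = 42;
    g_ptr = &g_val;
    std::thread t1(clear_target_if_set);
    std::thread t2(reset_ptr);
    t1.join();
    t2.join();
    assert(g_val == 0 || g_val == 42);
    return g_ptr == nullptr;
}

// For a single shared variable, std::atomic makes each load and store
// indivisible and marks the variable as shared in the source code.
std::atomic<int*> g_atomic_ptr{nullptr};

// Two racing stores: afterwards the pointer holds one of the two values.
bool run_atomic_demo() {
    std::thread t3([] { g_atomic_ptr.store(&g_val); });
    std::thread t4([] { g_atomic_ptr.store(nullptr); });
    t3.join();
    t4.join();
    int* p = g_atomic_ptr.load();
    return p == nullptr || p == &g_val;
}
```

Calling `run_locked_demo()` and `run_atomic_demo()` from `main` exercises both variants. Note that an atomic pointer alone would not repair the check-then-write example: steps A and B would still be two separate operations, so that region needs the lock.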
2. Order violation
Consider the example below:
[Thread 1]
_ptr = allocate_memory(); // A
[Thread 2]
_ptr[1] = "right"; // B
If the code is not synchronized, both execution orders A-B and B-A are
possible. With B-A, thread 2 writes through `_ptr` before the memory has
been allocated. In such cases, we must synchronize the code to enforce
the order of execution.
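One possible way to enforce the A-before-B order in C++11 is a condition variable paired with a flag, sketched below. This is an illustration under assumptions, not the original author's code: `g_slots` (a `std::vector` standing in for the memory behind `_ptr`), `g_ready`, `g_ready_mutex`, `g_ready_cv`, and the function names are all hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical shared state: g_slots stands in for the memory behind _ptr.
std::vector<std::string> g_slots;
bool g_ready = false;              // true once step A has finished
std::mutex g_ready_mutex;          // guards g_slots and g_ready
std::condition_variable g_ready_cv;

// Thread 1: perform step A, then announce that the memory exists.
void allocate_slots() {
    {
        std::lock_guard<std::mutex> guard(g_ready_mutex);
        g_slots.resize(2);  // A
        g_ready = true;
    }
    g_ready_cv.notify_all();
}

// Thread 2: block until step A is done, then perform step B.
void write_slot() {
    std::unique_lock<std::mutex> lock(g_ready_mutex);
    // The predicate form of wait() also handles spurious wakeups.
    g_ready_cv.wait(lock, [] { return g_ready; });
    assert(g_slots.size() > 1);
    g_slots[1] = "right";  // B
}

// Starts the writer first on purpose; the wait still forces order A-B.
std::string run_ordered_demo() {
    std::thread t2(write_slot);
    std::thread t1(allocate_slots);
    t1.join();
    t2.join();
    return g_slots[1];
}
```

The flag is checked under the same mutex that protects it, so thread 2 cannot miss the notification even if thread 1 finishes before thread 2 starts waiting.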
3. Deadlock
Locking is fundamental in concurrent programming. If more than one
thread works with more than one shared resource, such as memory blocks,
it is possible for each thread to hold one resource while waiting for a
resource held by another thread.
[Thread 1]
lock_a.lock();
a = 0; // A
lock_b.lock();
b = 0; // B
lock_b.unlock();
lock_a.unlock();
[Thread 2]
lock_b.lock();
b = 1; // C
lock_a.lock();
a = 1; // D
lock_a.unlock();
lock_b.unlock();
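If the code runs as A-B-C-D, there is no problem. However, if A and C
both run before either thread takes its second lock, thread 1 waits for
lock_b (held by thread 2) while thread 2 waits for lock_a (held by
thread 1). Neither B nor D is ever reached: this is a deadlock. Deadlock
requires four conditions:
a. mutual exclusion
b. hold and wait
c. no preemption
d. circular wait
Breaking at least one of the four conditions above prevents deadlock.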
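As a hedged sketch of one remedy in C++11, both locks can be acquired together with `std::lock`, which the standard library specifies as using a deadlock-avoidance algorithm, so neither thread is left holding one lock while blocked on the other. Unlike the original example, each thread here holds both locks for its whole critical section. The function names `thread_1_body`, `thread_2_body`, and `run_deadlock_free_demo` are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

std::mutex lock_a;  // guards a
std::mutex lock_b;  // guards b
int a = -1;
int b = -1;

// Thread 1: take both locks in one call instead of one after the other.
void thread_1_body() {
    std::lock(lock_a, lock_b);
    // adopt_lock: the guards take over the already-held locks and
    // release them automatically when the function returns.
    std::lock_guard<std::mutex> guard_a(lock_a, std::adopt_lock);
    std::lock_guard<std::mutex> guard_b(lock_b, std::adopt_lock);
    a = 0;  // A
    b = 0;  // B
}

// Thread 2: the argument order differs, which std::lock tolerates.
void thread_2_body() {
    std::lock(lock_b, lock_a);
    std::lock_guard<std::mutex> guard_b(lock_b, std::adopt_lock);
    std::lock_guard<std::mutex> guard_a(lock_a, std::adopt_lock);
    b = 1;  // C
    a = 1;  // D
}

// Repeats the race many times; returns true if every round finished
// with a and b written by the same thread last.
bool run_deadlock_free_demo() {
    for (int i = 0; i < 1000; ++i) {
        std::thread t1(thread_1_body);
        std::thread t2(thread_2_body);
        t1.join();
        t2.join();
        if (a != b) return false;
    }
    return true;
}
```

A simpler alternative that needs no library support is to make every thread acquire the locks in the same fixed order (for example, always lock_a before lock_b), which removes the circular wait.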