General mistakes in parallel computing
这是2013年写的一篇旧文,放在gegahost.net上面 http://raison.gegahost.net/?p=97
March 11, 2013
General mistakes in parallel computing
(Original Work by Peixu Zhu)
In parallel computing environment, some general mistakes are frequent and difficult to shoot, caused by random CPU sequence in different thread contexts. Most of them are atomic violation, order violation, and dead lock. Studies show that some famous software also have such mistakes, like MySQL, Apache, Mozilla, and OpenOffice.
1. Atomic violation
In sequent programming, we seldom care the atomic operation, however, in parallel programming, we must remember atomic operations at first. for example:
[Thread 1]
if (_ptr) // A
*_ptr = 0; // B
[Thread 2]
_ptr = NULL; // C
For above code, there’s one statement to be executed in thread 1 and
thread 2 respectively, it seems that it should be running the statement
in thread 1 or thread 2, they should not be interlaced. But, in fact,
statement in thread 1 is not atomic, at least, it can
be divided into step A and B, thus, if it is arranged to execute in
order of A-B-C, it is okay, however, it is also possible be scheduled to
run as A-C-B, this will bring an unexpected memory access error.
We assume that the statement region in thread 1 is atomic, but it is
not true. This is the root of the atomic violation. In many cases, the
problem is caused by code modification, for above example, the statement
in thread 1 may be a simple assignment statement at first:
_ptr = &_val;
And later, the code is modified, and the implicit atomicity is broken.
For systems with multiple cores, the problem will be more
complicated, since each core may cache a block of memory respectively.
For example, core 1 runs thread 1, and core 2 runs thread 2:
[Thread 1]
_ptr = &_val;
[Thread 2]
_ptr = NULL;
Are they atomic ? No, they are not in fact. the `_ptr` may be
optimized to be register value in one core locally, or it is cached in
different core. Thus, the we can not determine the value of `_ptr`.
To avoid atomic violation, we must make the code region atomic, by
locking or atomic operations. Explicit atomic operations on a shared
variable is a good habit, since we are noticed by the statement that it
is atomicity demanded when we try to modify the code.
2. Order violation
Considering below example:
[Thread 1]
_ptr = allocate_memory(); // A
[Thread 2]
_ptr[1] = "right"; // B
If the code is not synchronized, execution order of A-B or B-A are
all possible. In such cases, we must synchronize the code block to
ensure the order of execution.
3. Dead lock
Locking is elemental in concurrent programming. If there’s more than
one threads working with more than with one shared resource, such as
memory block, it is possible that each thread owning a resource is
waiting for each others resource.
[Thread 1]
lock_a.lock();
a = 0; // A
lock_b.lock();
b = 0; // B
lock_b.unlock();
lock_a.unlock();
[Thread 2]
lock_b.lock();
b = 1; // C
lock_a.lock();
a = 1; // D
lock_a.unlock();
lock_b.unlock();
if the code is running as A-B-C-D, there’s no problem, however, if it
is running as A-C-B-D, there’s dead lock. Dead locking requires four
conditions:
a. mutex exclusion
b. hold and wait
c. no preemption
d. circular waiting
Breaking at least one of above four condition will break the dead locking.
General mistakes in parallel computing的更多相关文章
- Introduction to Parallel Computing
Copied From:https://computing.llnl.gov/tutorials/parallel_comp/ Author: Blaise Barney, Lawrence Live ...
- Method and apparatus for an atomic operation in a parallel computing environment
A method and apparatus for a atomic operation is described. A method comprises receiving a first pro ...
- PatentTips - Safe general purpose virtual machine computing system
BACKGROUND OF THE INVENTION The present invention relates to virtual machine implementations, and in ...
- STROME --realtime & online parallel computing
Data Collections ---> Stream to Channel (as source input) ----> Parallel Computing---> Resu ...
- Parallel Computing–Cannon算法 (MPI 实现)
原理不解释,直接上代码 代码中被注释的源程序可用于打印中间结果,检查运算是否正确. #include "mpi.h" #include <math.h> #includ ...
- Distributed and Parallel Computing
Omega Network Model
- How-to go parallel in R – basics + tips(转)
Today is a good day to start parallelizing your code. I’ve been using the parallel package since its ...
- Parallel Gradient Boosting Decision Trees
本文转载自:链接 Highlights Three different methods for parallel gradient boosting decision trees. My algori ...
- Massively parallel supercomputer
A novel massively parallel supercomputer of hundreds of teraOPS-scale includes node architectures ba ...
随机推荐
- 【Selenium】显示、隐式等待
显示等待 WebDriverWait 超时抛出TimeOutException,默认500毫秒 public class WaitToReturnElement { /* * 设置超时时间为5秒,返回 ...
- codeforces 459E E. Pashmak and Graph(dp+sort)
题目链接: E. Pashmak and Graph time limit per test 1 second memory limit per test 256 megabytes input st ...
- 添加.pch文件
完成框架, .pch文件 1.创建.pch文件 2.配置.pch文件 双击,改成:项目名/pch文件名
- 「UVA524」 Prime Ring Problem 质数环
Description 输入正整数n,把整数1,2,-,n组成一个环,使得相邻两个整数之和均为素数.输出时,从整数1开始逆时针排列.同一个环恰好输出一次.n<=16. A ring is com ...
- C3P0Tool
c3p0-config.xml <c3p0-config> <named-config name="c3p0"> <property name=&qu ...
- [angularJS]ng-hide|ng-show切换
<div class="row ng-scope"> <div class="col-lg-12"> <h1 class=&quo ...
- 虚拟机bridged, NAT and host-only网络区别
In Linux, a network of each type is created when running vmware-config.pl. In Windows, they are auto ...
- RobotFramework:App滑动屏幕
转自:http://blog.csdn.net/jgw2008/article/details/77993399 在使用Robot Framework测试Android机器过程中, 经常要用到滚屏操作 ...
- javascript switch..... case
switch(条件表达式) { case 常量: { 语句a; } break; case 常量: { 语句b; } break; case 常量: { 语句c; } break; ... case ...
- Centos添加jdk环境变量
假设将jdk解压到/opt/jdk1.8.0_131. echo "export JAVA_HOME=/opt/jdk1.8.0_131" >> /etc/profil ...