OpenMP for Fortran


  • OpenMP Directive
  • Syntax of OpenMP compiler directive for Fortran:
     !$OMP  DirectiveName Optional_CLAUSES...
    ...
    ... Program statements between the !$OMP lines
    ... are executed in parallel by all threads
    ...
    !$OMP END DirectiveName
  • Program statements between the two !$OMP directive lines are executed by multiple threads


  • Setting the level of parallelism in OpenMP programs
  • The number of threads that will be created to execute parallel sections in an OpenMP program is controlled by the environment variable OMP_NUM_THREADS
  • To set this environment variable use:
      export OMP_NUM_THREADS=...            
    
    Example:
    
      export OMP_NUM_THREADS=8
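
    To check from inside a program that the setting was picked up, a minimal sketch along the following lines may help. It assumes the compiler provides the standard omp_lib module; the program name ShowThreads and the variable max_threads are made up for this illustration.

```fortran
PROGRAM ShowThreads
USE omp_lib                  ! standard OpenMP module (assumed to be available)
IMPLICIT NONE
INTEGER :: max_threads       ! hypothetical name, for illustration only

! Upper bound on the team size of the next parallel section.
! When OMP_NUM_THREADS is set, this normally reflects its value.
max_threads = omp_get_max_threads()

print *, "Parallel sections will use up to ", max_threads, " threads"

END PROGRAM ShowThreads
```

    With export OMP_NUM_THREADS=8 in effect, the reported value would normally be 8.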
    


  • Compiling OpenMP programs
    • Fortran

      • Compile:

          f90 -O -c -xopenmp -stackvar Prog.f90
        
      • Link:
          f90 -O -o Executable \
              -xopenmp -stackvar \
              Prog1.o Prog2.o ....


  • Introductory Example
    • Parallel "Hello World" OpenMP program:

         PROGRAM  Main
      
         !$OMP PARALLEL
      
         print *, "Hello World !"                 
      
         !$OMP END PARALLEL
      
         END
      

    • Example Program: (Demo above code)
    • Compile with:
          f90 -O -xopenmp -stackvar openMP01.f90
    • Run with:
      • export OMP_NUM_THREADS=8
      • a.out

      Make sure you do it on compute.

      You will see "Hello World !" printed EIGHT times, once by each thread. (Remove the two !$OMP lines and you get ONE line.)



  • Defining shared and private (non-shared) variables in parallel section
  • Recall:
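    • Fortran traditionally has no nested scopes inside a program unit, so a variable cannot be made private simply by declaring it inside the parallel section

    Instead, Fortran uses clauses on the directive to define private (non-shared) and shared variables.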


  • Defining shared and private variables in a PARALLEL section
    • A variable is by default shared among all threads
    • A private variable in a PARALLEL section must be specified using the clause PRIVATE

  • Fortran example of SHARED variable:
       PROGRAM  Main
       IMPLICIT NONE

       integer :: N                 ! Shared

       N = 1001
       print *, "Before parallel section: N = ", N

       !$OMP PARALLEL
       N = N + 1
       print *, "Inside parallel section: N = ", N
       !$OMP END PARALLEL

       print *, "After parallel section: N = ", N
       END

  • Example Program: (Demo above code, shared variable in OpenMP)
  • Compile with:
        f90 -O -xopenmp -stackvar openMP02a.f90
  • Run a few times with:
    • export OMP_NUM_THREADS=8
    • a.out

    You should see that the value of N at the end is not always 1009; it can be less. The threads update the shared variable N without any synchronization (a race condition), so some increments are lost.



  • Fortran example of NON-SHARED (private) variable:
       PROGRAM  Main
       IMPLICIT NONE

       integer :: N                 ! Program variable; each thread gets a private copy below

       N = 1001
       print *, "Before parallel section: N = ", N

       !$OMP PARALLEL PRIVATE(N)
       N = N + 1
       print *, "Inside parallel section: N = ", N
       !$OMP END PARALLEL

       print *, "After parallel section: N = ", N
       END

  • Example Program: (Demo above code, private variable in OpenMP)
  • Compile with:
        f90 -O -xopenmp -stackvar openMP02b.f90
  • Run a few times with:
    • export OMP_NUM_THREADS=8
    • a.out

  • Output:
        Before parallel section: N = 1001
        Inside parallel section: N = 1
        Inside parallel section: N = 1
        Inside parallel section: N = 1
        Inside parallel section: N = 1
        Inside parallel section: N = 1
        Inside parallel section: N = 1
        Inside parallel section: N = 1
        Inside parallel section: N = 1
        After parallel section: N = 1001

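    Each thread has its own variable N

    This variable N is different from the "program" variable N defined in the main program, which still holds 1001 after the parallel section.

    Note that a PRIVATE copy is not initialized when the parallel section starts: the value 1 in the output only appears because the uninitialized copies happened to contain 0. Use the clause FIRSTPRIVATE(N) to start every private copy with the value of the program variable.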
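
    The following variation is a small sketch (not one of the course demo files) of a private variable that starts with the value of the program variable. It uses the standard FIRSTPRIVATE clause; the program name FirstPrivateDemo is made up for this illustration.

```fortran
PROGRAM FirstPrivateDemo
IMPLICIT NONE

integer :: N

N = 1001
print *, "Before parallel section: N = ", N

! FIRSTPRIVATE: each thread gets its own copy of N,
! initialized with the value N had before the parallel section
!$OMP PARALLEL FIRSTPRIVATE(N)
N = N + 1
print *, "Inside parallel section: N = ", N
!$OMP END PARALLEL

! The program variable itself is not modified by the threads
print *, "After parallel section: N = ", N

END PROGRAM FirstPrivateDemo
```

    Every thread should now print 1002 inside the parallel section, and the program variable still holds 1001 afterwards.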



  • OpenMP Support functions
  • Most useful support functions in OpenMP (Fortran interface):
    Function Name                            Effect
    ---------------------------------------  ---------------------------------------------------------
    SUBROUTINE omp_set_num_threads(nthread)  Set size of thread team (nthread is an INTEGER)
    INTEGER omp_get_num_threads()            Return size of the current thread team
    INTEGER omp_get_max_threads()            Return max size of thread team (typically equal to the
                                             number of processors, unless it was changed through
                                             OMP_NUM_THREADS or omp_set_num_threads)
    INTEGER omp_get_thread_num()             Return thread ID of the thread that calls this function
    INTEGER omp_get_num_procs()              Return number of processors
    LOGICAL omp_in_parallel()                Return TRUE if currently in a PARALLEL segment
  • Here is a simple OMP program in Fortran:
       PROGRAM  Main
       IMPLICIT NONE

       INTEGER :: nthreads, myid
       INTEGER, EXTERNAL :: OMP_GET_THREAD_NUM, OMP_GET_NUM_THREADS

       !$OMP PARALLEL private(nthreads, myid)
       myid = OMP_GET_THREAD_NUM()
       print *, "Hello I am thread ", myid

       if (myid == 0) then
          nthreads = OMP_GET_NUM_THREADS()
          print *, "Number of threads = ", nthreads
       end if
       !$OMP END PARALLEL

       END
  • Example Program: (OpenMP Fortran program)
  • Compile using the following command:
        f90 -O -xopenmp -stackvar hello.f90
  • Run with:
    • export OMP_NUM_THREADS=8
    • a.out

  • Output:
        Hello I am thread 7
        Hello I am thread 5
        Hello I am thread 1
        Hello I am thread 0
        Hello I am thread 2
        Number of threads = 8
        Hello I am thread 4
        Hello I am thread 3
        Hello I am thread 6


  • Caveat with Fortran
    • Recall:

      • Array indices in Fortran by default start with 1 (ONE)
    • Observed from "Hello" program:
      • Thread IDs start with 0 (ZERO)
    • Caveat:
      • Use ThreadID+1 as index to an array in Fortran !!!
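
    The index shift can be sketched as follows. This is only an illustration: the program name ThreadSlots, the array slot and the limit MAX_THREADS = 64 are assumptions, and the standard omp_lib module is assumed to be available.

```fortran
PROGRAM ThreadSlots
USE omp_lib                               ! standard OpenMP module (assumed available)
IMPLICIT NONE

INTEGER, PARAMETER :: MAX_THREADS = 64    ! assumed upper limit for this sketch
INTEGER :: slot(MAX_THREADS)              ! one array element per thread
INTEGER :: id, nthreads

slot = -1
nthreads = 0

!$OMP PARALLEL PRIVATE(id)
id = omp_get_thread_num()                 ! thread IDs are 0, 1, ..., team size - 1

! Shift the 0-based thread ID to a 1-based Fortran array index
IF (id + 1 <= MAX_THREADS) slot(id + 1) = id

! Exactly one thread records the team size in the shared variable
!$OMP SINGLE
nthreads = omp_get_num_threads()
!$OMP END SINGLE
!$OMP END PARALLEL

nthreads = MIN(nthreads, MAX_THREADS)
print *, "slot(1:nthreads) = ", slot(1:nthreads)

END PROGRAM ThreadSlots
```

    Element slot(k) ends up holding thread ID k-1, so thread 0 owns slot(1) and no thread ever touches the non-existent slot(0).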


  • Example OpenMP Program: Find minimum in an array
    • We will write this program using OpenMP in Fortran

    • Parallel Find Min program in Fortran:
        PROGRAM Min
        IMPLICIT NONE

        INTEGER, PARAMETER :: MAX = 10000000
        DOUBLE PRECISION, DIMENSION(MAX) :: x
        DOUBLE PRECISION, DIMENSION(10) :: my_min     !! Room for at most 10 threads
        DOUBLE PRECISION :: rmin
        INTEGER :: num_threads, nt
        INTEGER :: i, n
        INTEGER :: id, start, stop

        ! ===========================================================
        ! Declare the OpenMP functions
        ! ===========================================================
        INTEGER, EXTERNAL :: OMP_GET_THREAD_NUM, OMP_GET_NUM_THREADS

        CALL RANDOM_NUMBER(x)      !! Test data (the original listing does not show how x is filled)

        ! ===================================
        ! Parallel section: Find local minima
        ! ===================================
        !$OMP PARALLEL PRIVATE(i, id, start, stop, num_threads, n)
        num_threads = omp_get_num_threads()
        n = MAX/num_threads
        id = omp_get_thread_num()
        if ( id == 0 ) nt = num_threads    !! Shared copy, needed after the parallel section

        ! ----------------------------------
        ! Find my own starting index
        ! ----------------------------------
        start = id * n + 1         !! Array start at 1

        ! ----------------------------------
        ! Find my own stopping index
        ! ----------------------------------
        if ( id /= (num_threads-1) ) then
           stop = start + n - 1
        else
           stop = MAX
        end if

        ! ----------------------------------
        ! Find my own min
        ! ----------------------------------
        my_min(id+1) = x(start)
        DO i = start+1, stop
           IF ( x(i) < my_min(id+1) ) THEN
              my_min(id+1) = x(i)
           END IF
        END DO
        !$OMP END PARALLEL

        ! ===================================
        ! Find min over the local minima
        ! ===================================
        rmin = my_min(1)
        DO i = 2, nt
           IF ( my_min(i) < rmin ) THEN
              rmin = my_min(i)
           END IF
        END DO

        print *, "min = ", rmin
        END PROGRAM
    • Example Program: (Demo above code)
    • Compile with:
          f90 -O -xopenmp -stackvar min-mt1.f90
    • Run with:
      • export OMP_NUM_THREADS=8
      • a.out
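
    • As a point of comparison, the same result can be obtained with the standard REDUCTION clause, which lets OpenMP keep one private minimum per thread and combine them at the end. The sketch below is not one of the course demo files: the program name MinReduction, the smaller array size and the planted value -1.0 are assumptions made so that the result is easy to check. It uses the combined PARALLEL DO construct, which starts a parallel section and distributes the loop iterations in one directive.

```fortran
PROGRAM MinReduction
IMPLICIT NONE

INTEGER, PARAMETER :: NUM = 1000000       ! assumed size for this sketch
DOUBLE PRECISION, ALLOCATABLE :: x(:)
DOUBLE PRECISION :: rmin
INTEGER :: i

ALLOCATE(x(NUM))
CALL RANDOM_NUMBER(x)                     ! test data in the range [0, 1)
x(NUM/2) = -1.0d0                         ! plant a known minimum

rmin = x(1)

! Each thread works on a private copy of rmin; at the end of the loop
! OpenMP combines the private copies (and the initial value) using MIN
!$OMP PARALLEL DO REDUCTION(MIN:rmin)
DO i = 2, NUM
   rmin = MIN(rmin, x(i))
END DO
!$OMP END PARALLEL DO

print *, "min = ", rmin

END PROGRAM MinReduction
```

      No my_min array and no ThreadID+1 indexing are needed in this version, because the per-thread bookkeeping is done by the REDUCTION clause.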


  • Mutual exclusion synchronization primitives
  • Mutual exclusion (only one thread at a time may execute a given block of statements) is achieved in OpenMP for Fortran using the following directive:
       !$OMP CRITICAL

           ... statements are guaranteed to be executed
           ... by ONE thread at any one time

       !$OMP END CRITICAL
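
    As a small illustration, the sketch below protects the update of a shared counter with CRITICAL, so that no increment can be lost. The program name CriticalCounter and the variable nthreads are made up for this example, and the standard omp_lib module is assumed to be available.

```fortran
PROGRAM CriticalCounter
USE omp_lib                      ! standard OpenMP module (assumed available)
IMPLICIT NONE

INTEGER :: N, nthreads           ! both shared by default

N = 1001
nthreads = 1

!$OMP PARALLEL
! Only one thread at a time executes the statements below,
! so the read-modify-write of N cannot interleave
!$OMP CRITICAL
N = N + 1
nthreads = omp_get_num_threads()
!$OMP END CRITICAL
!$OMP END PARALLEL

print *, "N = ", N, "  expected: ", 1001 + nthreads

END PROGRAM CriticalCounter
```

    With 8 threads the final value of N is 1009 on every run.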


  • Example OpenMP program with synchronization: compute Pi
  • Example:
      PROGRAM Compute_PI
      IMPLICIT NONE

      INTEGER, EXTERNAL :: OMP_GET_THREAD_NUM, OMP_GET_NUM_THREADS

      INTEGER N, i
      INTEGER id, num_threads
      DOUBLE PRECISION w, x
      DOUBLE PRECISION pi, mypi

      N = 50000000                 !! Number of intervals
      w = 1.0d0/N                  !! width of each interval
      pi = 0.0d0

      !$OMP PARALLEL PRIVATE(i, id, num_threads, x, mypi)
      num_threads = omp_get_num_threads()
      id = omp_get_thread_num()

      mypi = 0.0d0
      DO i = id, N-1, num_threads
         x = w * (i + 0.5d0)
         mypi = mypi + w*f(x)
      END DO

      !$OMP CRITICAL
      pi = pi + mypi
      !$OMP END CRITICAL
      !$OMP END PARALLEL

      PRINT *, "Pi = ", pi

      CONTAINS

      !! The original listing does not show f. The integrand below is an
      !! assumption: its integral over [0,1] equals Pi.
      DOUBLE PRECISION FUNCTION f(a)
         DOUBLE PRECISION, INTENT(IN) :: a
         f = 4.0d0 / (1.0d0 + a*a)
      END FUNCTION f

      END PROGRAM
  • Example Program: (OpenMP compute Pi)
  • Compile with:
        f90 -O -xopenmp -stackvar openMP_compute_pi2.f90
  • Run a few times with:
    • export OMP_NUM_THREADS=8
    • a.out



  • Parallel For Loop in OpenMP

    The division of labor (splitting the iterations of a DO loop among the threads) can be done in OpenMP through a special Parallel LOOP construct.

  • A Parallel Loop construct MUST appear within a Parallel region of the program !
  • The syntax of a Parallel LOOP construct in Fortran is:
       !$OMP DO
       DO  index = ....
          ....                  ! Division of labor is taken care of
                                ! by the Fortran compiler
       END DO
       !$OMP END DO
  • The meaning of this Parallel LOOP construct is to distribute the iterations of the DO loop among the threads.

    Each iteration of the loop is executed exactly once, by exactly one of the threads.

    The loop variable used in the Parallel LOOP construct is by default PRIVATE (other variables are still by default SHARED)
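
    The distribution can be made visible with a small sketch that records which thread executed each iteration. The program name LoopShare and the arrays owner and visits are made up for this illustration, and the standard omp_lib module is assumed to be available.

```fortran
PROGRAM LoopShare
USE omp_lib                          ! standard OpenMP module (assumed available)
IMPLICIT NONE

INTEGER, PARAMETER :: NITER = 16
INTEGER :: owner(NITER)              ! thread ID that executed iteration i
INTEGER :: visits(NITER)             ! how many times iteration i was executed
INTEGER :: i

visits = 0

!$OMP PARALLEL
!$OMP DO
DO i = 1, NITER                      ! i is PRIVATE by default
   owner(i)  = omp_get_thread_num()
   visits(i) = visits(i) + 1         ! no race: element i belongs to one iteration
END DO
!$OMP END DO
!$OMP END PARALLEL

print *, "owner  = ", owner
print *, "visits = ", visits

END PROGRAM LoopShare
```

    Every element of visits is 1, because each iteration is executed once. The contents of owner depend on the number of threads and on how the implementation schedules the iterations.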


  • Example: compute Pi with parallel DO loop
      PROGRAM Compute_PI
      IMPLICIT NONE

      INTEGER N, i
      DOUBLE PRECISION w, x
      DOUBLE PRECISION pi, mypi

      N = 50000000                 !! Number of intervals
      w = 1.0d0/N                  !! width of each interval
      pi = 0.0d0

      !$OMP PARALLEL PRIVATE(x, mypi)
      mypi = 0.0d0

      !$OMP DO
      DO i = 0, N-1                !! Parallel Loop
         x = w * (i + 0.5d0)
         mypi = mypi + w*f(x)
      END DO
      !$OMP END DO

      !$OMP CRITICAL
      pi = pi + mypi
      !$OMP END CRITICAL
      !$OMP END PARALLEL

      PRINT *, "Pi = ", pi

      CONTAINS

      !! The original listing does not show f. The integrand below is an
      !! assumption: its integral over [0,1] equals Pi.
      DOUBLE PRECISION FUNCTION f(a)
         DOUBLE PRECISION, INTENT(IN) :: a
         f = 4.0d0 / (1.0d0 + a*a)
      END FUNCTION f

      END PROGRAM
  • Example Program: (OpenMP compute Pi)

  • Compile with:
        f90 -O -xopenmp -stackvar openMP_compute_pi3.f90
  • Run with:
    • export OMP_NUM_THREADS=8
    • a.out
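
  • The per-thread partial sum and the CRITICAL section can also be left to OpenMP by using the standard REDUCTION clause together with the combined PARALLEL DO construct. The sketch below is not one of the course demo files: the program name ComputePiReduction is made up, and the integrand 4/(1+x*x), whose integral over [0,1] equals Pi, is an assumption.

```fortran
PROGRAM ComputePiReduction
IMPLICIT NONE

INTEGER, PARAMETER :: N = 50000000   ! Number of intervals
INTEGER :: i
DOUBLE PRECISION :: w, x, pi

w  = 1.0d0 / N                       ! width of each interval
pi = 0.0d0

! Each thread accumulates into a private copy of pi (starting at 0);
! the private copies are added to the shared pi when the loop ends
!$OMP PARALLEL DO PRIVATE(x) REDUCTION(+:pi)
DO i = 0, N-1
   x  = w * (i + 0.5d0)              ! midpoint of interval i
   pi = pi + w * 4.0d0 / (1.0d0 + x*x)   ! assumed integrand, see above
END DO
!$OMP END PARALLEL DO

PRINT *, "Pi = ", pi

END PROGRAM ComputePiReduction
```

    The result is the same midpoint-rule approximation of Pi; only the bookkeeping for the partial sums has moved from the program into the REDUCTION clause.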


  • Final Notes
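  • The stack size of each thread can be controlled by setting another environment variable (csh syntax shown; check the compiler documentation for the unit of the value):
      setenv   STACKSIZE    nBytes
    OpenMP 3.0 and later also define the portable environment variable OMP_STACKSIZE for the same purpose.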
    
  • For more information on OpenMP, see: http://www.openmp.org




