如题：

无意中发现了一个Python下的mpi教程《A Python Introduction to Parallel Programming with MPI 1.0.2 documentation》

地址如下：

https://materials.jeremybejarano.com/MPIwithPython/#

=================================================================

这里给出自己的一些学习笔记：

Point-to-Point Communication

The Trapezoidal Rule

这个梯形规则就是将一个函数的积分形式用一个近似的采样计算的方法来进行求解。

也就有了原文中的公式：

不过原文中的公式不是很好看懂，还是百度文库的那个文档讲解的比较好：

https://wenku.baidu.com/view/20a29f97dd88d0d233d46a48.html

原文中的代码，修改后：

# trapSerial.py

# example to run: python trapSerial.py 0.0 1.0 10000

import numpy

import sys

import time

# takes in command-line arguments [a,b,n]

a = float(sys.argv[1])

b = float(sys.argv[2])

n = int(sys.argv[3])

def f(x):

    return x * x

def integrateRange(a, b, n):

    '''Numerically integrate with the trapezoid rule on the interval from

        a to b with n trapezoids.

        '''

    integral = -(f(a) + f(b)) / 2.0

    # n+1 endpoints, but n trapazoids

    #for x in numpy.linspace(a, b, n + 1):

    #    integral = integral + f(x)

    integral = integral + numpy.sum( f(numpy.linspace(a, b, n + 1)) )

    integral = integral * (b - a) / n

    return integral

begin_time = time.time()

integral = integrateRange(a, b, n)

end_time = time.time()

print("With n =", n, "trapezoids, our estimate of the integral\

from", a, "to", b, "is", integral)

print("total run time :", end_time - begin_time)

该代码为单机代码，在原始代码基础上改进为向量计算，进一步提高运算的效率。

改进后的mpi代码：

# trapParallel_1.py

# example to run: mpiexec -n 4 python trapParallel_1.py 0.0 1.0 10000

import numpy

import sys

import time

from mpi4py import MPI

from mpi4py.MPI import ANY_SOURCE

comm = MPI.COMM_WORLD

rank = comm.Get_rank()

size = comm.Get_size()

# takes in command-line arguments [a,b,n]

a = float(sys.argv[1])

b = float(sys.argv[2])

n = int(sys.argv[3])

# we arbitrarily define a function to integrate

def f(x):

    return x * x

# this is the serial version of the trapezoidal rule

# parallelization occurs by dividing the range among processes

def integrateRange(a, b, n):

    integral = -(f(a) + f(b)) / 2.0

    # n+1 endpoints, but n trapazoids

    # for x in numpy.linspace(a, b, n + 1):

    #    integral = integral + f(x)

    integral = integral + numpy.sum(f(numpy.linspace(a, b, n + 1)))

    integral = integral * (b - a) / n

    return integral

# local_n is the number of trapezoids each process will calculate

# note that size must divide n

local_n = int(n / size)

# h is the step size. n is the total number of trapezoids

h = (b - a) / (local_n*size)

# we calculate the interval that each process handles

# local_a is the starting point and local_b is the endpoint

local_a = a + rank * local_n * h

local_b = local_a + local_n * h

# initializing variables. mpi4py requires that we pass numpy objects.

recv_buffer = numpy.zeros(size)

if rank == 0:

    begin_time = time.time()

# perform local computation. Each process integrates its own interval

integral = integrateRange(local_a, local_b, local_n)

# communication

# root node receives results from all processes and sums them

if rank == 0:

    recv_buffer[0] = integral

    for i in range(1, size):

        comm.Recv(recv_buffer[i:i+1], ANY_SOURCE)

    total = numpy.sum(recv_buffer)

else:

    # all other process send their result

    comm.Send(integral, dest=0)

# root process prints results

if comm.rank == 0:

    end_time = time.time()

    print("With n =", n, "trapezoids, our estimate of the integral from" \

          , a, "to", b, "is", total)

    print("total run time :", end_time - begin_time)

    print("total size: ", size)

运行命令：

mpiexec -np 4 python trapSerial_1.py 0 1000000 100000000

上面改进的代码本身也实现了原文中所提到的计算负载均衡的问题，不过上面的改进方法是通过修改总的切分个数，从而实现总的切分个数可以被运行进数所整除。

假设我们总共要切分的数量为1099，但是我们要进行计算的进程数量为100，那么每个进程需要分配多少切分数来进行计算呢，下面给出另一种改进方式，在改变总切分数量的前提下使每个进程所负责计算的切分数均为平均。

改进代码：

# trapParallel_2.py

# example to run: mpiexec -n 4 python trapParallel_1.py 0.0 1.0 10000

import numpy

import sys

import time

from mpi4py import MPI

from mpi4py.MPI import ANY_SOURCE

comm = MPI.COMM_WORLD

rank = comm.Get_rank()

size = comm.Get_size()

# takes in command-line arguments [a,b,n]

a = float(sys.argv[1])

b = float(sys.argv[2])

n = int(sys.argv[3])

# we arbitrarily define a function to integrate

def f(x):

    return x * x

# this is the serial version of the trapezoidal rule

# parallelization occurs by dividing the range among processes

def integrateRange(a, b, n):

    integral = -(f(a) + f(b)) / 2.0

    # n+1 endpoints, but n trapazoids

    # for x in numpy.linspace(a, b, n + 1):

    #    integral = integral + f(x)

    integral = integral + numpy.sum(f(numpy.linspace(a, b, n + 1)))

    integral = integral * (b - a) / n

    return integral

# h is the step size. n is the total number of trapezoids

h = (b - a) / n

# local_n is the number of trapezoids each process will calculate

# note that size must divide n

local_n = numpy.zeros(size, dtype=numpy.int32)

local_n[:] = n // size
if n%size!=0:

    local_n[-(n%size):] += 1

# we calculate the interval that each process handles

# local_a is the starting point and local_b is the endpoint

local_a = numpy.sum(local_n[:rank]) * h

local_b = local_a + local_n[rank] * h

# initializing variables. mpi4py requires that we pass numpy objects.

recv_buffer = numpy.zeros(size)

if rank == 0:

    begin_time = time.time()

# perform local computation. Each process integrates its own interval

integral = integrateRange(local_a, local_b, local_n[rank])

# communication

# root node receives results from all processes and sums them

if rank == 0:

    recv_buffer[0] = integral

    for i in range(1, size):

        comm.Recv(recv_buffer[i:i+1], ANY_SOURCE)

    total = numpy.sum(recv_buffer)

else:

    # all other process send their result

    comm.Send(integral, dest=0)

# root process prints results

if comm.rank == 0:

    end_time = time.time()

    print("With n =", n, "trapezoids, our estimate of the integral from" \

          , a, "to", b, "is", total)

    print("total run time :", end_time - begin_time)

    print("total size: ", size)

计算负载均衡的核心代码为：

# h is the step size. n is the total number of trapezoids

h = (b - a) / n

# local_n is the number of trapezoids each process will calculate

# note that size must divide n

local_n = numpy.zeros(size, dtype=numpy.int32)

local_n[:] = n // size
if n%size!=0:

    local_n[-(n%size):] += 1

# we calculate the interval that each process handles

# local_a is the starting point and local_b is the endpoint

local_a = numpy.sum(local_n[:rank]) * h

local_b = local_a + local_n[rank] * h

运行命令：

mpiexec --oversubscribe -np 100 python trapSerial_2.py 0 1000000 1099

最后的改进方法更好的实现了计算的负载均衡。

==================================================

上面的改进方法对应集体通信的话又该如何改进呢？？？

Collective Communication

The Parallel Trapezoidal Rule 2.0

改进方法1对应的 trapParallel_1.py 改进：

# trapParallel_1.py

# example to run: mpiexec -n 4 python26 trapParallel_2.py 0.0 1.0 10000

import numpy

import sys

import time

from mpi4py import MPI

from mpi4py.MPI import ANY_SOURCE

comm = MPI.COMM_WORLD

rank = comm.Get_rank()

size = comm.Get_size()

# takes in command-line arguments [a,b,n]

a = float(sys.argv[1])

b = float(sys.argv[2])

n = int(sys.argv[3])

# we arbitrarily define a function to integrate

def f(x):

    return x * x

# this is the serial version of the trapezoidal rule

# parallelization occurs by dividing the range among processes

def integrateRange(a, b, n):

    integral = -(f(a) + f(b)) / 2.0

    # n+1 endpoints, but n trapazoids

    #for x in numpy.linspace(a, b, n + 1):

    #    integral = integral + f(x)

    integral = integral + numpy.sum(f(numpy.linspace(a, b, n + 1)))

    integral = integral * (b - a) / n

    return integral

# local_n is the number of trapezoids each process will calculate

# note that size must divide n

local_n = int(n / size)

# h is the step size. n is the total number of trapezoids

h = (b - a) / (local_n*size)

# we calculate the interval that each process handles

# local_a is the starting point and local_b is the endpoint

local_a = a + rank * local_n * h

local_b = local_a + local_n * h

# initializing variables. mpi4py requires that we pass numpy objects.

#integral = numpy.zeros(1)

total = numpy.zeros(1)

if rank == 0:

    begin_time = time.time()

# perform local computation. Each process integrates its own interval

integral = integrateRange(local_a, local_b, local_n)

# communication

# root node receives results with a collective "reduce"

comm.Reduce(integral, total, op=MPI.SUM, root=0)

# root process prints results

if comm.rank == 0:

    end_time = time.time()

    print("With n =", n, "trapezoids, our estimate of the integral from" \

          , a, "to", b, "is", total)

    print("total run time :", end_time - begin_time)

    print("total size: ", size)

运行命令：

mpiexec --oversubscribe -np 100 python trapSerial_1.py 0 1000000 1099

改进方法2 对应的 trapParallel_2.py 改进：

# trapParallel_2.py

# example to run: mpiexec -n 4 python26 trapParallel_2.py 0.0 1.0 10000

import numpy

import sys

import time

from mpi4py import MPI

from mpi4py.MPI import ANY_SOURCE

comm = MPI.COMM_WORLD

rank = comm.Get_rank()

size = comm.Get_size()

# takes in command-line arguments [a,b,n]

a = float(sys.argv[1])

b = float(sys.argv[2])

n = int(sys.argv[3])

# we arbitrarily define a function to integrate

def f(x):

    return x * x

# this is the serial version of the trapezoidal rule

# parallelization occurs by dividing the range among processes

def integrateRange(a, b, n):

    integral = -(f(a) + f(b)) / 2.0

    # n+1 endpoints, but n trapazoids

    #for x in numpy.linspace(a, b, n + 1):

    #    integral = integral + f(x)

    integral = integral + numpy.sum(f(numpy.linspace(a, b, n + 1)))

    integral = integral * (b - a) / n

    return integral

# h is the step size. n is the total number of trapezoids

h = (b - a) / n

# local_n is the number of trapezoids each process will calculate

# note that size must divide n

local_n = numpy.zeros(size, dtype=numpy.int32)

local_n[:] = n // size
if n%size!=0:

    local_n[-(n%size):] += 1

# we calculate the interval that each process handles

# local_a is the starting point and local_b is the endpoint

local_a = numpy.sum(local_n[:rank]) * h

local_b = local_a + local_n[rank] * h

# initializing variables. mpi4py requires that we pass numpy objects.

#integral = numpy.zeros(1)

total = numpy.zeros(1)

if rank == 0:

    begin_time = time.time()

# perform local computation. Each process integrates its own interval

integral = integrateRange(local_a, local_b, local_n[rank])

# communication

# root node receives results with a collective "reduce"

comm.Reduce(integral, total, op=MPI.SUM, root=0)

# root process prints results

if comm.rank == 0:

    end_time = time.time()

    print("With n =", n, "trapezoids, our estimate of the integral from" \

          , a, "to", b, "is", total)

    print("total run time :", end_time - begin_time)

    print("total size: ", size)

运行命令：

mpiexec --oversubscribe -np 100 python trapSerial_2.py 0 1000000 1099

====================================================

分享某Python下的mpi教程 —— A Python Introduction to Parallel Programming with MPI 1.0.2 documentation的更多相关文章

Python学习入门基础教程(learning Python)--5.6 Python读文件操作高级
前文5.2节和5.4节分别就Python下读文件操作做了基础性讲述和提升性介绍,但是仍有些问题,比如在5.4节里涉及到一个多次读文件的问题,实际上我们还没有完全阐述完毕,下面这个图片的问题在哪呢? 问 ...
Python学习入门基础教程(learning Python)--5.1 Python下文件处理基本过程
Python下的文件读写操作过程和其他高级语言如C语言的操作过程基本一致,都要经历以下几个基本过程. 1. 打开文件首先是要打开文件,打开文件的主要目的是为了建立程序和文件之间的联系.按程序访问文件 ...
Python学习入门基础教程(learning Python)--5.2 Python读文件基础
上节简单的说明了一下Pyhon下的文件读写基本流程,从本节开始,我们做几个小例子来具体展示一下Python下的文件操作,本节主要是详细讲述Python的文件读操作. 下面举一个例子,例子的功能是读取当 ...
Python学习入门基础教程(learning Python)--5 Python文件处理
本节主要讨论Python下的文件操作技术. 首先,要明白为何要学习或者说关系文件操作这件事?其实道理很简单,Python程序运行时,数据是存放在RAM里的,当Python程序运行结束后数据从RAM被清 ...
Python学习入门基础教程(learning Python)--5.3 Python写文件基础
前边我们学习了一下Python下如何读取一个文件的基本操作,学会了read和readline两个函数,本节我们学习一下Python下写文件的基本操作方法. 这里仍然是举例来说明如何写文件.例子的功能是 ...
Python学习入门基础教程(learning Python)--3.1Python的if分支语句
本节研究一下if分支语句. if分支语句是Python下逻辑条件控制语句,用于条件执行某些语句的控制操作,当if后的条件conditon满足时,if其下的语句块被执行,但当if的控制条件condito ...
Python学习入门基础教程(learning Python)--6.3 Python的list切片高级
上节"6.2 Python的list访问索引和切片"主要学习了Python下的List的访问技术:索引和切片的基础知识,这节将就List的索引index和切片Slice知识点做进一 ...
Python学习入门基础教程(learning Python)--6.4 Python的list与函数
list是python下的一种数据类型,他和其他类型如整形.浮点型.字符串等数据类型一样也可作为函数的型参和实参来使用! 1.list作为参数 list数据类型可以作为函数的参数传递给函数取做相应的处 ...
Python学习入门基础教程(learning Python)--5.7 Python文件数据记录存储与处理
本节主要讨论Python下如何通过文件操作实现对数据记录集的存储与处理的操作方法. 在Python里和其他高级语言一样可以通过文件读写将一些记录集写入文件或者通过文件读操作从文件里读取一条或多条和数据 ...
Python学习入门基础教程(learning Python)--2.3.3Python函数型参详解
本节讨论Python下函数型参的预设值问题. Python在设计函数时,可以给型参预设缺省值,当用户调用函数时可以不输入实参.如果用户不想使用缺省预设值则需要给型参一一赋值,可以给某些型参赋值或不按型 ...

随机推荐

IDEA:java: Compilation failed: internal java compiler error
java: Compilation failed: internal java compiler error 解决方法: 1.打开菜单 ,File - Project Structure - Proj ...
使用 OpenTelemetry 构建可观测性 04 - 收集器
在之前的博文中,我们讨论了如何使用 SDK 和链路追踪生产者来导出进程中的遥测数据.尽管有多种类型的导出器可供选择,但其中一个常见的目标是将数据导出到 OpenTelemetry Collector. ...
EBLK日志收集方案
ELK 是elastic公司提供的一套完整的日志收集以及展示的解决方案,是三个产品的首字母缩写,分别是ElasticSearch.Logstash 和 Kibana.该组合版本会统一发布. Elast ...
.NET8操作Mysql，Pomelo.EntityFrameworkCore.MySql版本目前最高只有7.0解决办法
Pomelo.EntityFrameworkCore.MySql7.0是不支持.NET8的,但现在Pomelo.EntityFrameworkCore.MySql的最新版本只有7.0 因为 Pomel ...
58同城的登录（RSA算法）
Tips:当你看到这个提示的时候,说明当前的文章是由原emlog博客系统搬迁至此的,文章发布时间已过于久远,编排和内容不一定完整,还请谅解` 58同城的登录(RSA算法) 日期:2016-11-23 ...
spring数据验证
一般情况下,我们并不推荐在服务端做基础的数据校验,因为这有一个很主要的问题:它加重了服务器的负载,如果并发多,这种负载就更加明显. 如果我们跟踪一个简单的Controller方法执行过程,就会发现Sp ...
Redis八股文（大厂面试真题）
号:tutou123com我是小宋编码,Java程序员 ,只熬夜但不秃头. 关注我,带你轻松过面试.提升简历亮点如果你觉得对你有帮助,欢迎关注[1] 内容目录 1.说说redis,了解redis源码 ...
你了解Vim的增删改查吗？
增: 在Vim的Normal模式中输入A/I/O,a/i/o字符进行对应的增加操作. 删在Vim的Normal模式中, 输入x 删除光标对应的一个字符(4x代表删除4个字符): 输入dd删除光标所在 ...
Linux 内核：设备驱动模型（4）uevent与热插拔
Linux 内核:设备驱动模型(4)uevent与热插拔背景我们简单回顾一下Linux的设备驱动模型(Linux Device Driver Model,LDDM): 1.在<sysfs与k ...
VUE手稿1

分享某Python下的mpi教程 —— A Python Introduction to Parallel Programming with MPI 1.0.2 documentation

Point-to-Point Communication

The Trapezoidal Rule

Collective Communication

The Parallel Trapezoidal Rule 2.0

分享某Python下的mpi教程 —— A Python Introduction to Parallel Programming with MPI 1.0.2 documentation的更多相关文章

随机推荐

热门专题