Continuing from the previous post:

Sharing an MPI tutorial for Python: A Python Introduction to Parallel Programming with MPI 1.0.2 documentation

This part covers the chapter at:

https://materials.jeremybejarano.com/MPIwithPython/collectiveCom.html

Collective Communication

Reduce(…) and Allreduce(…)

Examples:

Reduce

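import numpy
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# each process contributes its own rank as a float
rankF = numpy.array(float(rank))

# only the root process needs a receive buffer
if rank == 0:
    total = numpy.zeros(1)
else:
    total = None

comm.Reduce(rankF, total, op=MPI.MAX)
#comm.Reduce(rankF, total, op=MPI.SUM)

# the result is only available on the root (root=0 by default)
if rank == 0:
    print("total: ", total)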

Allreduce

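import numpy
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# each process contributes its own rank as a float
rankF = numpy.array(float(rank))

# every process needs a receive buffer, because every process gets the result
total = numpy.zeros(1)

comm.Allreduce(rankF, total, op=MPI.MAX)

print("rank {}  :  total {} ".format(rank, total))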
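To make the difference between the two calls concrete, here is a minimal single-process sketch of what each rank ends up holding. It does not use MPI at all: `simulate_reduce` and `simulate_allreduce` are hypothetical helper names introduced only for this illustration, and the list index stands in for the rank.

```python
from functools import reduce

def simulate_reduce(values, op, root=0):
    """Return what each rank holds after a Reduce: the combined
    value on the root, None everywhere else."""
    combined = reduce(op, values)
    return [combined if rank == root else None for rank in range(len(values))]

def simulate_allreduce(values, op):
    """Return what each rank holds after an Allreduce: the combined
    value on every rank."""
    combined = reduce(op, values)
    return [combined] * len(values)

# four "ranks", each contributing float(rank) as in the examples above
contributions = [float(rank) for rank in range(4)]

print(simulate_reduce(contributions, max))     # [3.0, None, None, None]
print(simulate_allreduce(contributions, max))  # [3.0, 3.0, 3.0, 3.0]
```

This mirrors why the Reduce example only allocates `total` on rank 0, while the Allreduce example allocates it on every rank.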

Scatter

# dotProductParallel_1.py
# "to run" syntax example: mpiexec -n 4 python dotProductParallel_1.py

from mpi4py import MPI
import numpy
import sys

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# length of vectors; uncomment the next line to read it from the command line
# n = int(sys.argv[1])
n = 10000

# arbitrary example vectors, generated to be evenly divided by the number of
# processes for convenience; only the root process holds the full vectors
x = numpy.linspace(0, 100, n) if rank == 0 else None
y = numpy.linspace(20, 300, n) if rank == 0 else None

# initialize as numpy arrays
dot = numpy.array([0.])
local_n = numpy.array([0], dtype=numpy.int32)

# test for conformability
if rank == 0:
    if n != y.size:
        print("vector length mismatch")
        comm.Abort()

    # currently, our program cannot handle sizes that are not evenly divided by
    # the number of processors
    if n % size != 0:
        print("the number of processors must evenly divide n.")
        comm.Abort()

    # length of each process's portion of the original vector
    # (integer division, so the value stays an integer under Python 3)
    local_n = numpy.array([n // size], dtype=numpy.int32)

# communicate local array size to all processes
comm.Bcast(local_n, root=0)

# initialize as numpy arrays
local_x = numpy.zeros(local_n[0])
local_y = numpy.zeros(local_n[0])

# divide up vectors
comm.Scatter(x, local_x, root=0)
comm.Scatter(y, local_y, root=0)

# local computation of dot product
local_dot = numpy.array([numpy.dot(local_x, local_y)])

# sum the results of each; Allreduce leaves the sum on every process,
# whereas the commented-out Reduce would leave it on the root only
#comm.Reduce(local_dot, dot, op=MPI.SUM)
comm.Allreduce(local_dot, dot, op=MPI.SUM)

print("The dot product is", dot[0], "computed in parallel")

if rank == 0:
    #print("The dot product is", dot[0], "computed in parallel")
    print("and", numpy.dot(x, y), "computed serially")
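The parallel result matches the serial one because a dot product splits into a sum of dot products over disjoint blocks. Below is a small single-process sketch of that identity, using plain Python lists instead of MPI. The helper names `dot` and `blockwise_dot` are made up for this illustration, and `parts` plays the role of the number of processes.

```python
def dot(a, b):
    """Plain serial dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def blockwise_dot(a, b, parts):
    """Split both vectors into `parts` equal blocks, take the dot
    product of each block, then add the partial results together."""
    if len(a) != len(b):
        raise ValueError("vector length mismatch")
    if len(a) % parts != 0:
        raise ValueError("parts must evenly divide the vector length")
    block = len(a) // parts
    partials = [
        dot(a[i * block:(i + 1) * block], b[i * block:(i + 1) * block])
        for i in range(parts)
    ]
    return sum(partials)

x = list(range(8))          # 0, 1, ..., 7
y = [2 * v for v in x]      # 0, 2, ..., 14

print(dot(x, y))               # 280
print(blockwise_dot(x, y, 4))  # 280
```

In the MPI version, each block's partial product is computed on a different process and the final `sum` is what `Allreduce` with `MPI.SUM` performs. With floating-point data the two results can differ in the last digits, because the additions happen in a different order.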

Scatterv(…) and Gatherv(…)

# for correct performance, run unbuffered with 3 processes:
# mpiexec -n 3 python -u scratch.py

import numpy
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# only the root process holds the full array
if rank == 0:
    x_global = numpy.linspace(0,100,11)
else:
    x_global = None

# receive buffers sized to match the counts (1,1,9) used below
if rank == 0:
    x_local = numpy.zeros(1)
elif rank == 1:
    x_local = numpy.zeros(1)
elif rank == 2:
    x_local = numpy.zeros(9)

if rank == 0:
    print("Scatter")

# send buffer spec: [data, counts, displacements, datatype]
comm.Scatterv([x_global, (1,1,9), (0,1,2), MPI.DOUBLE], x_local)
print("process " + str(rank) + " has " + str(x_local))

comm.Barrier()

if rank == 0:
    print("Gather")
    xGathered = numpy.zeros(11)
else:
    xGathered = None

# receive buffer spec on the root uses the same counts and displacements
comm.Gatherv(x_local, [xGathered, (1,1,9), (0,1,2), MPI.DOUBLE])
print("process " + str(rank) + " has " +str(xGathered))

Run this code with:

mpiexec -np 3 python x.py
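The counts (1,1,9) and displacements (0,1,2) passed to Scatterv and Gatherv say how many elements each rank gets and where its chunk starts in the root's buffer. Here is a minimal single-process sketch of that bookkeeping, with plain lists in place of MPI buffers. `scatterv_sim` and `gatherv_sim` are hypothetical names used only here, not mpi4py functions.

```python
def scatterv_sim(data, counts, displs):
    """Chunk for rank i is data[displs[i] : displs[i] + counts[i]]."""
    return [data[d:d + c] for c, d in zip(counts, displs)]

def gatherv_sim(chunks, counts, displs, total):
    """Write each rank's chunk back at its displacement."""
    out = [0.0] * total
    for chunk, c, d in zip(chunks, counts, displs):
        out[d:d + c] = chunk[:c]
    return out

# same values as numpy.linspace(0, 100, 11)
x_global = [10.0 * i for i in range(11)]
counts = (1, 1, 9)
displs = (0, 1, 2)

chunks = scatterv_sim(x_global, counts, displs)
for rank, chunk in enumerate(chunks):
    print("process", rank, "has", chunk)

# gathering with the same counts and displacements restores the original
print(gatherv_sim(chunks, counts, displs, len(x_global)) == x_global)  # True
```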

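One point in the code above is easy to overlook: comm.Scatterv does not synchronize the processes. It is a blocking call in the MPI sense (it returns once the calling process's own buffers are safe to reuse), but on the root it may return before the other ranks have received their data. In other words, unless rank 0 synchronizes explicitly with comm.Barrier after the call,

it can carry on without waiting for rank 1 and rank 2 to finish receiving into their own x_local buffers. (The truly non-blocking variants in MPI are separate calls, such as MPI_Iscatterv.)

The modified code below demonstrates this: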

# for correct performance, run unbuffered with 3 processes:
# mpiexec -n 3 python -u scratch.py

import numpy
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# only the root process holds the full array
if rank == 0:
    x_global = numpy.linspace(0,100,11)
else:
    x_global = None

# receive buffers sized to match the counts (1,1,9) used below
if rank == 0:
    x_local = numpy.zeros(1)
elif rank == 1:
    x_local = numpy.zeros(1)
elif rank == 2:
    x_local = numpy.zeros(9)

if rank == 0:
    print("Scatter")

# delay every non-root rank so that it enters Scatterv 10 seconds late
if rank != 0:
    import time
    time.sleep(10)

comm.Scatterv([x_global, (1,1,9), (0,1,2), MPI.DOUBLE], x_local)
print("process " + str(rank) + " has " + str(x_local))

# barrier deliberately disabled for this experiment
#comm.Barrier()

if rank == 0:
    print("Gather")
    xGathered = numpy.zeros(11)
else:
    xGathered = None

comm.Gatherv(x_local, [xGathered, (1,1,9), (0,1,2), MPI.DOUBLE])
print("process " + str(rank) + " has " +str(xGathered))

Running this code first prints:

Scatter

process 0 has [0.]

Gather

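and then stalls for roughly 10 seconds. This shows that, without comm.Barrier, comm.Scatterv does not make the root wait for the other ranks: rank 0 returned from the call and moved on before rank 1 and rank 2 had even entered it. (Whether the root returns early like this depends on the MPI implementation and the message size; the MPI standard permits it but does not require it.) The output also shows that rank 0 does wait inside comm.Gatherv, because the root cannot complete the gather until every rank has contributed its data, and that is where the 10 seconds are spent.

Because a later call in this code forces rank 0 to wait anyway, omitting comm.Barrier causes no problem here. Strictly speaking, the barrier is not needed for correctness, since each rank's receive buffer is filled by the time its own Scatterv call returns. It is mainly useful when the processes should reach a point together, for example to keep the printed output in order or to time a phase. Adding one after a non-synchronizing collective is still a simple safeguard when in doubt.

This code is run with the same command: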

mpiexec -np 3 python x.py

==================================================

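The code above hard-codes the number of processes: running it with anything other than -np 3 raises an error. That is poor practice. What is needed is code that does not hard-code the process count and still distributes the work evenly. Here is my own version:

Code that balances the workload: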

# works with any number of processes; run unbuffered, for example:
# mpiexec -n 23 python -u scratch.py

import numpy
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

n = 10000

# only the root process holds the full array
if rank == 0:
    x_global = numpy.linspace(0,100,n)
else:
    x_global = None

# number of elements per process: n // size each,
# with the last n % size processes taking one extra element
n_local = numpy.zeros(size, dtype=numpy.int32)
n_local[:] = n // size
if n%size != 0:
    n_local[-(n%size):] += 1

# starting offset of each process's chunk in the global array
# (integer dtype, since displacements are element indices)
begin_local = numpy.zeros(size, dtype=numpy.int32)
for i in range(1, size):
    begin_local[i] = begin_local[i-1] + n_local[i-1]

x_local = numpy.zeros(n_local[rank])

#if rank != 0:
#    import time
#    time.sleep(5)

if rank == 0:
    print("Scatter")

comm.Scatterv([x_global, n_local, begin_local, MPI.DOUBLE], x_local)
print("process " + str(rank) + " has " + str(x_local[:5]))

comm.Barrier()

if rank == 0:
    print("Gather")
    xGathered = numpy.zeros(n)
else:
    xGathered = None

comm.Gatherv(x_local, [xGathered, n_local, begin_local, MPI.DOUBLE])
print("process " + str(rank) + " has " +str(xGathered))

Run command:

mpiexec -np 23 python x.py

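As you can see, the improved code no longer hard-codes the total number of processes; it balances the workload for any process count. Based on the number of running processes, it divides the data as evenly as possible among them, so chunk sizes differ by at most one element.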
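The partitioning rule used above (every rank gets n // size elements, and the last n % size ranks get one more) can be checked on its own, without MPI. Here is a small sketch in plain Python. `partition` is a hypothetical helper name, not part of mpi4py.

```python
def partition(n, size):
    """Return (counts, displs) for splitting n items across `size` ranks.

    Every rank gets n // size items; the last n % size ranks get one
    extra, matching the scheme in the code above.
    """
    base, extra = divmod(n, size)
    counts = [base + (1 if rank >= size - extra else 0) for rank in range(size)]
    displs = [0] * size
    for rank in range(1, size):
        displs[rank] = displs[rank - 1] + counts[rank - 1]
    return counts, displs

counts, displs = partition(10000, 23)
print(counts[:5], counts[-1])        # [434, 434, 434, 434, 434] 435
print(sum(counts))                   # 10000
print(max(counts) - min(counts))     # 1
```

The two returned lists correspond to `n_local` and `begin_local` in the MPI program. Checking that the counts sum to n is a cheap sanity test before handing them to Scatterv.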

================================================

Unless stated otherwise, the code above is run with:

mpiexec -np 8 python x.py
