Python concurrent.futures

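Because of its global interpreter lock (GIL), Python cannot achieve true parallel computation with threads. We will not expand on this claim here, but one distinction does need explaining: IO-bound vs. CPU-bound.

IO-bound: the program frequently reads files or network sockets.

CPU-bound: the program spends its time on math and logic operations that consume a lot of CPU. This is what we mean by parallel computation here.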

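The concurrent.futures module, however, can use multiprocessing to achieve true parallel computation.

The core idea: concurrent.futures runs multiple Python interpreters in parallel as child processes, so that a Python program can use multiple CPU cores to run faster. Because the child processes are separate from the main interpreter, their global interpreter locks are independent of each other. Each child process can fully use one CPU core.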
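
To make the "separate interpreters" point concrete, here is a minimal sketch, written for Python 3, that asks every task for the ID of the process it runs in. The helper name collect_pids is made up for this illustration. With a thread pool, all tasks should report the PID of the main process; with a process pool, they should report the PIDs of the child processes.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def collect_pids(executor_cls, n_tasks=4, max_workers=2):
    """Run os.getpid in a pool and return the PIDs reported by the tasks.

    collect_pids is a hypothetical helper for this illustration only.
    """
    with executor_cls(max_workers=max_workers) as pool:
        futures = [pool.submit(os.getpid) for _ in range(n_tasks)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    print("main process PID:", os.getpid())
    # Threads live inside the main process, so they share its PID (and its GIL).
    print("thread pool PIDs:", set(collect_pids(ThreadPoolExecutor)))
    # Each worker process is a separate interpreter with its own PID (and GIL).
    print("process pool PIDs:", set(collect_pids(ProcessPoolExecutor)))
```

The `if __name__ == "__main__":` guard matters on platforms that start workers by spawning a fresh interpreter, because the child re-imports the main module.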

Chapter 1: concurrent.futures performance

Greatest common divisor

This function is CPU-bound.

# -*- coding:utf-8 -*-

# Compute the greatest common divisor by brute force
def gcd(pair):
    a, b = pair
    low = min(a, b)
    # Try every candidate from min(a, b) down to 1
    for i in range(low, 0, -1):
        if a % i == 0 and b % i == 0:
            return i

numbers = [
    (1963309, 2265973), (1879675, 2493670), (2030677, 3814172),
    (1551645, 2229620), (1988912, 4736670), (2198964, 7876293)
]
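
As a quick sanity check, the sketch below runs the same brute-force gcd on a few small pairs and compares it with math.gcd from the Python 3 standard library. The small pairs are made up for this illustration so that the loop stays fast; the large pairs in numbers are deliberately slow.

```python
import math


def gcd(pair):
    """Brute-force GCD, same approach as the function above."""
    a, b = pair
    low = min(a, b)
    for i in range(low, 0, -1):
        if a % i == 0 and b % i == 0:
            return i


# Small pairs keep the brute-force loop fast.
small_pairs = [(12, 18), (17, 5), (100, 75), (21, 14)]

for pair in small_pairs:
    # math.gcd (Python 3.5+) serves as a reference implementation.
    assert gcd(pair) == math.gcd(*pair)

print([gcd(pair) for pair in small_pairs])  # [6, 1, 25, 7]
```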

Without threads or processes

import time

start = time.time()
results = list(map(gcd, numbers))
end = time.time()
print('Took %.3f seconds.' % (end - start))

Took 2.507 seconds.

Elapsed time: 2.507 seconds.

Multithreading with ThreadPoolExecutor

import time
from concurrent.futures import ThreadPoolExecutor

start = time.time()
pool = ThreadPoolExecutor(max_workers=2)
results = list(pool.map(gcd, numbers))
end = time.time()
print('Took %.3f seconds.' % (end - start))

Took 2.840 seconds.

Elapsed time: 2.840 seconds.

As noted above, gcd is a CPU-bound function, so because of the GIL, multithreading cannot speed it up. On top of that, starting threads has a cost, and so does communicating with the thread pool, so this program actually gets slower with threads.
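
For contrast, threads do help when the work is IO-bound. The sketch below is an illustration under assumptions, not a benchmark: fake_io is a made-up stand-in that simulates waiting on a file or socket with time.sleep, which releases the GIL while it waits.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(seconds):
    """Hypothetical stand-in for an IO-bound call; sleeping releases the GIL."""
    time.sleep(seconds)
    return seconds


delays = [0.2, 0.2, 0.2, 0.2]

# Serial version: the waits add up.
start = time.time()
serial = [fake_io(d) for d in delays]
serial_time = time.time() - start

# Threaded version: the waits overlap.
start = time.time()
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(fake_io, delays))
threaded_time = time.time() - start

print('serial: %.3f s, threaded: %.3f s' % (serial_time, threaded_time))
```

The serial run should take roughly the sum of the delays, while the threaded run should take roughly one delay, since all four waits happen at the same time.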

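Multiprocessing with ProcessPoolExecutor

import time
from concurrent.futures import ProcessPoolExecutor

# On platforms that spawn worker processes (e.g. Windows), this code
# must run under an `if __name__ == '__main__':` guard.
start = time.time()
pool = ProcessPoolExecutor(max_workers=2)
results = list(pool.map(gcd, numbers))
end = time.time()
print('Took %.3f seconds.' % (end - start))

Took 1.861 seconds.

Elapsed time: 1.861 seconds.

On a machine with two CPU cores, the multiprocess version is faster than the other two. This is because the ProcessPoolExecutor class uses the low-level machinery provided by the multiprocessing module to do the following: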

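1) Pass each item of input data in the numbers list to map.

2) Serialize the data into binary form with the pickle module.

3) Send the serialized data over a local socket from the process of the main interpreter to the process of a child interpreter.

4) In the child process, deserialize the binary data with pickle, restoring it to Python objects.

5) Import the Python module that contains the gcd function.

6) The child processes run the computation on their own input data in parallel.

7) Serialize the results, turning them into bytes.

8) Copy those bytes back to the main process over the socket.

9) The main process deserializes the bytes, restoring them to Python objects.

10) Finally, merge the results computed by each child process into a single list and return it to the caller.

multiprocessing has considerable overhead, and the reason is exactly this: all communication between the main process and the child processes requires serialization and deserialization.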
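
The serialization steps in the list above can be sketched with the pickle module directly. This only illustrates the round trip that the data makes; it does not reproduce what ProcessPoolExecutor does internally, and the socket transfer is left out.

```python
import pickle

pair = (1963309, 2265973)

# Steps 2 and 4: the argument is serialized to bytes, then rebuilt in the child.
payload = pickle.dumps(pair)
restored = pickle.loads(payload)
print(type(payload).__name__, len(payload), restored == pair)

# Steps 7 and 9: the result makes the same trip in the opposite direction.
result_bytes = pickle.dumps(1)
print(pickle.loads(result_bytes))

# Objects that cannot be pickled cannot cross the process boundary.
try:
    pickle.dumps(lambda x: x)
except Exception as exc:  # typically pickle.PicklingError
    lambda_picklable = False
    print('cannot pickle a lambda:', type(exc).__name__)
else:
    lambda_picklable = True
```

This is also why the function passed to a process pool should be defined at module level: it is sent to the child by name, and the child has to be able to import it (step 5).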

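Chapter 2: concurrent.futures source code analysis

Executor

Executor can be regarded as an abstract class. It provides the methods submit, map (already used above), and shutdown; of these, only submit raises NotImplementedError and must be supplied by a subclass. It is worth noting that Executor implements __enter__ and __exit__, so its objects can be used with the with statement. For details on context managers and the with statement, see this blog post: http://www.cnblogs.com/kangoroo/p/7627167.html

ThreadPoolExecutor and ProcessPoolExecutor inherit from Executor and are used to create thread pools and process pools respectively.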

class Executor(object):
    """This is an abstract base class for concrete asynchronous executors."""

    def submit(self, fn, *args, **kwargs):
        """Submits a callable to be executed with the given arguments.

        Schedules the callable to be executed as fn(*args, **kwargs) and returns
        a Future instance representing the execution of the callable.

        Returns:
            A Future representing the given call.
        """
        raise NotImplementedError()

    def map(self, fn, *iterables, **kwargs):
        """Returns a iterator equivalent to map(fn, iter).

        Args:
            fn: A callable that will take as many arguments as there are
                passed iterables.
            timeout: The maximum number of seconds to wait. If None, then there
                is no limit on the wait time.

        Returns:
            An iterator equivalent to: map(func, *iterables) but the calls may
            be evaluated out-of-order.

        Raises:
            TimeoutError: If the entire result iterator could not be generated
                before the given timeout.
            Exception: If fn(*args) raises for any values.
        """
        timeout = kwargs.get('timeout')
        if timeout is not None:
            end_time = timeout + time.time()

        fs = [self.submit(fn, *args) for args in itertools.izip(*iterables)]

        # Yield must be hidden in closure so that the futures are submitted
        # before the first iterator value is required.
        def result_iterator():
            try:
                for future in fs:
                    if timeout is None:
                        yield future.result()
                    else:
                        yield future.result(end_time - time.time())
            finally:
                for future in fs:
                    future.cancel()
        return result_iterator()

    def shutdown(self, wait=True):
        """Clean-up the resources associated with the Executor.

        It is safe to call this method several times. Otherwise, no other
        methods can be called after this one.

        Args:
            wait: If True then shutdown will not return until all running
                futures have finished executing and the resources used by the
                executor have been reclaimed.
        """
        pass

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.shutdown(wait=True)
        return False
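
Since map, __enter__ and __exit__ are all built on top of submit and shutdown, a subclass only has to supply submit to get a working executor. The sketch below, written for Python 3, shows this with a toy class. InlineExecutor is a hypothetical name invented for this illustration; it runs every call immediately in the calling thread and is not part of the library.

```python
from concurrent.futures import Executor, Future


class InlineExecutor(Executor):
    """Hypothetical toy executor: runs each call at once in the caller's thread."""

    def submit(self, fn, *args, **kwargs):
        future = Future()
        try:
            # Store the return value so that future.result() can hand it back.
            future.set_result(fn(*args, **kwargs))
        except Exception as exc:
            # Store the exception so that future.result() re-raises it.
            future.set_exception(exc)
        return future


# map, __enter__ and __exit__ are inherited from Executor.
with InlineExecutor() as pool:
    squares = list(pool.map(pow, [1, 2, 3], [2, 2, 2]))

print(squares)  # [1, 4, 9]
```

The inherited map zips the iterables together and calls submit once per tuple of arguments, which is why pow receives a base and an exponent here.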

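Below we use the process-based ProcessPoolExecutor to illustrate each of these methods.

map

map(self, fn, *iterables, **kwargs)

We have already seen an example of map above. Note that the returned results are ordered: the order matches the order of the *iterables that were passed in.

Here we use the with statement, so that shutdown is called automatically once the tasks have finished and we do not need to write any cleanup code.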

import time
from concurrent.futures import ProcessPoolExecutor

start = time.time()
with ProcessPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(gcd, numbers))
print('results: %s' % results)
end = time.time()
print('Took %.3f seconds.' % (end - start))

The output is:

results: [1, 5, 1, 5, 2, 3]
Took 1.617 seconds.
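
The ordering guarantee of map can be checked with a small sketch. It uses a thread pool rather than a process pool so that the tasks can record their finish order in a shared list; slow_echo and the delays are made up for this illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

finish_order = []


def slow_echo(delay):
    """Hypothetical task: sleep, record when we finished, return the input."""
    time.sleep(delay)
    finish_order.append(delay)  # list.append is thread-safe in CPython
    return delay


delays = [0.3, 0.1, 0.2]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(slow_echo, delays))

print('finish order:', finish_order)  # shortest delay finishes first
print('map results: ', results)       # same order as the input
```

Even though the task with the shortest delay finishes first, map yields the results in the order of the inputs.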

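submit

submit(self, fn, *args, **kwargs)

The submit method submits a callable to be run in parallel and returns a Future instance.

A Future object represents a call that runs asynchronously in a thread/process and completes at some point in the future. The Future instance is the handle through which the state and the result of that call can be queried.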

import time
from concurrent.futures import ProcessPoolExecutor

start = time.time()
futures = list()
with ProcessPoolExecutor(max_workers=2) as pool:
    for pair in numbers:
        # submit returns immediately with a Future for this call
        future = pool.submit(gcd, pair)
        futures.append(future)
print('results: %s' % [future.result() for future in futures])
end = time.time()
print('Took %.3f seconds.' % (end - start))

The output is:

results: [1, 5, 1, 5, 2, 3]
Took 2.289 seconds.
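
A Future also carries any exception raised by the call. The sketch below, written for Python 3, uses a thread pool and a made-up divide function to show the two ways of getting at it: exception() returns the exception object, and result() re-raises it in the caller.

```python
from concurrent.futures import ThreadPoolExecutor


def divide(a, b):
    """Hypothetical task that fails when b is zero."""
    return a / b


with ThreadPoolExecutor(max_workers=2) as pool:
    ok = pool.submit(divide, 6, 3)
    bad = pool.submit(divide, 1, 0)

# After the with block both futures are finished.
print(ok.result())            # 2.0
print(repr(bad.exception()))  # the ZeroDivisionError raised in the worker

try:
    bad.result()              # result() re-raises the stored exception
except ZeroDivisionError as exc:
    print('caught:', exc)
```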

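future

The submit function returns a Future object, and Future provides methods for tracking the state of the task, for example future.running() to check whether the task is currently running and future.done() to check whether it has finished.

The as_completed function takes two arguments: an iterable of futures and a timeout.

With the default timeout=None, it blocks until tasks finish and returns an iterator over the futures as they complete. The iterator is implemented with yield.

With timeout>0, it waits at most timeout seconds. If some tasks are still unfinished when the timeout expires, the iterator stops and raises TimeoutError.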

import time
from concurrent.futures import ProcessPoolExecutor, as_completed

start = time.time()
with ProcessPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(gcd, pair) for pair in numbers]
    # Snapshot of each task's state right after submission
    for future in futures:
        print('running: %s, done: %s' % (future.running(), future.done()))
    print('#### separator ####')
    # Yields each future as it finishes; raises TimeoutError after 2 seconds
    for future in as_completed(futures, timeout=2):
        print('running: %s, done: %s' % (future.running(), future.done()))
end = time.time()
print('Took %.3f seconds.' % (end - start))
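
The timeout behavior of as_completed is easier to see with tasks whose duration is known. The sketch below, written for Python 3, uses a thread pool and a made-up nap function: one task finishes well inside the timeout and one does not.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError, as_completed


def nap(seconds):
    """Hypothetical task with a known duration."""
    time.sleep(seconds)
    return seconds


finished = []
timed_out = False

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(nap, s) for s in (0.1, 1.0)]
    try:
        for future in as_completed(futures, timeout=0.5):
            finished.append(future.result())
    except TimeoutError:
        # The slow task is not cancelled; it keeps running in its worker.
        timed_out = True

print('finished before timeout:', finished)  # [0.1]
print('timed out:', timed_out)               # True
```

Note that leaving the with block still waits for the slow task, because __exit__ calls shutdown(wait=True).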

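wait

The wait function returns a tuple containing two sets: one with the futures that have completed and one with the futures that have not.

One advantage of wait is the extra flexibility it gives. Its return_when parameter accepts one of three values, FIRST_COMPLETED, FIRST_EXCEPTION and ALL_COMPLETED, and defaults to ALL_COMPLETED.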

import time
from concurrent.futures import ProcessPoolExecutor, wait, ALL_COMPLETED, FIRST_COMPLETED, FIRST_EXCEPTION

start = time.time()
with ProcessPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(gcd, pair) for pair in numbers]
    # Snapshot of each task's state right after submission
    for future in futures:
        print('running: %s, done: %s' % (future.running(), future.done()))
    print('#### separator ####')
    # Block until every future is done, or until 2 seconds have passed
    done, unfinished = wait(futures, timeout=2, return_when=ALL_COMPLETED)
    for d in done:
        print('running: %s, done: %s' % (d.running(), d.done()))
        print(d.result())
end = time.time()
print('Took %.3f seconds.' % (end - start))

Because ALL_COMPLETED is set, wait blocks until all tasks have finished, and the output shows that all 6 tasks completed.

running: True, done: False
running: True, done: False
running: True, done: False
running: True, done: False
running: False, done: False
running: False, done: False
#### separator ####
running: False, done: True
running: False, done: True
running: False, done: True
running: False, done: True
running: False, done: True
running: False, done: True
Took 1.518 seconds.

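If we change the setting to FIRST_COMPLETED, wait blocks until the first task finishes and returns all the tasks that have completed by that moment. No concurrency control is done here.

Rerunning gives the result below, which shows that 2 tasks had completed.

running: True, done: False
running: True, done: False
running: True, done: False
running: True, done: False
running: False, done: False
running: False, done: False
#### separator ####
running: False, done: True
running: False, done: True
Took 1.517 seconds.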
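
The FIRST_COMPLETED run above does not show the modified call, so here is a minimal sketch of what it might look like, written for Python 3. It uses a thread pool and a made-up nap function with known durations instead of gcd, so the split between the two returned sets is predictable.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def nap(seconds):
    """Hypothetical task with a known duration."""
    time.sleep(seconds)
    return seconds


with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(nap, s) for s in (0.1, 0.6, 0.8)]
    # Returns as soon as at least one future is done.
    done, not_done = wait(futures, return_when=FIRST_COMPLETED)
    first_results = sorted(f.result() for f in done)
    pending_count = len(not_done)

print('done:', first_results)           # [0.1]
print('still pending:', pending_count)  # 2
```

The futures in the second set keep running; wait only decides when to return, it does not cancel anything.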
