Multiprocess concurrency, multithreaded concurrency, and coroutines in Python

Why do we need concurrent programming?

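If a program performs I/O, it spends much of its time waiting: the CPU sits idle until each operation finishes, which wastes both system resources and time.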
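The cost of waiting on I/O can be sketched with a small timing experiment. This is only an illustration: fake_download is a hypothetical stand-in that sleeps instead of touching the network, and the delay of 0.1 seconds is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_download(name, delay=0.1):
    """Hypothetical I/O task: the sleep stands in for waiting on the network."""
    time.sleep(delay)
    return f"{name}: done"

names = ["a", "b", "c", "d", "e"]

# One after another: the waits add up.
start = time.perf_counter()
sequential = [fake_download(n) for n in names]
sequential_time = time.perf_counter() - start

# In a thread pool: the waits overlap.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as executor:
    threaded = list(executor.map(fake_download, names))
threaded_time = time.perf_counter() - start

print(f"sequential {sequential_time:.2f}s, threaded {threaded_time:.2f}s")
```

Both runs return the same results; only the elapsed time should differ, because the threaded version spends its waiting time in parallel.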

1. Concurrent programming in Python: multiprocessing and multithreading

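Multiprocess concurrency: run several independent processes. The advantage is that the operating system manages the concurrent tasks; the drawback is that communication and data sharing between the program and its processes is inconvenient.
Multithreaded concurrency: the programmer manages the concurrent tasks. Threads can share data easily, provided they do not end up blocked on a lock.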

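For CPU-bound programs, multiprocessing beats multithreading. CPU-bound means that most of the running time is spent on computation in the CPU, and comparatively little on memory or disk access. In standard CPython builds the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so threads cannot spread pure-Python computation across cores.

For I/O-bound programs, multithreading beats multiprocessing. I/O-bound is the opposite of CPU-bound: most of the running time is spent waiting for I/O.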
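A rough way to see the CPU-bound case is to run the same computation on both kinds of pool and compare the timings. The sketch below is an assumption-laden illustration: count_primes and run_with are hypothetical helpers, and the workload size is arbitrary. Actual timings depend on the machine and the number of cores.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def count_primes(limit):
    """CPU-bound work: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def run_with(executor_cls, limits, workers=4):
    """Run count_primes over `limits` on the given executor class."""
    with executor_cls(max_workers=workers) as executor:
        return list(executor.map(count_primes, limits))

if __name__ == "__main__":
    limits = [20_000] * 8
    for executor_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        start = time.perf_counter()
        run_with(executor_cls, limits)
        elapsed = time.perf_counter() - start
        print(executor_cls.__name__, round(elapsed, 2), "s")
```

The two executors are interchangeable here because they share one interface; only the class passed to run_with changes.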

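2. Python supports multiprocess concurrency in two ways

1. Through a process-safe data structure, multiprocessing.JoinableQueue. The queue handles its own locking, so the programmer does not have to manage locks by hand.
2. Through ProcessPoolExecutor, the abstraction provided by concurrent.futures.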

multiprocessing's JoinableQueue

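multiprocessing is the standard-library module for working with processes. JoinableQueue is essentially a FIFO queue.
It differs from the ordinary queue (Queue in the queue module) in that it supports multiprocess concurrency and carries data between processes.
It is process-safe, so we do not have to handle mutual exclusion on the queue ourselves. A JoinableQueue is typically used to hold the tasks to be executed and to collect their results.
Why is JoinableQueue process-safe?
    Because the queue carries its own locks. The locks are used as context managers (with), which call acquire() on entry and release() on exit.
    Internally, a multiprocessing queue is built from a pipe plus locks.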
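The acquire()/release() pairing behind the with statement can be sketched as follows. This is an illustration only: it uses threading.Lock and a hypothetical shared counter, not the queue's internal code; multiprocessing.Lock supports the same context-manager protocol.

```python
import threading

counter = 0
lock = threading.Lock()

def add_many(times):
    global counter
    for _ in range(times):
        with lock:             # acquire() on entry, release() on exit
            counter += 1

def add_many_explicit(times):
    global counter
    for _ in range(times):
        lock.acquire()
        try:
            counter += 1
        finally:
            lock.release()     # released even if the body raises

workers = (add_many, add_many_explicit, add_many, add_many_explicit)
threads = [threading.Thread(target=f, args=(10_000,)) for f in workers]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)
```

The two functions are equivalent; the with form is simply harder to get wrong, because the release cannot be forgotten.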

from multiprocessing import JoinableQueue, Process

def consumer(jq, name):
    while True:
        word = jq.get()
        print('%s got %s' % (name, word))
        jq.task_done()  # tell the queue that this item has been processed

def print_word(jq, produce_name):
    for c in [chr(ord('A') + i) for i in range(26)]:
        jq.put(c)
        print("%s produced %s" % (produce_name, c))
    jq.join()  # block until task_done() has been called for every item

if __name__ == '__main__':
    jq = JoinableQueue()   # whichever queue you use, instantiate it first
    produce_name = 'admin'
    pn = Process(target=print_word, args=(jq, produce_name))
    tt1 = Process(target=consumer, args=(jq, 'kobe'))
    tt2 = Process(target=consumer, args=(jq, 't-mac'))
    tt1.daemon = True   # daemon processes are terminated when the main process exits
    tt2.daemon = True
    tt1.start()
    tt2.start()
    pn.start()
    pn.join()

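'''
When using JoinableQueue for multiprocess concurrency, note how these calls relate to each other
1. Why does the main process block on pn.join()?
    join() is the method Process provides for managing child processes. It blocks synchronously and only cares whether the child has finished: the code after the call runs only once the child process has ended.
2. Why does the child process block on jq.join()?
    Because jq has to wait until consumer has taken and processed every letter. Only then does jq.join() stop blocking, which means print_word has finished its work.
    Here jq.join() can be read as: I stop blocking once every item that was put on the queue has been marked done.
3. What does jq.task_done() do?
    It tells the queue, and therefore the jq.join() in the producer, to decrement the count of unfinished tasks by one because one item has been processed. When all tasks have been processed the count is 0,
    and at that point jq.join() stops blocking.
4. Why are tt1 and tt2 made daemon processes (tt1.daemon = True, tt2.daemon = True)?
    consumer runs a while True loop, which never exits on its own.
    A daemon process, however, is terminated when the main process finishes. We cannot break out of the loop, but once the main process ends, the loop ends with it.
    So:
        pn.join() waits for the child process running print_word to finish. It is the last statement of the main process, so the main process exits as soon as it returns.
        Inside print_word, jq.join() is blocking as well, waiting for the last jq.task_done() call, that is, for the unfinished-task count to reach 0.
        Once the last jq.task_done() arrives, jq.join() returns, then pn.join() returns, and the main process ends.
        Daemon processes end together with the main process, so the whole program ends, and the infinite loops end with it.

Understanding task_done():
If a process or thread takes items from the queue without calling task_done(), join() cannot tell that the work is finished; a final join() never returns and the program hangs.
Think of each task_done() as crossing one item off a list of outstanding work: join() returns when nothing is outstanding, and the main flow then continues.
'''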

Multiprocess concurrency with JoinableQueue from multiprocessing
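The relationship between get(), task_done() and join() can be checked in isolation. The sketch below is an assumption-based illustration: it uses queue.Queue with threads rather than JoinableQueue with processes (both follow the same task_done()/join() protocol), and join_finishes is a hypothetical helper that reports whether join() returns within a short timeout.

```python
import queue
import threading

def join_finishes(q, timeout=0.2):
    """Return True if q.join() returns within `timeout` seconds."""
    waiter = threading.Thread(target=q.join, daemon=True)
    waiter.start()
    waiter.join(timeout)
    return not waiter.is_alive()

q = queue.Queue()
for letter in "ABC":
    q.put(letter)

# Take every item out, but acknowledge only two of them.
for _ in range(3):
    q.get()
q.task_done()
q.task_done()

blocked = not join_finishes(q)   # the queue is empty, yet join() still blocks
q.task_done()                    # acknowledge the last item
released = join_finishes(q)      # now join() returns
print(q.empty(), blocked, released)
```

So join() does not watch the queue length; it watches the count of items that were put but not yet acknowledged with task_done().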

concurrent.futures' ProcessPoolExecutor

from concurrent.futures import ProcessPoolExecutor,ThreadPoolExecutor
ThreadPoolExecutor and ProcessPoolExecutor are high-level abstractions over threading and multiprocessing respectively, and expose one simple, uniform interface.

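A future is an object that represents an operation executing asynchronously. The concept underpins both the concurrent.futures module and the asyncio package.
Since Python 3.4 the standard library has had two classes named Future: concurrent.futures.Future and asyncio.Future.
They serve the same purpose: an instance of either class represents a deferred computation that may or may not have completed yet.
A Future wraps a pending operation. It can be put on a queue, its completion state can be queried, and once the result (or an exception) is available it can be retrieved.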
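A minimal sketch of these Future operations, using a thread pool so that it runs anywhere; divide is a hypothetical task function chosen so that one call succeeds and one raises.

```python
from concurrent.futures import ThreadPoolExecutor

def divide(a, b):
    return a / b

with ThreadPoolExecutor(max_workers=2) as executor:
    ok = executor.submit(divide, 6, 3)    # returns a Future immediately
    bad = executor.submit(divide, 1, 0)   # this task will raise
    value = ok.result()                   # waits for the result
    error = bad.exception()               # waits, then returns the exception

print(ok.done(), value, type(error).__name__)
```

Calling bad.result() instead of bad.exception() would re-raise the ZeroDivisionError in the calling thread.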

import time
from urllib.request import urlopen
from concurrent.futures import ProcessPoolExecutor

def get_html(name, addr):
    ret = urlopen(addr)
    return {'name': name, 'content': ret.read()}

def print_info(connect):
    # connect is the finished Future; result() returns what get_html returned
    dic = connect.result()
    with open(dic['name'] + '.html', mode='wb') as f:
        f.write(dic['content'])

if __name__ == '__main__':
    start = time.time()
    url_list = {
        '百度':'https://www.baidu.com',
        'p0st':'https://www.cnblogs.com/p0st/p/10453405.html',
        'jd':'https://www.jd.com',
        '博客园':'https://www.cnblogs.com',
        'a':'https://www.baidu.com',
        'b': 'https://www.cnblogs.com/p0st/p/10453405.html',
        'c': 'https://www.jd.com',
        'd': 'https://www.cnblogs.com'
    }
    p = ProcessPoolExecutor(4)
    for item in url_list:
        # submit() adds a task to the process pool and returns a Future; the tasks run concurrently
        task = p.submit(get_html, item, url_list[item])
        # add_done_callback() registers a callback that receives the Future once the task finishes.
        # It returns None, so its return value must not be assigned to task.
        task.add_done_callback(print_info)
    p.shutdown()
    # blocks the main process until every task in the pool has finished, then releases the worker processes
    print(time.time() - start)

    # The same thing with a context manager, which calls shutdown() on exit:
    #with ProcessPoolExecutor(4) as p:
    #    for item in url_list:
    #        p.submit(get_html, item, url_list[item]).add_done_callback(print_info)
    #print(time.time() - start)
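'''
#ret = p.map(make, range(100)) is shorthand for the submit loop; it is mostly used when the function takes a single argument
#ret is an iterator over the results; loop over it to read them

#task.result() returns once the task has finished; if the task is still running it waits, so it is a blocking method
#any method that may block is called a blocking method

'''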
Multiprocess concurrency with ProcessPoolExecutor from concurrent.futures
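The notes on map() and result() can be tried out with a small sketch. It uses a thread pool for portability (the process pool exposes the same two methods), and power is a hypothetical two-argument function; passing one iterable per parameter is how map() handles functions with more than one argument.

```python
from concurrent.futures import ThreadPoolExecutor

def power(base, exponent):
    return base ** exponent

with ThreadPoolExecutor(max_workers=3) as executor:
    # One iterable per parameter; results come back in input order.
    squares = list(executor.map(power, [2, 3, 4], [2, 2, 2]))
    future = executor.submit(power, 2, 10)
    value = future.result()   # blocks until the task has finished

print(squares, value)
```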

3. Python also offers two ways of multithreaded concurrency

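For I/O-bound programs, multithreading may beat multiprocessing. In I/O-bound work such as network communication, network latency dominates the running time, so the choice between processes and threads matters little for throughput, and threads are the cheaper of the two to start and to share data between.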

The program is almost the same as the multiprocess one, except that multiprocessing.JoinableQueue is no longer needed; an ordinary queue (queue.Queue) is enough:

import queue
import random
import threading
import time

def read(q):
    while True:
        value = q.get()
        try:
            print('Get %s from queue.' % value)
            time.sleep(random.random())
        finally:
            q.task_done()   # always acknowledge the item, even if processing fails

def main():
    q = queue.Queue()
    pw1 = threading.Thread(target=read, args=(q,))
    pw2 = threading.Thread(target=read, args=(q,))
    pw1.daemon = True
    pw2.daemon = True
    pw1.start()
    pw2.start()
    for c in [chr(ord('A')+i) for i in range(26)]:
        q.put(c)
    try:
        q.join()
    except KeyboardInterrupt:
        print("stopped by hand")

if __name__ == '__main__':
    main()

Multithreaded concurrency with Queue from queue

concurrent.futures' ThreadPoolExecutor

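Besides scheduling threads for us automatically, the ThreadPoolExecutor interface in concurrent.futures also: (1) lets the main thread query the state and the return value of any thread (or task); (2) lets the main thread know as soon as a thread finishes; (3) gives multithreaded and multiprocess code the same programming interface.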

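1. Create the thread pool: executor = ThreadPoolExecutor(max_workers= )
2. Submit a function to the pool: task = executor.submit(func, *args)
3. Get the result: task.result()
4. Check whether the task has finished: task.done()
5. Cancel a task that has not started running yet: task.cancel()
6. Use as_completed to get results as tasks finish; it returns an iterator
7. Use the executor's map to get the results of the tasks, in input order
for data in executor.map(get_html,urls):
    print(data)
8. Use wait() to block until the given tasks have finished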

ThreadPoolExecutor methods
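The methods listed above can be exercised together in one short sketch. Everything named here is hypothetical: square and held are toy tasks, and the gate event exists only to keep the single worker busy so that the second task is still queued when cancel() is called.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed, wait

def square(n):
    return n * n

gate = threading.Event()

def held(n):
    gate.wait()          # block until the main thread opens the gate
    return n

with ThreadPoolExecutor(max_workers=1) as executor:
    running = executor.submit(held, 1)
    queued = executor.submit(square, 2)   # waits behind `running`
    was_cancelled = queued.cancel()       # succeeds: it had not started yet
    gate.set()
    done, not_done = wait([running])      # block until `running` finishes
    finished = running.done()

with ThreadPoolExecutor(max_workers=4) as executor:
    futures = {executor.submit(square, n): n for n in range(5)}
    # as_completed yields each Future as soon as it finishes, in any order.
    squares = {futures[f]: f.result() for f in as_completed(futures)}

print(was_cancelled, finished, squares)
```

cancel() only works on a task that is still waiting in the pool; once a task is running it cannot be cancelled this way.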

from urllib.request import urlopen
from concurrent.futures import ThreadPoolExecutor

def get_html(name, addr):
    ret = urlopen(addr)
    return {'name': name, 'content': ret.read()}

def print_info(connect):
    dic = connect.result()
    with open(dic['name'] + '.html', mode='wb') as f:
        f.write(dic['content'])

url_list = {
        '百度':'https://www.baidu.com',
        'p0st':'https://www.cnblogs.com/p0st/p/10453405.html',
        'jd':'https://www.jd.com',
        '博客园':'https://www.cnblogs.com',
        'a':'https://www.baidu.com',
        'b': 'https://www.cnblogs.com/p0st/p/10453405.html',
        'c': 'https://www.jd.com',
        'd': 'https://www.cnblogs.com'
    }
t = ThreadPoolExecutor(4)
if __name__ == '__main__':
    for item in url_list:
        task = t.submit(get_html, item, url_list[item])
        # callback: as soon as the task finishes, its Future is passed to print_info
        task.add_done_callback(print_info)

Multithreaded concurrency with ThreadPoolExecutor from concurrent.futures

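import concurrent.futures
# sleeper (the task function) and x (an iterable of its arguments) are assumed to be defined elsewhere
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    print(list(executor.map(sleeper, x)))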

ThreadPoolExecutor, advanced usage

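from concurrent.futures import ProcessPoolExecutor
def pool_factorizer_go(nums, nprocs):
    # nprocs is the number of worker processes, chosen by the caller;
    # factorize_naive is assumed to be defined elsewhere at module level
    with ProcessPoolExecutor(max_workers=nprocs) as executor:
        return {num:factors for num, factors in
                                zip(nums,
                                    executor.map(factorize_naive, nums))}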

ProcessPoolExecutor, advanced usage
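To make the factorizer above runnable, a factorize function is needed. The version below is one possible sketch, not the original author's: this factorize_naive is a hypothetical trial-division implementation, and the sample numbers are arbitrary.

```python
from concurrent.futures import ProcessPoolExecutor

def factorize_naive(n):
    """Return the prime factors of n (n >= 2) by trial division."""
    factors = []
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:
        factors.append(n)
    return factors

def pool_factorizer_go(nums, nprocs):
    """Map each number to its factor list, using nprocs worker processes."""
    with ProcessPoolExecutor(max_workers=nprocs) as executor:
        return dict(zip(nums, executor.map(factorize_naive, nums)))

if __name__ == "__main__":
    print(pool_factorizer_go([12, 97, 360], nprocs=2))
```

The function handed to a process pool has to live at module level so that the worker processes can import it, which is why factorize_naive is not nested inside pool_factorizer_go.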

# Waiting for another task's result inside a task.
import time
from concurrent.futures import ThreadPoolExecutor

def wait_on_b():
    time.sleep(5)
    print(b.result())  # b never completes: it is waiting on a's result
    return 5

def wait_on_a():
    time.sleep(5)
    print(a.result())  # likewise a never completes: it is waiting on b's result
    return 6

executor = ThreadPoolExecutor(max_workers=2)
a = executor.submit(wait_on_b)
b = executor.submit(wait_on_a)

An example of deadlock with threads or processes
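A related way to get stuck is a task that waits on another task queued behind it in a pool with a single worker. The sketch below is a hypothetical variant of the example above (inner and outer are made-up names) that passes a timeout to result(), so the stall shows up as a TimeoutError instead of hanging forever.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

executor = ThreadPoolExecutor(max_workers=1)

def inner():
    return 5

def outer():
    # inner is queued behind outer, which occupies the only worker.
    future = executor.submit(inner)
    try:
        return future.result(timeout=0.5)
    except TimeoutError:
        return "timed out"

outcome = executor.submit(outer).result()
executor.shutdown()
print(outcome)
```

A timeout only reports the problem; the usual fix is to avoid waiting on futures from inside tasks that run on the same pool.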

4. Creating coroutines inside a process or thread

from multiprocessing import Process

from gevent import monkey;monkey.patch_all()
import gevent
import time
import random
class Sayhi(Process):
    def __init__(self,name):
        super().__init__()
        self.name = name

    @staticmethod
    def func(item):
        time.sleep(random.uniform(1,2))
        print('coroutine: sayhi %s'%item)

    def run(self):
        count = 20
        ll= []
        for item in range(count):
            gg = gevent.spawn(self.func,item)
            ll.append(gg)
        gevent.joinall(ll)

if __name__ == '__main__':
    p = Sayhi('kobe')
    p.start()
    p.join()

Creating coroutines inside a process

import os
from gevent import monkey;monkey.patch_all()
import time
import random
import gevent
from threading import Thread
class Sayhi(Thread):
    def __init__(self,name):
        super().__init__()
        self.name = name

    @staticmethod
    def func(item):
        time.sleep(random.uniform(1,2))
        print('coroutine: sayhi %s  %s'%(item,os.getpid()))

    def run(self):
        count = 20
        l = []
        for i in range(count):
            g = gevent.spawn(self.func,i)
            l.append(g)
        gevent.joinall(l)
if __name__ == '__main__':
    p = Sayhi('kobe')
    p.start()
    p.join()
    print(os.getpid())

Creating coroutines inside a thread
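The same pattern can be sketched with the standard library alone. Note the swap: this uses asyncio in place of gevent, so it is a different coroutine mechanism from the examples above, and say_hi, run_all and thread_main are hypothetical names. Each thread gets its own event loop through asyncio.run().

```python
import asyncio
import threading

async def say_hi(item):
    await asyncio.sleep(0.01)      # yields control to the other coroutines
    return f"hi {item}"

async def run_all(count):
    # gather() runs the coroutines concurrently and keeps their order.
    return await asyncio.gather(*(say_hi(i) for i in range(count)))

results = []

def thread_main():
    # asyncio.run() creates and closes an event loop for this thread.
    results.extend(asyncio.run(run_all(5)))

t = threading.Thread(target=thread_main)
t.start()
t.join()
print(results)
```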

asyncio, yield from, and so on


