A Deep Dive into Python Coroutines

Overview
- yield
- send
asyncio.coroutine and yield from

## Overview

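Because a CPU is far faster than disk I/O, a program typically runs until it hits a file read or write, at which point the main thread blocks and waits for the I/O to complete before continuing. This leaves CPU utilization very low.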

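Coroutines let a single thread make progress on multiple tasks concurrently, but with plain generators you have to pass messages by hand, combining the send() method with the yield keyword. The asyncio module does this message passing for us automatically.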

Coroutines in Python have gone through three main stages:

1) Generator variants: yield/send

2) asyncio.coroutine and yield from

3) The async/await keywords

## Generator variants: yield/send

### yield

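In Python, a function whose body uses yield instead of return is no longer an ordinary function but a generator function: calling it returns a generator object.

A simple generator example: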

def mygen(alist):	# define a generator

    while alist:

        c = alist.pop()

        yield c

lst = [1, 2, 3]

g = mygen(lst)	# get a generator object

print(g)		# <generator object mygen at 0x0000020225555F10>

while True:

    try:

        print(next(g))	# 3 2 1

    except StopIteration:

        break

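A generator is also an iterator, so besides calling next() you can read its values with a for loop. The generator g above has already been exhausted by the while loop, so create a new one first:

g = mygen([1, 2, 3])

for item in g:

    print(item)		# 3 2 1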
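As a minimal sketch of the point above, the following shows that a generator is its own iterator and can be consumed only once. The name countdown is a hypothetical stand-in for mygen; it copies its input so that the caller's list is not emptied.

```python
def countdown(items):
    """Yield items from the end of a copy of the list, like mygen above."""
    items = list(items)          # copy so the caller's list is left untouched
    while items:
        yield items.pop()

g = countdown([1, 2, 3])
same_object = iter(g) is g       # a generator is its own iterator
first_pass = list(g)             # consumes every value
second_pass = list(g)            # nothing left: the generator is exhausted
fresh_pass = list(countdown([1, 2, 3]))  # a new generator starts over

print(same_object, first_pass, second_pass, fresh_pass)
```

This is why a second loop over the same generator object produces nothing: iteration state lives in the generator, not in the function that created it.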

### send

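The most distinctive feature of a generator function is that it can receive a value passed in from outside, compute a result from it, and hand that result back. This is done through the send() method.

Example of using send():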

def gen():

    value = 0

    while True:

        receive = yield value

        if receive == "Q" or receive == "q":

            break

        value = "got:%s" % receive

g = gen()

print(g.send(None))     # the first value sent must be None, otherwise an error is raised

print(g.send("hello~"))

print(g.send(123))

print(g.send([1, 2, 3]))

Output

0

got:hello~

got:123

got:[1, 2, 3]

Note: the value passed to the first send() must be None; otherwise Python raises TypeError: can't send non-None value to a just-started generator
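The note above can be checked with a short sketch. The generator name echo and the strings it yields are illustrative assumptions, not part of the original example.

```python
def echo():
    """Hypothetical generator that replies to every value it receives."""
    received = yield "ready"
    while True:
        received = yield "got:%s" % received

g = echo()
try:
    g.send("too early")          # the generator has not reached a yield yet
except TypeError as exc:
    error_message = str(exc)

primed = next(g)                 # same effect as g.send(None)
reply = g.send("hello")

print(error_message)
print(primed, reply)
```

The failed send() does not start the generator, so it can still be primed afterwards with next(g) or g.send(None).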

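The key line here is receive = yield value, which actually happens in three steps:

1) Hand value out of the function (return it to the caller)

2) Pause and wait for next() or send() to resume the generator

3) Assign the value of the yield expression (that is, the value passed in by the caller) to receive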

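Let's walk through the execution flow:

1) g.send(None) or next(g) starts the generator function, which runs up to the first yield.

2) yield value executes: the generator returns value, which is 0, and then pauses, waiting for the next next() or send(). Note that receive has not been assigned yet.

3) Once gen has returned value, control goes back to the main program, which runs g.send("hello~"). This passes in "hello~" and resumes the generator where it paused: the value is assigned to receive, execution continues, value becomes "got:hello~", the while condition is checked, and execution reaches yield value again, which returns value. That is why "got:hello~" is printed. The generator then pauses until the next send() wakes it.

4) The later g.send(123) call follows the same flow. If "q" is passed in, gen reaches break, the function runs to completion, and the caller gets a StopIteration exception.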

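As the walkthrough shows, after the first send(None) starts the generator (steps 1 -> 2; the value returned by this first call is usually of no use), each later send() from outside makes the generator run the steps inside the loop in the order 3 -> 1 -> 2: it first receives a value, then does something with it, then returns a value and pauses again.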
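To make that ordering visible, here is a sketch of the same gen function with a log list added. The log parameter and its messages are assumptions introduced only for illustration.

```python
def gen(log):
    value = 0
    while True:
        log.append("yield %r" % (value,))        # steps 1 and 2: hand value out, then pause
        receive = yield value
        log.append("received %r" % (receive,))   # step 3: runs only after the next send()
        if receive in ("Q", "q"):
            break
        value = "got:%s" % receive

log = []
g = gen(log)
outputs = [g.send(None), g.send("hello~"), g.send(123)]

try:
    g.send("q")                  # gen hits break and finishes
    finished = False
except StopIteration:
    finished = True

for entry in log:
    print(entry)
```

Each "received" entry appears only after the caller's following send(), which matches the 3 -> 1 -> 2 order described above, and the final send("q") surfaces as StopIteration in the caller.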

### yield from

yield from was introduced in Python 3.3. Let's start with some code:

def gen1():

    yield range(5)

def gen2():

    yield from range(5)

iter1 = gen1()

iter2 = gen2()

for item in iter1:

    print(item)

for item in iter2:

    print(item)

Output

range(0, 5)

0

1

2

3

4

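As the example shows, yield returns the range iterable itself, whereas yield from unpacks the range object and returns its items one by one. For plain iteration, yield from is essentially equivalent to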

for item in iterable:

    yield item

Note that yield from must be followed by an **iterable**
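A short sketch of that rule: yield from accepts any iterable (list, string, range, another generator) and raises TypeError for anything else. The helper names chain and bad are hypothetical.

```python
def chain(*iterables):
    """Yield every item of every iterable, one after another."""
    for iterable in iterables:
        yield from iterable

flattened = list(chain([1, 2], "ab", range(3), (x * x for x in range(3))))

def bad():
    yield from 42                # an int is not iterable

try:
    next(bad())
    error = None
except TypeError as exc:
    error = exc

print(flattened)
print(type(error).__name__)
```

The error is raised only when the generator is actually advanced, because the body of bad does not run until next() is called.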

Now for an example. Let's write a Fibonacci sequence function:

def fab(max):

    n, a, b = 0, 0, 1

    while n < max:

        yield b

        a, b = b, a+b

        n += 1

f = fab(5)

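fab is not an ordinary function but a generator function, so fab(5) does not execute the function body; it returns a generator object. Suppose we want to build a function on top of fab() that logs when the call starts and ends: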

def wrapper(func_iter):

    print("start")

    for item in func_iter:

        yield item

    print("end")

wrap = wrapper(fab(5))

for i in wrap:

    print(i)

Now use yield from in place of the for loop:

def wrapper(func_iter):

    print("start")

    yield from func_iter

    print("end")

wrap = wrapper(fab(5))

for i in wrap:

    print(i)
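The two wrappers above behave the same for plain iteration, but yield from does more than a for loop: it also forwards send() calls to the inner generator and captures its return value. The sketch below illustrates this under stated assumptions; averager, the results list, and the use of None as a stop signal are all hypothetical.

```python
def averager():
    """Subgenerator: receives numbers via send() and returns their average."""
    total, count = 0, 0
    while True:
        number = yield
        if number is None:       # None is the assumed "stop" signal here
            break
        total += number
        count += 1
    return total / count if count else 0.0

def wrapper(results):
    print("start")
    average = yield from averager()   # send() calls pass straight through
    results.append(average)
    print("end")

results = []
w = wrapper(results)
next(w)                          # prime: runs to the first yield inside averager
for n in (10, 20, 30):
    w.send(n)
try:
    w.send(None)                 # averager returns, then wrapper finishes
except StopIteration:
    pass

print(results)
```

A for loop with yield item could not do this, since values sent to the wrapper would never reach the inner generator. This two-way channel is what asyncio builds on.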

## asyncio.coroutine and yield from

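yield from really came into its own in the asyncio module (introduced in Python 3.4). Until now we passed messages by hand with send() and yield; once a function is declared as a coroutine, the event loop schedules it for us. Note that the @asyncio.coroutine decorator used below was deprecated in Python 3.8 and removed in Python 3.11, so these examples need an older interpreter to run as written.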

import asyncio, random

@asyncio.coroutine		# mark a generator as a coroutine

def smart_fib(n):

    i, a, b = 0, 0, 1

    while i < n:

        sleep_time = random.uniform(0, 0.2)

        yield from asyncio.sleep(sleep_time)    # yield from is usually followed by a time-consuming operation

        print("smart take %s secs to get %s" % (sleep_time, b))

        a, b = b, a+b

        i += 1

@asyncio.coroutine

def stupid_fib(n):

    i, a, b = 0, 0, 1

    while i < n:

        sleep_time = random.uniform(0, 0.5)

        yield from asyncio.sleep(sleep_time)

        print("stupid take %s secs to get %s" % (sleep_time, b))

        a, b = b, a+b

        i += 1

if __name__ == '__main__':

    loop = asyncio.get_event_loop()     # get a reference to the event loop

    tasks = [       					# build the task list

        smart_fib(10),

        stupid_fib(10),

    ]

    loop.run_until_complete(asyncio.wait(tasks))	# wait wraps each coroutine in a Task object

    print("All fib finished")

    loop.close()

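The yield from syntax makes it easy to call another generator. In this example, asyncio.sleep(), which follows yield from, is itself a coroutine (it also uses yield from internally), so the thread does not wait for asyncio.sleep(); the current coroutine is suspended and the event loop moves on to its next iteration. When asyncio.sleep() returns, the coroutine gets the return value from yield from (None in this case) and continues with the next statement.

asyncio is a module that implements asynchronous I/O on top of an event loop. With yield from, we hand control of the asyncio.sleep coroutine to the event loop and suspend the current coroutine; the event loop then decides when to wake asyncio.sleep, after which the code that follows it runs.

Scheduling among coroutines is decided entirely by the event loop.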
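To give a feel for what "the event loop schedules the coroutines" means, here is a toy round-robin scheduler built from plain generators. It is only a sketch of the idea, not how asyncio is implemented (the real loop also handles timers, futures and I/O readiness); run_all, task and nested are hypothetical names.

```python
from collections import deque

def task(name, steps, log):
    """A unit of work that gives up control after every step."""
    for i in range(steps):
        log.append("%s step %d" % (name, i))
        yield                    # hand control back to the loop

def nested(name, steps, log):
    """Delegates to task with yield from, as the coroutines above do."""
    yield from task(name, steps, log)
    log.append("%s done" % name)

def run_all(tasks):
    """Toy event loop: resume each task in turn until all have finished."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)        # run the task until its next yield
        except StopIteration:
            continue             # finished: do not reschedule it
        ready.append(current)    # still alive: back of the queue

log = []
run_all([nested("A", 2, log), nested("B", 3, log)])
for entry in log:
    print(entry)
```

The tasks interleave because each yield returns control to run_all, which alone decides which generator to resume next.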

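In yield from asyncio.sleep(sleep_time), you cannot use time.sleep(1) instead: time.sleep() blocks the whole thread and returns None, which is not iterable. Recall that yield from must be followed by an iterable (such as a generator or an iterator).

Another example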

import asyncio

@asyncio.coroutine

def wget(host):

    print('wget %s...' % host)

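    connect = asyncio.open_connection(host, 80)  # open a connection to the host serving the page

    # the connection yields a reader and a writer

    reader, writer = yield from connect  # send the request through writer; read the server's response through reader

    header = 'GET / HTTP/1.0\r\nHost: %s\r\n\r\n' % host  # assemble the request headers

    writer.write(header.encode('utf-8'))  # the request headers must be encoded to bytes

    yield from writer.drain()  # writer is buffered; drain() waits until the buffer has been flushed enough to keep writing

    while True:

        line = yield from reader.readline()  # the response arrives through reader; readline reads it one line at a time

        if line == b'\r\n':  # a line containing only \r\n is the blank line that ends the headers

            break  # that is, \r\n\r\n, so what follows is the response body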

        print('%s header > %s' % (host, line.decode('utf-8').rstrip()))

    # Ignore the body, close the socket

    writer.close()

    # reader.close()   AttributeError: 'StreamReader' object has no attribute 'close'

if __name__ == '__main__':

    loop = asyncio.get_event_loop()

    tasks = [wget(host) for host in ['www.sina.com.cn', 'www.sohu.com', 'www.163.com']]

    loop.run_until_complete(asyncio.wait(tasks))

    loop.close()

## async and await

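Once asyncio.coroutine and yield from are clear, the async and await keywords introduced in Python 3.5 are easy to understand: in use, you just replace @asyncio.coroutine with async and yield from with await. From a language-design point of view, async/await makes coroutines appear independent of generators and hides the details beneath the asyncio module, so the syntax is cleaner and clearer.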
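As a sketch of that substitution, the smart_fib/stupid_fib example above could be written with async/await roughly as follows. Merging the two functions into one fib with name and max_sleep parameters, collecting results in a log list, and using asyncio.run with asyncio.gather (available since Python 3.7) are choices made for this illustration, not part of the original code.

```python
import asyncio
import random

async def fib(name, n, max_sleep, log):
    """async/await version of smart_fib and stupid_fib."""
    i, a, b = 0, 0, 1
    while i < n:
        await asyncio.sleep(random.uniform(0, max_sleep))  # was: yield from asyncio.sleep(...)
        log.append((name, b))
        a, b = b, a + b
        i += 1
    return name

async def main(log):
    # gather runs both coroutines concurrently and returns results in argument order
    return await asyncio.gather(
        fib("smart", 5, 0.02, log),
        fib("stupid", 5, 0.05, log),
    )

log = []
finished = asyncio.run(main(log))
print(finished)
print(log)
```

The entries of the two tasks appear interleaved in log, in an order that varies from run to run because the sleep times are random.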

The new async keyword turns an ordinary function into a coroutine function

A simple example

import time, asyncio, random

async def mygen(alist):

    while alist:

        c = alist.pop()

        print(c)

lst = [1, 2, 3]

g = mygen(lst)

print(g)

Output

<coroutine object mygen at 0x00000267723FB3B8>		# a coroutine object

sys:1: RuntimeWarning: coroutine 'mygen' was never awaited

As you can see, putting async in front turns the function into a coroutine. However, **async does not turn a generator into a coroutine**

async def mygen(alist):

    while alist:

        c = alist.pop()

        yield c

lst = [1, 2, 3]

g = mygen(lst)

print(g)

Output

<async_generator object mygen at 0x000001540EF505F8>	# not a coroutine object
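An async generator like the one above is not awaited directly, but it can still be consumed from inside a coroutine with async for (Python 3.6+). A minimal sketch, in which the collect helper and the asyncio.sleep(0) call are assumptions added for illustration:

```python
import asyncio

async def mygen(alist):
    while alist:
        c = alist.pop()
        await asyncio.sleep(0)   # give other tasks a chance to run
        yield c

async def collect(alist):
    """Hypothetical helper: drain the async generator into a list."""
    items = []
    async for item in mygen(alist):
        items.append(item)
    return items

collected = asyncio.run(collect([1, 2, 3]))
print(collected)
```

So async def combined with yield is valid, but it produces a different kind of object that is driven by async for rather than await.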

So a proper coroutine looks like this

import time, asyncio, random

async def mygen(alist):

    while alist:

        c = alist.pop()

        print(c)

        await asyncio.sleep(1)

lst1 = [1, 2, 3]

lst2 = ["a", "b", "c"]

g1 = mygen(lst1)

g2 = mygen(lst2)

To run coroutines, you need an event loop.
Add the following below the code above:

if __name__ == '__main__':

    loop = asyncio.get_event_loop()

    tasks = [

        g1,

        g2

    ]

    loop.run_until_complete(asyncio.wait(tasks))

    print("all finished")

    loop.close()
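On recent Python versions, passing bare coroutine objects to asyncio.wait is no longer accepted (it was deprecated in 3.8 and removed in 3.11). A possible modern equivalent, sketched here with asyncio.run and asyncio.gather, is shown below; the log list and the shortened sleep are assumptions made so the interleaving can be inspected quickly.

```python
import asyncio

async def mygen(alist, log):
    while alist:
        c = alist.pop()
        log.append(c)
        await asyncio.sleep(0.01)    # suspend here so the other coroutine can run

async def main(log):
    # gather wraps each coroutine in a Task and waits for all of them
    await asyncio.gather(
        mygen([1, 2, 3], log),
        mygen(["a", "b", "c"], log),
    )
    print("all finished")

log = []
asyncio.run(main(log))               # creates the event loop, runs main, closes the loop
print(log)
```

Both coroutines record their first item before either one finishes sleeping, which shows that await hands control back to the event loop instead of blocking the thread.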
