asyncio and aiohttp in Python

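The CPython interpreter itself is not thread-safe, which is why it has a Global Interpreter Lock (GIL) that allows only one thread to execute Python bytecode at a time. As a result, a single Python process usually cannot use several CPU cores at once. However, every standard-library function that performs blocking I/O releases the GIL while it waits for the operating system to return a result. This means multithreading is still usable at the Python level, and I/O-bound Python programs benefit from it: while one Python thread waits for a network response, the blocking I/O function releases the GIL so another thread can run. The asyncio package takes a different route: it achieves concurrency in a single thread, using coroutines driven by an event loop. asyncio (added in Python 3.4) originally relied heavily on yield from expressions, so it is not compatible with older versions of Python.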
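The claim that blocking I/O releases the GIL can be checked with a small sketch. Here time.sleep stands in for a blocking I/O call (it also releases the GIL while waiting). The helper names blocking_io, run_sequential and run_threaded are hypothetical, and the thread pool is plain threading, not asyncio.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_io(seconds):
    # Stand-in for a blocking I/O call; like real blocking I/O, time.sleep releases the GIL
    time.sleep(seconds)
    return seconds


def run_sequential(n, seconds):
    """Run n blocking calls one after another and return the elapsed time."""
    start = time.perf_counter()
    for _ in range(n):
        blocking_io(seconds)
    return time.perf_counter() - start


def run_threaded(n, seconds):
    """Run n blocking calls in n threads and return the elapsed time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        list(pool.map(blocking_io, [seconds] * n))
    return time.perf_counter() - start


if __name__ == '__main__':
    print(f'sequential: {run_sequential(5, 0.2):.2f}s')  # about 5 x 0.2s
    print(f'threaded:   {run_threaded(5, 0.2):.2f}s')    # about 0.2s, the waits overlap
```

Because each thread gives up the GIL while it waits, the waits overlap and the threaded run takes roughly as long as a single call.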
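The "coroutines" used by the asyncio package follow a stricter definition. In the generator-based style, a coroutine suitable for the asyncio API must use yield from in its body, not yield. In addition, an asyncio coroutine must be driven by a caller, which invokes it through yield from (or hands it to the event loop, for example via asyncio.ensure_future or loop.run_until_complete).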

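Let's start with two examples. They use the original generator-based syntax: @asyncio.coroutine was deprecated in Python 3.8 and removed in 3.11, so this code runs only on Python 3.10 and earlier.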

import threading
import asyncio

@asyncio.coroutine
def hello():
    print('Start Hello', threading.current_thread())
    yield from asyncio.sleep(1)
    print('End Hello', threading.current_thread())

@asyncio.coroutine
def world():
    print('Start World', threading.current_thread())
    yield from asyncio.sleep(1)
    print('End World', threading.current_thread())

# Get the event loop:
loop = asyncio.get_event_loop()
tasks = [hello(), world()]
# Run the coroutines
loop.run_until_complete(asyncio.wait(tasks))
loop.close()

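@asyncio.coroutine marks a generator function as a coroutine.
asyncio.sleep(3) creates a coroutine that completes after 3 seconds. While one coroutine is suspended at yield from asyncio.sleep(...), the event loop runs the other one, so both Start lines are printed before either End line, and all four lines report the same thread: the concurrency comes from interleaving, not from extra threads.
loop.run_until_complete(future) runs the loop until future is done; if the argument is a coroutine object, it is wrapped with ensure_future() automatically and scheduled as a Task.
loop.close() closes the event loop.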
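The point that tasks = [hello(), world()] does not run anything by itself can be checked directly. This sketch uses the async def syntax introduced below instead of @asyncio.coroutine (so it runs on Python 3.11+), and a hypothetical job() coroutine: calling it only creates a coroutine object, and ensure_future wraps each one in a Task that the event loop then drives.

```python
import asyncio
import inspect

started = []  # records which coroutine bodies have actually begun running


async def job(name):
    started.append(name)
    await asyncio.sleep(0)  # suspension point: control goes back to the event loop
    return name.upper()


async def main():
    coros = [job('hello'), job('world')]  # like tasks = [hello(), world()]: objects only
    print(all(inspect.iscoroutine(c) for c in coros), started)  # True [] : no body has run yet
    tasks = [asyncio.ensure_future(c) for c in coros]  # wrap each coroutine in a Task
    print(all(isinstance(t, asyncio.Task) for t in tasks))  # True
    return await asyncio.gather(*tasks)  # the loop now drives both Tasks


if __name__ == '__main__':
    print(asyncio.run(main()))  # ['HELLO', 'WORLD']
```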

import asyncio

@asyncio.coroutine
def worker(text):
    """Coroutine body: print text with a counter until cancelled.

    :param text: label to print
    """
    i = 0
    while True:
        print(text, i)
        try:
            yield from asyncio.sleep(0.1)
        except asyncio.CancelledError:
            break
        i += 1

@asyncio.coroutine
def client(text, io_used):
    work_fu = asyncio.ensure_future(worker(text))
    # Pretend to wait on I/O for a while
    yield from asyncio.sleep(io_used)
    # Stop the worker coroutine
    work_fu.cancel()
    return 'done'

loop = asyncio.get_event_loop()
# io_used durations in seconds (example values)
tasks = [client('xiaozhe', 1), client('zzz', 2)]
done, pending = loop.run_until_complete(asyncio.wait(tasks))
loop.close()
# asyncio.wait returns (done, pending) sets of Tasks; collect the results
print('Answer:', [t.result() for t in done])

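asyncio.ensure_future(coro_or_future, *, loop=None): schedules a coroutine object to run and returns an asyncio.Task object (a Future passed in is returned unchanged).
work_fu.cancel(): requests cancellation of the task; a CancelledError is thrown into the wrapped coroutine where it is suspended (here, inside asyncio.sleep), and worker catches it to leave its loop.
asyncio.wait(): this coroutine takes an iterable of futures or coroutines, wraps each coroutine in a Task object, and returns two sets of Tasks, (done, pending).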
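On Python 3.11 and later, where @asyncio.coroutine is gone and asyncio.wait no longer accepts bare coroutines, the same worker/client pattern can be sketched with async def and asyncio.create_task. The interval and io_used values are assumptions, and unlike the original, client awaits the cancelled worker so it can report how many ticks the worker completed.

```python
import asyncio


async def worker(text, interval=0.1):
    """Print text with a counter until cancelled; return the number of completed sleeps."""
    i = 0
    while True:
        print(text, i)
        try:
            await asyncio.sleep(interval)
        except asyncio.CancelledError:
            break  # cancellation is delivered here, at the await
        i += 1
    return i


async def client(text, io_used):
    work_task = asyncio.create_task(worker(text))  # schedule worker as a Task
    await asyncio.sleep(io_used)  # pretend to wait on I/O
    work_task.cancel()            # request cancellation of the worker
    ticks = await work_task       # worker swallowed CancelledError, so this returns normally
    return text, ticks


async def main():
    # Pass Tasks, not bare coroutines, to asyncio.wait (required since Python 3.11)
    tasks = [asyncio.create_task(client('xiaozhe', 0.35)),
             asyncio.create_task(client('zzz', 0.25))]
    await asyncio.wait(tasks)
    return [t.result() for t in tasks]  # results in the order the tasks were created


if __name__ == '__main__':
    print('Answer:', asyncio.run(main()))
```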

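async and await are the newer coroutine syntax (Python 3.5+). Switching to it takes two simple substitutions:
1. Replace @asyncio.coroutine with async (written before def, as async def)
2. Replace yield from with await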

@asyncio.coroutine
def hello():
  print("Hello world!")
  r = yield from asyncio.sleep(1)
  print("Hello again!")

is equivalent to

async def hello():
  print("Hello world!")
  r = await asyncio.sleep(1)
  print("Hello again!")
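Applying those two substitutions to the first example gives a version that runs on current Python. This sketch also replaces loop.run_until_complete(asyncio.wait(...)) with asyncio.run() and asyncio.gather(), which returns results in argument order; the return values 'hello' and 'world' are added for illustration.

```python
import asyncio
import threading
import time


async def hello():
    print('Start Hello', threading.current_thread().name)
    await asyncio.sleep(1)  # suspends hello; the event loop runs world meanwhile
    print('End Hello', threading.current_thread().name)
    return 'hello'


async def world():
    print('Start World', threading.current_thread().name)
    await asyncio.sleep(1)
    print('End World', threading.current_thread().name)
    return 'world'


async def main():
    start = time.perf_counter()
    # gather wraps each coroutine in a Task and runs them concurrently
    results = await asyncio.gather(hello(), world())
    print(f'elapsed: {time.perf_counter() - start:.1f}s')  # about 1s, not 2s
    return results


if __name__ == '__main__':
    print(asyncio.run(main()))  # ['hello', 'world']
```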

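asyncio makes concurrent I/O possible in a single thread. For a client that makes only a few requests, the benefit is modest. On the server side, for example in a web server, every HTTP connection is an I/O operation, so a single thread plus coroutines can serve many users concurrently.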

asyncio implements TCP, UDP, SSL and other protocols; aiohttp is an HTTP framework built on top of asyncio, providing both a client and a server.
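To make the single-thread server claim concrete, here is a minimal sketch using only asyncio's stream API (asyncio.start_server and asyncio.open_connection), not aiohttp: one event loop in one thread serves several TCP clients concurrently. The uppercase-echo protocol, the helper names and the three clients are illustrative assumptions.

```python
import asyncio


async def handle_echo(reader, writer):
    # One coroutine per connection; all of them share the same thread
    data = await reader.readline()
    writer.write(data.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def talk(host, port, message):
    """Hypothetical client: send one line and return the server's reply."""
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(message.encode() + b'\n')
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply.decode().strip()


async def main():
    # Port 0 lets the OS pick a free port
    server = await asyncio.start_server(handle_echo, '127.0.0.1', 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await asyncio.gather(*(talk('127.0.0.1', port, f'client {i}') for i in range(3)))


if __name__ == '__main__':
    print(asyncio.run(main()))  # ['CLIENT 0', 'CLIENT 1', 'CLIENT 2']
```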

Client:

import aiohttp
import asyncio
import async_timeout

async def fetch(session, url):
    # Give up if the request takes longer than 10 seconds (example value)
    async with async_timeout.timeout(10):
        async with session.get(url) as response:
            return await response.text()

async def main():
    async with aiohttp.ClientSession() as session:
        html = await fetch(session, 'http://python.org')
        print(html)

asyncio.run(main())

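Server (web.run_app listens on port 8080 by default, so requesting http://localhost:8080/Alice returns "Hello, Alice"):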

from aiohttp import web

async def handle(request):
    name = request.match_info.get('name', "Anonymous")
    text = "Hello, " + name
    return web.Response(text=text)

app = web.Application()
app.router.add_get('/', handle)
app.router.add_get('/{name}', handle)
web.run_app(app)


The following code crawls book information from Dangdang's bestseller list:

'''Crawl Dangdang bestseller book information asynchronously'''
import time
import aiohttp
import asyncio
import pandas as pd
from bs4 import BeautifulSoup

# table stores the book information
table = []

# Fetch a page (as text)
async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text(encoding='gb18030')

# Parse a page
async def parser(html):
    # Parse the fetched text into HTML with BeautifulSoup
    soup = BeautifulSoup(html, 'lxml')
    # Get the bestseller entries on the page
    book_list = soup.find('ul', class_='bang_list clearfix bang_list_mode')('li')
    for book in book_list:
        info = book.find_all('div')
        # Get each bestseller's rank, title, comment count, author and publisher.
        # NOTE: the div indices below are assumptions and depend on Dangdang's page layout.
        rank = info[0].text[:-1]
        name = info[2].text
        comments = info[3].text.split('条')[0]  # keep the number before '条' ("reviews")
        author = info[4].text
        date_and_publisher = info[5].text.split()
        publisher = date_and_publisher[1] if len(date_and_publisher) >= 2 else ''
        # Append this bestseller's information to table
        table.append([rank, name, comments, author, publisher])

# Process one page, reusing a shared session
async def download(session, url):
    html = await fetch(session, url)
    await parser(html)

# All pages (the page range 1-25 is an assumption)
urls = ['http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-recent7-0-0-1-%d' % i for i in range(1, 26)]

async def main():
    # One ClientSession shared by all requests
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(download(session, url) for url in urls))

# Measure how long the crawler takes
print('#' * 50)
t1 = time.time()  # start time
# Download all pages concurrently with asyncio
asyncio.run(main())
# Convert table to a pandas DataFrame and save it as a CSV file
df = pd.DataFrame(table, columns=['rank', 'name', 'comments', 'author', 'publisher'])
df.to_csv('dangdang.csv', index=False)
t2 = time.time()  # end time
print('Total time with aiohttp: %s' % (t2 - t1))
print('#' * 50)
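The crawler's speed comes from asyncio.gather overlapping the waits for all pages. A minimal, network-free sketch of the effect, assuming a hypothetical fake_download() that sleeps instead of calling fetch() and parser(): awaiting the calls one by one takes about n × delay, while gathering them takes about one delay.

```python
import asyncio
import time


async def fake_download(url, delay=0.2):
    # Hypothetical stand-in for fetch() + parser(): sleeps instead of doing network I/O
    await asyncio.sleep(delay)
    return url


async def run_sequentially(urls):
    # Each await finishes before the next download starts
    return [await fake_download(u) for u in urls]


async def run_concurrently(urls):
    # All downloads are in flight at the same time
    return await asyncio.gather(*(fake_download(u) for u in urls))


def timed(coro_fn, urls):
    """Run coro_fn(urls) in a fresh event loop; return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn(urls))
    return result, time.perf_counter() - start


if __name__ == '__main__':
    urls = [f'page-{i}' for i in range(1, 6)]
    for fn in (run_sequentially, run_concurrently):
        _, elapsed = timed(fn, urls)
        print(f'{fn.__name__}: {elapsed:.2f}s')
```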
