Using asyncio with requests_html

import asyncio
import functools
import sys
from concurrent.futures import ThreadPoolExecutor

from requests_html import HTMLSession

# One blocking session, shared by all worker threads.
session = HTMLSession()


async def get_response(executor, *, url, loop: asyncio.AbstractEventLoop = None):
    if not loop:
        loop = asyncio.get_running_loop()
    # session.get blocks, so hand it to the thread pool.
    request = functools.partial(session.get, url)
    # Return the future without awaiting it; the caller gathers all of them later.
    return loop.run_in_executor(executor, request)


async def bulk_requests(executor, *, urls, loop: asyncio.AbstractEventLoop = None):
    # Yield one pending future per URL; every request is already running in the pool.
    for url in urls:
        yield await get_response(executor, url=url, loop=loop)


def filter_unsuccessful_requests(responses_and_exceptions):
    # Keep only the (url, response) pairs whose value is not an exception.
    return filter(
        lambda url_and_response: not isinstance(url_and_response[1], Exception),
        responses_and_exceptions.items()
    )


async def main():
    executor = ThreadPoolExecutor(10)
    urls = [
        "https://baidu.com",
        "https://cnblogs.com",
        "https://163.com",
    ]
    requests = [request async for request in bulk_requests(executor, urls=urls)]
    # return_exceptions=True puts a failure in the result list instead of raising it.
    responses_and_exceptions = dict(zip(urls, await asyncio.gather(*requests, return_exceptions=True)))
    responses = {url: resp.html for (url, resp) in filter_unsuccessful_requests(responses_and_exceptions)}

    # Print the page title of every successful response.
    for html in responses.values():
        print(html.xpath("//head//title//text()")[0])

    # Report every URL that failed, with the exception as the reason.
    for url in urls:
        if url not in responses:
            print(f"No successful request could be made to {url}. Reason: {responses_and_exceptions[url]}",
                  file=sys.stderr)


asyncio.run(main())
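
The listing above needs network access and the requests_html package, which makes the pattern hard to try in isolation. Below is a minimal sketch of the same idea (blocking calls sent to a thread pool, collected with asyncio.gather and return_exceptions=True) using only the standard library. The names blocking_fetch, fetch_all and split_results are hypothetical and are not part of requests_html; blocking_fetch is only a stand-in for session.get.

```python
import asyncio
import functools
from concurrent.futures import ThreadPoolExecutor


def blocking_fetch(url):
    """Hypothetical stand-in for session.get: returns a string or raises."""
    if "bad" in url:
        raise ValueError(f"cannot fetch {url}")
    return f"<title>{url}</title>"


async def fetch_all(urls, max_workers=4):
    """Run blocking_fetch for every URL in a thread pool and collect the results."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers) as executor:
        futures = [
            loop.run_in_executor(executor, functools.partial(blocking_fetch, url))
            for url in urls
        ]
        # Exceptions are returned as values instead of being raised.
        results = await asyncio.gather(*futures, return_exceptions=True)
    return dict(zip(urls, results))


def split_results(results):
    """Separate successful results from the ones that are exceptions."""
    ok = {u: r for u, r in results.items() if not isinstance(r, Exception)}
    failed = {u: r for u, r in results.items() if isinstance(r, Exception)}
    return ok, failed


if __name__ == "__main__":
    ok, failed = split_results(asyncio.run(fetch_all(["a.example", "bad.example"])))
    print(ok)
    print(failed)
```

Because gather returns results in the order the futures were passed in, zipping them with the URL list pairs each URL with its own result, which is the same assumption the original main() relies on.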


