Python 3 urllib Examples

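Python 3 merged urllib and urllib2 into a single package named urllib, which I find more sensible. It lets us read data from the web much as we would read a local file. I have wrapped it in a class for convenient reuse later, together with a number of application examples.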
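
To illustrate the "read it like a local file" point, here is a minimal sketch. It uses a temporary file and its `file://` URL so that it runs without network access; the helper name `read_via_urlopen` is made up for this example, and the same call is assumed to work for an `http://` URL whose body is plain text in the given charset.

```python
import tempfile
import urllib.request
from pathlib import Path


def read_via_urlopen(url, charset="utf-8"):
    """Open a URL, read the raw bytes, and decode them, just like a file."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode(charset)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.txt"
    path.write_text("hello urllib", encoding="utf-8")

    # Read the same content twice: once as a file, once through its URL.
    local_text = path.read_text(encoding="utf-8")
    url_text = read_via_urlopen(path.as_uri())
```

Both reads return the same string; only the way the resource is named differs.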

1. The wrapper class

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import time
import sys
import gzip
import socket
import urllib.request, urllib.parse, urllib.error
import http.cookiejar

class HttpTester:
    def __init__(self, timeout=10, addHeaders=True):
        socket.setdefaulttimeout(timeout)   # set the global default socket timeout, in seconds
        self.__opener = urllib.request.build_opener()
        urllib.request.install_opener(self.__opener)    # urlopen() and urlretrieve() now go through this opener
        if addHeaders:
            self.__addHeaders()

    def __error(self, e):
        '''Error handling: print the exception.'''
        print(e)

    def __addHeaders(self):
        '''Add the default headers.'''
        self.__opener.addheaders = [('User-Agent', 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0'),
                                    ('Connection', 'keep-alive'),
                                    ('Cache-Control', 'no-cache'),
                                    ('Accept-Language', 'zh-cn,zh;q=0.8,en-us;q=0.5,en;q=0.3'),
                                    ('Accept-Encoding', 'gzip'),    # __decode only handles gzip, so deflate is not advertised
                                    ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8')]

    def __decode(self, webPage, charset):
        '''Decompress gzip data if needed, then decode the page with the given charset.'''
        if webPage.startswith(b'\x1f\x8b'):     # gzip magic number
            return gzip.decompress(webPage).decode(charset)
        else:
            return webPage.decode(charset)

    def addCookiejar(self):
        '''Add a cookiejar handler to self.__opener.'''
        cj = http.cookiejar.CookieJar()
        self.__opener.add_handler(urllib.request.HTTPCookieProcessor(cj))

    def addProxy(self, host, type='http'):
        '''Set a proxy; type is the URL scheme the proxy applies to.'''
        proxy = urllib.request.ProxyHandler({type: host})
        self.__opener.add_handler(proxy)

    def addAuth(self, url, user, pwd):
        '''Add HTTP basic authentication for url.'''
        pwdMgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
        pwdMgr.add_password(None, url, user, pwd)   # None means any realm
        auth = urllib.request.HTTPBasicAuthHandler(pwdMgr)
        self.__opener.add_handler(auth)

    def get(self, url, params=None, headers=None, charset='UTF-8'):
        '''HTTP GET method.'''
        if params:
            url += '?' + urllib.parse.urlencode(params)
        request = urllib.request.Request(url)
        for k, v in (headers or {}).items():
            request.add_header(k, v)    # add headers that apply to this request only
        try:
            response = urllib.request.urlopen(request)
        except urllib.error.URLError as e:  # HTTPError is a subclass, so connection failures are reported too
            self.__error(e)
        else:
            return self.__decode(response.read(), charset)

    def post(self, url, params=None, headers=None, charset='UTF-8'):
        '''HTTP POST method.'''
        params = urllib.parse.urlencode(params or {})
        request = urllib.request.Request(url, data=params.encode(charset))  # a request that carries data is sent as POST
        for k, v in (headers or {}).items():
            request.add_header(k, v)
        try:
            response = urllib.request.urlopen(request)
        except urllib.error.URLError as e:  # HTTPError is a subclass, so connection failures are reported too
            self.__error(e)
        else:
            return self.__decode(response.read(), charset)

    def download(self, url, savefile):
        '''Download a file or web page and save it to savefile.'''
        # Temporarily drop Accept-Encoding so the saved file is not gzip-compressed.
        savedHeaders = self.__opener.addheaders
        self.__opener.addheaders = [h for h in savedHeaders if h[0] != 'Accept-Encoding']

        perLen = 0
        def reporthook(a, b, c):    # a: blocks transferred so far; b: block size in bytes; c: total size of the remote file
            nonlocal perLen
            if c > 1000000:         # only report progress for files larger than about 1 MB
                per = min(100.0 * a * b / c, 100)
                per = '{:.2f}%'.format(per)
                print('\b'*perLen, per, end='')     # backspace over the previous value, then print the new percentage
                sys.stdout.flush()
                perLen = len(per)+1

        print('--> {}\t'.format(url), end='')
        try:
            urllib.request.urlretrieve(url, savefile, reporthook)   # reporthook is the callback that displays progress
        except urllib.error.HTTPError as e:
            self.__error(e)
        finally:
            self.__opener.addheaders = savedHeaders     # restore the original headers
            print()
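
The gzip check in `__decode` relies on the magic bytes `\x1f\x8b`. If a client also advertises `deflate` in `Accept-Encoding`, the server may reply with a zlib stream that this check does not recognize. Below is a minimal sketch of a decoder that covers both cases. `decode_body` and its `content_encoding` parameter are names invented for this illustration and are not part of the class above; the caller is assumed to pass in the value of the response's `Content-Encoding` header.

```python
import gzip
import zlib


def decode_body(raw, content_encoding="", charset="utf-8"):
    """Undo gzip or deflate compression if present, then decode to str."""
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip" or raw.startswith(b"\x1f\x8b"):
        raw = gzip.decompress(raw)
    elif encoding == "deflate":
        try:
            raw = zlib.decompress(raw)                    # zlib-wrapped deflate
        except zlib.error:
            raw = zlib.decompress(raw, -zlib.MAX_WBITS)   # raw deflate, no zlib header
    return raw.decode(charset)


sample = "hello, urllib"
plain = decode_body(sample.encode("utf-8"))
from_gzip = decode_body(gzip.compress(sample.encode("utf-8")), "gzip")
from_deflate = decode_body(zlib.compress(sample.encode("utf-8")), "deflate")
```

The two-step deflate handling is there because some servers send the zlib-wrapped form and others send a raw deflate stream under the same header value.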

2. Application examples

Posting a tweet (动弹) on OSC
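
Posting to a site after logging in generally needs two things the class provides: a cookie jar that carries the session between requests, and a POST with url-encoded form data. The sketch below shows that pattern against a throwaway local server, not against OSC itself. The paths `/login` and `/post`, the form fields, the cookie name, and the helper `post_form` are all hypothetical and chosen only so the example runs offline.

```python
import http.cookiejar
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a real site: /login sets a cookie, /post requires it."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        form = urllib.parse.parse_qs(self.rfile.read(length).decode("utf-8"))
        if self.path == "/login":
            self.send_response(200)
            self.send_header("Set-Cookie", "session=demo; Path=/")
            body = b"logged in"
        elif self.path == "/post" and "session=demo" in self.headers.get("Cookie", ""):
            self.send_response(200)
            body = ("posted: " + form.get("msg", [""])[0]).encode("utf-8")
        else:
            self.send_response(403)
            body = b"forbidden"
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def post_form(opener, url, fields, charset="utf-8"):
    """POST url-encoded form fields through the given opener and return the decoded reply."""
    data = urllib.parse.urlencode(fields).encode(charset)
    with opener.open(url, data=data, timeout=5) as response:
        return response.read().decode(charset)


# Start the local demo server on a free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:{}".format(server.server_port)

# The cookie processor stores Set-Cookie values and sends them back on later requests.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),            # bypass any system proxy for the local demo
    urllib.request.HTTPCookieProcessor(jar),
)

login_reply = post_form(opener, base + "/login", {"user": "alice", "pwd": "secret"})
post_reply = post_form(opener, base + "/post", {"msg": "hello"})

server.shutdown()
server.server_close()
```

With `HttpTester`, the equivalent steps would be `addCookiejar()` followed by two `post()` calls; the real login URL and form field names depend on the target site and have to be taken from its pages.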
