Python urllib and urllib2

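Differences

urllib2.urlopen can accept a Request instance, which lets you set the headers of a URL request; urllib.urlopen accepts only a URL string. This means that with urllib.urlopen you cannot, for example, spoof the User-Agent string.
urllib provides urlencode for encoding the data to send; urllib2 does not. This is why urllib and urllib2 are often used together.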

urllib

1 urllib.urlopen(url[,data[,proxies]])

Opens a URL and returns a file-like object.

>>> import urllib

>>> resp = urllib.urlopen('http://www.baidu.com')

>>> resp.readline() # read one line

'<!DOCTYPE html><!--STATUS OK--><html><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><meta http-equiv="X-UA-Compatible" content="IE=Edge"><meta content="always" name="referrer"><meta name="theme-color" content="#2932e1"><link rel="shortcut icon" href="/favicon.ico" type="image/x-icon" /><link rel="search" type="application/opensearchdescription+xml" href="/content-search.xml" title="\xe7\x99\xbe\xe5\xba\xa6\xe6\x90\x9c\xe7\xb4\xa2" /><link rel="icon" sizes="any" mask href="//www.baidu.com/img/baidu.svg"><link rel="dns-prefetch" href="//s1.bdstatic.com"/><link rel="dns-prefetch" href="//t1.baidu.com"/><link rel="dns-prefetch" href="//t2.baidu.com"/><link rel="dns-prefetch" href="//t3.baidu.com"/><link rel="dns-prefetch" href="//t10.baidu.com"/><link rel="dns-prefetch" href="//t11.baidu.com"/><link rel="dns-prefetch" href="//t12.baidu.com"/><link rel="dns-prefetch" href="//b1.bdstatic.com"/><title>\xe7\x99\xbe\xe5\xba\xa6\xe4\xb8\x80\xe4\xb8\x8b\xef\xbc\x8c\xe4\xbd\xa0\xe5\xb0\xb1\xe7\x9f\xa5\xe9\x81\x93</title>\n'

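The object returned by urlopen provides these methods:

- read(), readline(), readlines(), fileno(), close(): these work exactly as they do on a file object

- info(): returns an httplib.HTTPMessage object holding the headers sent back by the remote server

- getcode(): returns the HTTP status code; for an HTTP request, 200 means the request succeeded and 404 means the URL was not found

- geturl(): returns the URL of the resource that was actually retrieved, which can differ from the requested URL if a redirect was followed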

2 urllib.urlretrieve(url[,filename[,reporthook[,data]]])

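urlretrieve downloads the resource that the URL points to (an HTML page, for example) to a file on your local disk. If filename is not given, the data is saved to a temporary file.

urlretrieve() returns a two-element tuple (filename, headers).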

>>> filename = urllib.urlretrieve('http://www.baidu.com')

>>> type(filename)

<type 'tuple'>

>>> filename

('/tmp/tmphngDjh', <httplib.HTTPMessage instance at 0x7fd5e03ea248>)

>>> filename = urllib.urlretrieve('http://www.baidu.com/',filename='/tmp/baidu')

>>> type(filename)

<type 'tuple'>

>>> filename

('/tmp/baidu', <httplib.HTTPMessage instance at 0x7fd5e03dbb48>)
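The reporthook parameter in the signature above is not used in these examples. A minimal sketch of how it might be used follows. It copies a small local file through a file:// URL so that no network access is needed; the names report, progress, source and target are made up for this illustration. The try/except import is only there so the same sketch also runs on Python 3, where these functions live in urllib.request.

```python
import os
import tempfile

try:  # Python 2, the version this article targets
    from urllib import urlretrieve, pathname2url
except ImportError:  # Python 3 moved these functions to urllib.request
    from urllib.request import urlretrieve, pathname2url

progress = []  # (bytes_so_far, total_size) recorded on every callback


def report(block_num, block_size, total_size):
    # urlretrieve calls the hook once when the transfer starts and
    # once after each block has been read.
    done = block_num * block_size
    if total_size > 0:
        done = min(done, total_size)  # the last block is usually partial
    progress.append((done, total_size))


# Build a small local file to stand in for a remote resource (assumption:
# a file:// URL is used here purely to keep the example offline).
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, 'source.txt')
with open(source, 'wb') as f:
    f.write(b'x' * 20000)

target = os.path.join(workdir, 'copy.txt')
saved_path, headers = urlretrieve('file://' + pathname2url(source), target, report)

print(saved_path)
print(len(progress))
```

With a real HTTP URL the call has the same shape: urlretrieve(url, filename, report).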

3 urllib.urlcleanup()

Cleans up the cache (temporary files) left behind by earlier calls to urllib.urlretrieve().

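4 urllib.quote(url) and urllib.quote_plus(url)

Encodes a string so that it can be used safely inside a URL: special characters are replaced with %xx escapes, which keeps the result printable and acceptable to web servers. quote leaves '/' unescaped by default; quote_plus escapes '/' as well and replaces spaces with '+'.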

>>> urllib.quote('http://www.baidu.com')

'http%3A//www.baidu.com'

>>> urllib.quote_plus('http://www.baidu.com')

'http%3A%2F%2Fwww.baidu.com'

5 urllib.unquote(url) and urllib.unquote_plus(url)

The inverse of the functions in item 4.
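A short round-trip sketch of items 4 and 5. The sample string text is made up for this illustration, and the try/except import is only there so the code also runs on Python 3, where these functions live in urllib.parse.

```python
try:  # Python 2, the version this article targets
    from urllib import quote, quote_plus, unquote, unquote_plus
except ImportError:  # Python 3 moved these functions to urllib.parse
    from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = 'a b/c?d=e&f'  # hypothetical sample containing a space and reserved characters

quoted = quote(text)            # '/' is kept: it is in quote's default safe set
quoted_plus = quote_plus(text)  # space becomes '+', and '/' is escaped too

print(quoted)       # a%20b/c%3Fd%3De%26f
print(quoted_plus)  # a+b%2Fc%3Fd%3De%26f

# Each unquote function reverses its matching quote function.
print(unquote(quoted) == text)            # True
print(unquote_plus(quoted_plus) == text)  # True

# unquote does not treat '+' as a space; only unquote_plus does.
print(unquote('a+b'))       # a+b
print(unquote_plus('a+b'))  # a b
```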

6 urllib.urlencode(query)

Encodes a mapping of key/value pairs into a query string of key=value items joined by &.

GET method

>>> import urllib

>>> params=urllib.urlencode({'spam':1,'eggs':2,'bacon':0})

>>> params

'eggs=2&bacon=0&spam=1'

>>> f=urllib.urlopen("http://python.org/query?%s" % params)

>>> print f.read()

POST method

>>> import urllib

>>> params = urllib.urlencode({'spam':1,'eggs':2,'bacon':0})

>>> f=urllib.urlopen("http://python.org/query", params)

>>> f.read()
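In the GET example the keys come out in a different order from the one they were written in, because a Python 2 dict has no guaranteed order. urlencode also accepts a sequence of two-element tuples, which keeps the order and shows how values are escaped. A minimal sketch (the pairs are made up for this illustration; the try/except import only lets the same code run on Python 3):

```python
try:  # Python 2, the version this article targets
    from urllib import urlencode
except ImportError:  # Python 3 moved urlencode to urllib.parse
    from urllib.parse import urlencode

# A list of (key, value) tuples is encoded in exactly this order.
pairs = [('spam', 1), ('eggs', 2), ('name', 'Michael Foord'), ('q', 'a&b=c')]

query = urlencode(pairs)
print(query)  # spam=1&eggs=2&name=Michael+Foord&q=a%26b%3Dc
```

Keys and values are escaped the way quote_plus escapes them, so a space becomes '+' and characters such as & and = inside a value cannot be mistaken for separators.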

urllib2

http://www.codefrom.com/paper/深入理解urllib、urllib2及requests

http://zhuoqiang.me/python-urllib2-usage.html

1 urllib2.urlopen()

>>> import urllib2

>>> url = 'http://www.baidu.com'

>>> resp = urllib2.urlopen(url)

>>> resp.readline()

'<!DOCTYPE html><!--STATUS OK--><html><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><meta http-equiv="X-UA-Compatible" content="IE=Edge"><meta content="always" name="referrer"><meta name="theme-color" content="#2932e1"><link rel="shortcut icon" href="/favicon.ico" type="image/x-icon" /><link rel="search" type="application/opensearchdescription+xml" href="/content-search.xml" title="\xe7\x99\xbe\xe5\xba\xa6\xe6\x90\x9c\xe7\xb4\xa2" /><link rel="icon" sizes="any" mask href="//www.baidu.com/img/baidu.svg"><link rel="dns-prefetch" href="//s1.bdstatic.com"/><link rel="dns-prefetch" href="//t1.baidu.com"/><link rel="dns-prefetch" href="//t2.baidu.com"/><link rel="dns-prefetch" href="//t3.baidu.com"/><link rel="dns-prefetch" href="//t10.baidu.com"/><link rel="dns-prefetch" href="//t11.baidu.com"/><link rel="dns-prefetch" href="//t12.baidu.com"/><link rel="dns-prefetch" href="//b1.bdstatic.com"/><title>\xe7\x99\xbe\xe5\xba\xa6\xe4\xb8\x80\xe4\xb8\x8b\xef\xbc\x8c\xe4\xbd\xa0\xe5\xb0\xb1\xe7\x9f\xa5\xe9\x81\x93</title>\n'

2 urllib2.Request()

>>> url = 'http://www.baidu.com'

>>> req = urllib2.Request(url)

>>> resp = urllib2.urlopen(req) # pass the Request object instead of a URL string

>>> resp.readline()

'<!DOCTYPE html><!--STATUS OK--><html><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><meta http-equiv="X-UA-Compatible" content="IE=Edge"><meta content="always" name="referrer"><meta name="theme-color" content="#2932e1"><link rel="shortcut icon" href="/favicon.ico" type="image/x-icon" /><link rel="search" type="application/opensearchdescription+xml" href="/content-search.xml" title="\xe7\x99\xbe\xe5\xba\xa6\xe6\x90\x9c\xe7\xb4\xa2" /><link rel="icon" sizes="any" mask href="//www.baidu.com/img/baidu.svg"><link rel="dns-prefetch" href="//s1.bdstatic.com"/><link rel="dns-prefetch" href="//t1.baidu.com"/><link rel="dns-prefetch" href="//t2.baidu.com"/><link rel="dns-prefetch" href="//t3.baidu.com"/><link rel="dns-prefetch" href="//t10.baidu.com"/><link rel="dns-prefetch" href="//t11.baidu.com"/><link rel="dns-prefetch" href="//t12.baidu.com"/><link rel="dns-prefetch" href="//b1.bdstatic.com"/><title>\xe7\x99\xbe\xe5\xba\xa6\xe4\xb8\x80\xe4\xb8\x8b\xef\xbc\x8c\xe4\xbd\xa0\xe5\xb0\xb1\xe7\x9f\xa5\xe9\x81\x93</title>\n'

3 urllib2.Request(url[, data][, headers][, origin_req_host][, unverifiable])

import urllib, urllib2

url = 'http://www.someserver.com/cgi-bin/register.cgi'

values = {'name' : 'Michael Foord', 'location' : 'Northampton', 'language' : 'Python' }

data = urllib.urlencode(values)

req = urllib2.Request(url, data)   # supplying data makes this a POST request

resp = urllib2.urlopen(req)

resp.read()

5 header

import urllib, urllib2

url = 'http://www.someserver.com/cgi-bin/register.cgi'

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'

values = {'name' : 'Michael Foord', 'location' : 'Northampton', 'language' : 'Python' }

headers = { 'User-Agent' : user_agent }

data = urllib.urlencode(values)

req = urllib2.Request(url, data, headers)

resp = urllib2.urlopen(req)

resp.read()
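A Request object can be inspected before it is sent, which is a convenient way to check what the examples above would transmit. The sketch below builds a request and reads back its method and header without opening a connection. The try/except import is only there so it also runs on Python 3, and the .encode('ascii') call is needed there because Python 3 expects the body as bytes (on Python 2 it is a no-op for this string).

```python
try:  # Python 2, the version this article targets
    from urllib import urlencode
    from urllib2 import Request
except ImportError:  # Python 3 equivalents
    from urllib.parse import urlencode
    from urllib.request import Request

url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {'name': 'Michael Foord'}
headers = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}

data = urlencode(values).encode('ascii')
req = Request(url, data, headers)  # nothing is sent until urlopen(req) is called

print(req.get_method())              # POST, because data was supplied
print(req.get_header('User-agent'))  # header names are stored capitalized
print(req.has_header('User-Agent'))  # False: the lookup is case-sensitive
```

Note the spelling 'User-agent' in the lookups: Request stores each header name with only its first letter in upper case.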

6 add_header(key, val)

import urllib2

req = urllib2.Request('http://www.example.com/')

req.add_header('Referer', 'http://www.python.org/')

resp = urllib2.urlopen(req)

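7 PUT and DELETE methods

import urllib2

uri = 'http://www.example.com/resource'   # hypothetical URL, for illustration only

data = 'payload'                          # hypothetical request body

request = urllib2.Request(uri, data=data)

request.get_method = lambda: 'PUT' # or 'DELETE'

response = urllib2.urlopen(request)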
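Replacing get_method with a lambda has to be repeated for every request. One alternative is a small subclass that overrides get_method. This is only a sketch: MethodRequest and its http_method parameter are names invented here, not part of urllib2. The try/except import is only there so the code also runs on Python 3.

```python
try:  # Python 2, the version this article targets
    from urllib2 import Request
except ImportError:  # Python 3 equivalent
    from urllib.request import Request


class MethodRequest(Request):
    """Hypothetical helper: a Request whose HTTP method can be chosen."""

    def __init__(self, url, data=None, headers=None, http_method=None):
        # Request is an old-style class on Python 2, so call the base
        # initializer directly instead of using super().
        Request.__init__(self, url, data, headers or {})
        self._http_method = http_method

    def get_method(self):
        # Fall back to the default GET/POST choice when no method was given.
        if self._http_method is not None:
            return self._http_method
        return Request.get_method(self)


put = MethodRequest('http://www.example.com/item', data=b'payload', http_method='PUT')
delete = MethodRequest('http://www.example.com/item', http_method='DELETE')
plain = MethodRequest('http://www.example.com/item')

print(put.get_method())     # PUT
print(delete.get_method())  # DELETE
print(plain.get_method())   # GET
```

The objects are passed to urlopen() exactly like an ordinary Request.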

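Notes

1. If you only need to download something, optionally showing progress, without processing the content afterwards (images, CSS or JS files, for example), you can use urllib.urlretrieve().

2. If the request needs form data such as an account name and password, use urllib2.urlopen(urllib2.Request()).

3. To encode a dictionary of data, use urllib.urlencode().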
