Sending requests

Sending a network request with Requests is very simple.

# First, import the requests library

import requests

# requests.get() returns a Response object

r=requests.get('https://github.com/timeline.json')

# As you can see, the Requests API is very simple.

# POST request

r=requests.post('http://httpbin.org/post')

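# PUT, DELETE, HEAD and OPTIONS requests follow the same pattern

r=requests.put('http://httpbin.org/put')

r=requests.delete('http://httpbin.org/delete')

r=requests.head('http://httpbin.org/get')

r=requests.options('http://httpbin.org/get')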

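Passing URL parameters

With GET, parameters are usually passed by appending them to the URL after a ?.

Requests lets you provide these parameters as a dict via the params keyword argument.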

import requests

payload={'key1':'value1','key2':'value2'}

r=requests.get('http://httpbin.org/get',params=payload)

print(r.url)  # the Response object has a url attribute

http://httpbin.org/get?key1=value1&key2=value2
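Two more params behaviours are worth knowing: a list value produces a repeated key, and a key whose value is None is left out of the query string. The sketch below checks both without touching the network by preparing the request instead of sending it; the helper name build_url is hypothetical.

```python
import requests

def build_url(base, params):
    """Return the URL Requests would send for these params (prepared, never sent)."""
    return requests.Request('GET', base, params=params).prepare().url

# key2 has two values, key3 is None and is therefore dropped
payload = {'key1': 'value1', 'key2': ['value2', 'value3'], 'key3': None}
url = build_url('http://httpbin.org/get', payload)
print(url)  # http://httpbin.org/get?key1=value1&key2=value2&key2=value3
```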

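Response content

The Response object's text attribute gives the content of the server's response, and Requests automatically decodes the content coming from the server.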

import requests

r=requests.get('https://github.com/timeline.json')

r.text  # already decoded automatically

'{"message":"Hello there, wayfaring stranger. If you’re reading this then you probably didn’t see our blog post a couple of years back announcing that this API would go away: http://git.io/17AROg Fear not, you should be able to get what you need from the shiny new Events API instead.","documentation_url":"https://developer.github.com/v3/activity/events/#list-public-events"}'

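# The encoding attribute shows which encoding Requests used to decode the response. You can also assign a different (including a custom) encoding, and r.text will use it from then on.

r.encoding

'utf-8'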
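r.text is essentially r.content decoded with r.encoding, which is why a wrong encoding produces garbled text. A stdlib-only sketch using a fragment of the raw bytes from the GitHub response in this article; decode_body is a hypothetical helper that mimics, roughly, what Requests does (it also decodes with errors='replace').

```python
def decode_body(content: bytes, encoding: str) -> str:
    """Roughly what Response.text does: decode the raw bytes with the chosen encoding."""
    return str(content, encoding, errors='replace')

raw = b'you\xe2\x80\x99re'  # UTF-8 bytes of "you’re" (curly apostrophe)

print(decode_body(raw, 'utf-8'))          # you’re
print(repr(decode_body(raw, 'latin-1')))  # 'youâ\x80\x99re' (mojibake)
```

With Requests itself, the equivalent experiment is setting r.encoding = 'latin-1' and then reading r.text again.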

Binary response content

You can also access the response body as bytes:

r.content  # the raw, undecoded bytes

b'{"message":"Hello there, wayfaring stranger. If you\xe2\x80\x99re reading this then you probably didn\xe2\x80\x99t see our blog post a couple of years back announcing that this API would go away: http://git.io/17AROg Fear not, you should be able to get what you need from the shiny new Events API instead.","documentation_url":"https://developer.github.com/v3/activity/events/#list-public-events"}'

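# Requests automatically decodes response data sent with the gzip and deflate transfer-encodings.

# For example, to create an image from the binary data returned by a request (this assumes r holds an image response, such as a PNG, rather than the JSON above):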

from PIL import Image

from io import BytesIO

i=Image.open(BytesIO(r.content))
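For large binary bodies (images, archives) you may not want the whole payload in memory via r.content. A sketch assuming you pass stream=True to requests.get and write the body with Response.iter_content(); save_binary and the 8192-byte chunk size are illustrative choices, not part of Requests.

```python
import requests

def save_binary(response: requests.Response, path: str, chunk_size: int = 8192) -> int:
    """Write a response body to disk chunk by chunk and return the number of bytes written."""
    written = 0
    with open(path, 'wb') as fd:  # binary mode: bytes are written unchanged
        for chunk in response.iter_content(chunk_size=chunk_size):
            fd.write(chunk)
            written += len(chunk)
    return written

# Usage (network access assumed; the URL is only an example):
# r = requests.get('https://httpbin.org/image/png', stream=True)
# r.raise_for_status()
# save_binary(r, 'image.png')
```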

JSON response content

import requests

r=requests.get('https://github.com/timeline.json')

r.json()

{'documentation_url': 'https://developer.github.com/v3/activity/events/#list-public-events',

 'message': 'Hello there, wayfaring stranger. If you’re reading this then you probably didn’t see our blog post a couple of years back announcing that this API would go away: http://git.io/17AROg Fear not, you should be able to get what you need from the shiny new Events API instead.'}
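r.json() raises an exception when the body is not valid JSON, and a successful r.json() call does not prove the request succeeded, because some servers send JSON error bodies. A hedged sketch of a defensive helper (json_or_none is a hypothetical name); it catches ValueError because the JSON decoding error Requests raises is a ValueError subclass.

```python
import requests

def json_or_none(response: requests.Response):
    """Return the decoded JSON body, or None if the status is an error or the body is not JSON."""
    if not response.ok:  # ok is False for 4xx and 5xx status codes
        return None
    try:
        return response.json()
    except ValueError:  # JSON decode errors subclass ValueError
        return None

# Usage (network access assumed):
# data = json_or_none(requests.get('https://github.com/timeline.json'))
```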

Custom headers

All header values must be a string, bytestring, or unicode.

url='https://api.github.com/some/endpoint'

headers={'user-agent':'my-app/0.0.1'}

r=requests.get(url,headers=headers)
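Custom headers are merged over the defaults a Session adds, and header names are case-insensitive, so 'user-agent' replaces the default User-Agent. This can be checked offline with Session.prepare_request(), which builds the final request without sending it:

```python
import requests

url = 'https://api.github.com/some/endpoint'
headers = {'user-agent': 'my-app/0.0.1'}

s = requests.Session()
prepped = s.prepare_request(requests.Request('GET', url, headers=headers))

print(prepped.headers['User-Agent'])  # my-app/0.0.1 (lookup is case-insensitive)
print(prepped.headers['Accept'])      # */* (a Session default that was kept)
```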

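More complicated POST requests

payload={'key1':'value1','key2':'value2'}

# The dict can also be replaced with a list of tuples, which is especially useful when several form elements use the same key.

# You can also pass json= instead of data= to send an object encoded as JSON.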

r=requests.post('http://httpbin.org/post',data=payload)

print(r.text)

{

  "args": {},

  "data": "",

  "files": {},

  "form": {

    "key1": "value1",

    "key2": "value2"

  },

  "headers": {

    "Accept": "*/*",

    "Accept-Encoding": "gzip, deflate",

    "Connection": "close",

    "Content-Length": "23",

    "Content-Type": "application/x-www-form-urlencoded",

    "Host": "httpbin.org",

    "User-Agent": "python-requests/2.18.4"

  },

  "json": null,

  "origin": "36.102.236.202",

  "url": "http://httpbin.org/post"

}
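The two alternatives mentioned in the comments above (a list of tuples instead of a dict, and json= instead of data=) can be checked offline by preparing each request without sending it; the form values are illustrative.

```python
import json
import requests

url = 'http://httpbin.org/post'

# A list of tuples keeps repeated keys: two values for the same form field
form = requests.Request('POST', url, data=[('key1', 'value1'), ('key1', 'value2')]).prepare()
print(form.headers['Content-Type'])  # application/x-www-form-urlencoded
print(form.body)                     # key1=value1&key1=value2

# json= serialises the dict and sets the Content-Type header for you
as_json = requests.Request('POST', url, json={'key1': 'value1'}).prepare()
print(as_json.headers['Content-Type'])  # application/json
print(json.loads(as_json.body))         # {'key1': 'value1'}
```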

POST a multipart-encoded file

url='http://httpbin.org/post'

files={'file':open('report.xls','rb')}  # opening the file in binary mode is strongly recommended

r=requests.post(url,files=files)

r.text  # no report.xls file is available here, so the output is not shown
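Two refinements, as a sketch: open the file in a with block so it is closed after the upload, and use the (filename, fileobj, content_type) tuple form to set the filename and MIME type explicitly. Here an in-memory io.BytesIO stands in for report.xls so the request can be prepared without the file or the network; the bytes and MIME type are placeholders.

```python
import io
import requests

url = 'http://httpbin.org/post'

# With a real file, closed automatically after the upload (network access assumed):
# with open('report.xls', 'rb') as f:
#     r = requests.post(url, files={'file': f})

# Offline: prepare (but do not send) a multipart request from in-memory bytes
fake_file = io.BytesIO(b'placeholder spreadsheet bytes')
files = {'file': ('report.xls', fake_file, 'application/vnd.ms-excel')}
prepped = requests.Request('POST', url, files=files).prepare()

print(prepped.headers['Content-Type'])           # multipart/form-data; boundary=...
print(b'filename="report.xls"' in prepped.body)  # True
```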

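Response status codes

r.status_code  # get the status code

r.status_code==requests.codes.ok  # requests.codes is a built-in status-code lookup object

# If a request returned an error status (4xx or 5xx), Response.raise_for_status() raises requests.exceptions.HTTPError; for any other status code it simply returns None.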
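A sketch of the usual pattern, where fetch_or_raise is a hypothetical helper. The offline part builds a Response by hand only so the behaviour can be seen without a server; in real code the Response comes from requests.get (for example http://httpbin.org/status/404 returns a 404).

```python
import requests

def fetch_or_raise(url: str, timeout: float = 5) -> requests.Response:
    """GET url and turn 4xx/5xx status codes into requests.exceptions.HTTPError."""
    r = requests.get(url, timeout=timeout)
    r.raise_for_status()  # returns None for non-error codes, raises HTTPError otherwise
    return r

# Offline illustration with a hand-built Response (not something you normally do):
bad = requests.Response()
bad.status_code = 404
try:
    bad.raise_for_status()
except requests.exceptions.HTTPError as err:
    print('HTTPError:', err)
```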

Response headers

r.headers

{'Date': 'Thu, 16 Nov 2017 13:01:18 GMT', 'Content-Type': 'application/json', 'Access-Control-Allow-Credentials': 'true', 'X-Processed-Time': '0.00141000747681', 'Via': '1.1 vegur', 'Content-Length': '465', 'Connection': 'keep-alive', 'Access-Control-Allow-Origin': '*', 'Server': 'meinheld/0.6.1', 'X-Powered-By': 'Flask'}
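r.headers looks like a plain dict, but it is a requests.structures.CaseInsensitiveDict, because HTTP header names are case-insensitive. A small offline sketch built by hand from two of the headers shown above:

```python
from requests.structures import CaseInsensitiveDict

# Same type as r.headers; the two entries are copied from the output above
headers = CaseInsensitiveDict({'Content-Type': 'application/json', 'Content-Length': '465'})

print(headers['content-type'])        # application/json
print(headers.get('CONTENT-LENGTH'))  # 465
```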

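Cookies

# Access a cookie the server set on the response

r.cookies['cookie_name']

# Send cookies to the server

url='http://httpbin.org/cookies'

cookies=dict(cookies_are='working')

r=requests.get(url,cookies=cookies)

r.text  # httpbin echoes back the cookies it received

# Cookies are returned as a RequestsCookieJar, which behaves like a dict but offers a fuller interface suited to use across multiple domains or paths. You can also build a jar yourself and pass it in: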

jar=requests.cookies.RequestsCookieJar()

jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')

jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')

url = 'http://httpbin.org/cookies'

r = requests.get(url, cookies=jar)

r.text
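The domain and path set on each cookie decide which requests it is attached to. This offline sketch prepares requests (without sending them) and reads the resulting Cookie header; cookie_header is a hypothetical helper.

```python
import requests

jar = requests.cookies.RequestsCookieJar()
jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')

def cookie_header(url, jar):
    """Return the Cookie header Requests would send to url (prepared, not sent)."""
    return requests.Request('GET', url, cookies=jar).prepare().headers.get('Cookie')

print(cookie_header('http://httpbin.org/cookies', jar))    # tasty_cookie=yum
print(cookie_header('http://httpbin.org/elsewhere', jar))  # gross_cookie=blech
print(cookie_header('http://example.org/cookies', jar))    # None (different domain)
```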

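Redirection and history

By default, Requests follows redirects for every request method except HEAD.

Use the history attribute of the Response object to track redirection: it is a list of the Response objects that were created to complete the request, sorted from oldest to most recent.

Timeouts

Pass the timeout parameter, in seconds, to tell Requests to stop waiting for a response after that long.

timeout is not a limit on how long it takes to download the whole response body. Instead, an exception is raised if the server has not issued a response for timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds).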
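A sketch of inspecting redirects. allow_redirects=False turns redirect following off, and allow_redirects=True turns it on for HEAD. The helper redirect_chain is hypothetical, and the commented calls need network access.

```python
import requests

def redirect_chain(r: requests.Response):
    """List (status_code, url) for every hop, ending with the final response."""
    return [(h.status_code, h.url) for h in r.history] + [(r.status_code, r.url)]

# With network access (github.com redirects plain HTTP to HTTPS):
# r = requests.get('http://github.com')
# print(redirect_chain(r))
# r = requests.get('http://github.com', allow_redirects=False)  # do not follow redirects
# r = requests.head('http://github.com', allow_redirects=True)  # HEAD opts in to following
```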
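timeout also accepts a (connect, read) tuple so the two phases can be limited separately; a single number applies to both. A sketch in which get_with_timeout is a hypothetical helper and the default values are only examples:

```python
import requests

def get_with_timeout(url, connect=3.05, read=27):
    """GET url with separate connect/read timeouts in seconds; return None on timeout."""
    try:
        return requests.get(url, timeout=(connect, read))
    except requests.exceptions.Timeout:  # covers both ConnectTimeout and ReadTimeout
        return None

# Usage (network access assumed):
# r = get_with_timeout('https://github.com')
```

Without any timeout, a request to an unresponsive server can wait indefinitely, so setting one in production code is a sensible default.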

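Session objects

A Session object lets you persist certain parameters across requests, and it persists cookies across all requests made from the same Session instance.

A Session object has all the methods of the main Requests API.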

s=requests.Session()

s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')

r=s.get('http://httpbin.org/cookies')

print(r.text)

{

  "cookies": {

    "sessioncookie": "123456789"

  }

}
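Sessions can also be used as context managers, which closes them when the block ends. This offline sketch sets the cookie directly in s.cookies (instead of through the httpbin URL) and prepares the follow-up request to show that the session attaches it; nothing is actually sent.

```python
import requests

with requests.Session() as s:  # the session is closed when the block exits
    s.cookies.set('sessioncookie', '123456789')  # set directly instead of via httpbin
    prepped = s.prepare_request(requests.Request('GET', 'http://httpbin.org/cookies'))
    print(prepped.headers['Cookie'])  # sessioncookie=123456789
```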

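# Sessions can also supply default data to the request methods

s=requests.Session()

s.auth=('user','pass')

s.headers.update({'x-test':'true'})

# both x-test and x-test2 are sent

s.get('http://httpbin.org/headers',headers={'x-test2':'true'})

# Any dict passed to a request method is merged with the session-level values already set. Method-level parameters, however, are not persisted across requests, even within a session. For example: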
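The merge can be observed offline by preparing the same request instead of sending it. Setting a key to None in the method-level dict drops that session-level value for one request.

```python
import requests

s = requests.Session()
s.auth = ('user', 'pass')
s.headers.update({'x-test': 'true'})

# Same request as above, prepared but not sent
prepped = s.prepare_request(
    requests.Request('GET', 'http://httpbin.org/headers', headers={'x-test2': 'true'})
)
print(prepped.headers['x-test'], prepped.headers['x-test2'])  # true true
print(prepped.headers['Authorization'])                       # Basic dXNlcjpwYXNz

# A None value removes the session-level header for this request only
no_test = s.prepare_request(
    requests.Request('GET', 'http://httpbin.org/headers', headers={'x-test': None})
)
print('x-test' in no_test.headers)  # False
```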

s=requests.Session()

r=s.get('http://httpbin.org/cookies',cookies={'from-my':'browser'})

print(r.text)

{

  "cookies": {

    "from-my": "browser"

  }

}

r=s.get('http://httpbin.org/cookies')

print(r.text)

{

  "cookies": {}

}
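The same behaviour, checked offline: a cookie passed to one call is attached to that request only and never enters s.cookies. To make a cookie persist, add it to s.cookies instead (for example with s.cookies.set(...)).

```python
import requests

s = requests.Session()
first = s.prepare_request(
    requests.Request('GET', 'http://httpbin.org/cookies', cookies={'from-my': 'browser'})
)
second = s.prepare_request(requests.Request('GET', 'http://httpbin.org/cookies'))

print(first.headers.get('Cookie'))   # from-my=browser
print(second.headers.get('Cookie'))  # None: the cookie was not stored in the session
print(len(s.cookies))                # 0
```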

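Request and Response objects

A call to requests.get() actually does two things: first, it builds a Request object that is sent to the server; second, it produces a Response object from the server's reply, which contains all of the information the server returned.

r.headers  # headers the server sent back

r.request.headers  # headers of the request that was sent to the server

Prepared requests

If you need to do extra processing on the body or headers before a request is sent, you can do this:

from requests import Request,Session

url='http://httpbin.org/post'

data={'key1':'value1'}

headers={'user-agent':'my-app/0.0.1'}

s=Session()

req=Request('POST',url,data=data,headers=headers)

prepped=req.prepare()

# modify prepped.body and prepped.headers here, for example:

prepped.headers['Keep-Dead']='parrot'

# send() also accepts stream, verify, proxies and cert keyword arguments

resp=s.send(prepped,timeout=5)

print(resp.status_code)

However, the code above loses some of the advantages of a Requests Session object; in particular, Session-level state such as cookies is not applied to the request. To get a PreparedRequest with that state applied, call Session.prepare_request() instead of Request.prepare().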
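The difference is visible offline. In this sketch the header X-Session-Level and the cookie session_cookie are hypothetical session-level values; only the prepared requests are inspected, and sending is left commented out.

```python
from requests import Request, Session

s = Session()
s.headers.update({'X-Session-Level': 'yes'})  # hypothetical session-level header
s.cookies.set('session_cookie', 'abc')        # hypothetical session cookie

req = Request('GET', 'http://httpbin.org/get')

plain = req.prepare()              # ignores the session entirely
stateful = s.prepare_request(req)  # merges session headers, cookies, auth, ...

print('X-Session-Level' in plain.headers)   # False
print(stateful.headers['X-Session-Level'])  # yes
print(stateful.headers.get('Cookie'))       # session_cookie=abc

# The prepared request can still be edited before sending
stateful.headers['Keep-Dead'] = 'parrot'
# resp = s.send(stateful, timeout=5)  # network access assumed
```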

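SSL certificate verification

SSL verification is enabled by default; if verification fails, Requests raises an SSLError.

requests.get('http://requestb.in')  # plain HTTP, so no certificate is involved

<Response [200]>

requests.get('https://github.com',verify=True)

<Response [200]>

# With verify=False, Requests skips SSL certificate verification.

verify can also be the path to a CA bundle file or to a directory of trusted CA certificates, e.g. verify='/path/to/certfile'.

Or keep the setting on a session:

s = requests.Session()

s.verify = '/path/to/certfile'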

Proxies: the proxies parameter

import requests

proxies = {

  "http": "http://10.10.1.10:3128",

  "https": "http://10.10.1.10:1080",

}

requests.get("http://example.org", proxies=proxies)

If you found this interesting, you can follow my WeChat official account: 一步一步学Python

Web Crawler Basics [2]: An Introduction to the Requests Library