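Learning objective:

The requests library is more concise and more convenient to use than urllib.

Steps

Step 1: What is requests

requests is an HTTP library written in Python, built on urllib3, and released under the Apache 2.0 open-source license. It is more convenient than urllib, saves a great deal of work, and fully covers typical HTTP testing needs. In short, it is a simple, easy-to-use HTTP library.

Step 2: A first example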

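# -*- coding: utf-8 -*-
import requests

response = requests.get('http://www.baidu.com')
print(type(response))          # <class 'requests.models.Response'>
print(response.content)        # raw body as bytes
print(response.status_code)    # HTTP status code
print(response.text)           # body decoded to str
print(type(response.text))     # <class 'str'>
print(response.cookies)        # cookies set by the server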

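Important:

response.content: the data exactly as fetched from the network, with no decoding applied. Its type is bytes. It is an attribute, not a method, so it is written without parentheses.
response.text: a str that requests produces by decoding response.content. Decoding needs an encoding, and requests guesses one (from the response headers, falling back to detection from the content). The guess is sometimes wrong, so when you know the page's encoding the safest approach is to decode manually, for example response.content.decode("utf-8").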
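To make the bytes/str distinction concrete, here is a small sketch that needs no network access. The sample HTML string is made up for illustration; it stands in for what response.content would hold.

```python
# Bytes as they might arrive from the network (a UTF-8 encoded page fragment).
raw = '<title>百度一下</title>'.encode('utf-8')

# Decoding with the correct encoding recovers the original text.
good = raw.decode('utf-8')

# Decoding with a wrong guess (ISO-8859-1 here) does not fail, it silently
# produces garbled text, which is what a wrong guess for response.text looks like.
bad = raw.decode('iso-8859-1')

print(type(raw), type(good))
print(good)
print(bad == good)
```

With a real response, setting response.encoding = 'utf-8' before reading response.text has the same effect as decoding response.content by hand.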

Step 3: Request methods

# -*- coding: utf-8 -*-
import requests

# Each HTTP method has a function of the same name
requests.post('http://httpbin.org/post')
requests.put('http://httpbin.org/put')
requests.delete('http://httpbin.org/delete')
requests.head('http://httpbin.org/get')
requests.options('http://httpbin.org/get')

GET requests
① Basic usage

# -*- coding: utf-8 -*-
import requests

response = requests.get('http://httpbin.org/get')
print(response.text)

Output:

{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.18.4"
  },
  "origin": "222.94.50.178",
  "url": "http://httpbin.org/get"
}

② GET requests with parameters

import requests

# The params dict is URL-encoded and appended to the URL as a query string
data = {
    'name': 'python', 'age': 17
}
response = requests.get('http://httpbin.org/get', params=data)
print(response.text)

Output:

{
  "args": {
    "age": "17",
    "name": "python"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.18.4"
  },
  "origin": "222.94.50.178",
  "url": "http://httpbin.org/get?name=python&age=17"
}
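The query string can also be inspected without sending anything, by preparing the request locally. A minimal sketch; the extra 'tag' field is a made-up example to show how list values are handled.

```python
from urllib.parse import urlsplit, parse_qs

import requests

# 'tag' is a hypothetical extra field, added only to show list handling.
params = {'name': 'python', 'age': 17, 'tag': ['a', 'b']}

# Request(...).prepare() builds the final URL but does not touch the network.
prepared = requests.Request('GET', 'http://httpbin.org/get', params=params).prepare()
print(prepared.url)

# Parse the query string back into a dict to see what the server would receive.
query = parse_qs(urlsplit(prepared.url).query)
print(query)
```

Note that every value arrives as a string, which is why httpbin echoes the age back in quotes.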

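Differences between GET and POST:

GET retrieves data from the server; POST sends data to the server.
GET parameters are visible: they appear in the browser's address bar, and the HTTP server builds its response from the parameters contained in the request URL. In other words, the parameters of a GET request are part of the URL, for example: http://www.baidu.com/s?wd=Chinese
POST parameters travel in the request body and are not shown in the URL. HTTP itself places no limit on the body length (individual servers may), so POST is typically used to submit larger amounts of data to the HTTP server, such as requests with many parameters or file uploads. The parameters themselves are in the body; the "Content-Type" header states the media type and encoding of that body.
Note: avoid submitting forms with GET, because it can cause security problems. For example, in a login form submitted with GET, the username and password the user enters are fully exposed in the address bar.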
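The difference is easy to see by preparing both kinds of request locally and looking at where the parameters end up. This is only a sketch; nothing is sent over the network.

```python
import requests

payload = {'name': 'python', 'age': 18}

# Same data, once as GET query parameters and once as a POST form body.
get_req = requests.Request('GET', 'http://httpbin.org/get', params=payload).prepare()
post_req = requests.Request('POST', 'http://httpbin.org/post', data=payload).prepare()

print(get_req.url)                        # parameters are part of the URL
print(get_req.body)                       # no body for this GET
print(post_req.url)                       # URL carries no parameters
print(post_req.body)                      # parameters are in the body
print(post_req.headers['Content-Type'])   # describes how the body is encoded
```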

③ Parsing JSON

import requests

response = requests.get('http://httpbin.org/get')
# response.json() parses the body as JSON; for this endpoint the result is a dict
print(response.json())
print(type(response.json()))
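Calling response.json() is roughly comparable to running json.loads on the response text. The sketch below uses only the standard library and a shortened copy of the httpbin output as sample data; parse_or_none is a hypothetical helper name, showing one way to cope with a body that is not valid JSON (an error page, for instance).

```python
import json

# Shortened sample body, modelled on the httpbin output shown earlier.
body = '{"args": {}, "origin": "222.94.50.178", "url": "http://httpbin.org/get"}'

parsed = json.loads(body)
print(type(parsed))
print(parsed['url'])


def parse_or_none(text):
    """Return the parsed JSON value, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


print(parse_or_none('<html>not json</html>'))
```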

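④ Fetching binary data

# -*- coding: utf-8 -*-
'''
Save the Baidu logo
'''
import requests

response = requests.get('https://www.baidu.com/img/bd_logo1.png')
# response.content holds the raw bytes, so open the file in binary mode;
# the with block closes the file automatically, no explicit close() is needed
with open('baidu.png', 'wb') as f:
    f.write(response.content)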
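For larger files it can be preferable not to hold the whole body in memory. A possible variant, assuming a reasonably recent requests version, streams the body in chunks; download is a hypothetical helper name and the 10-second timeout is an arbitrary choice.

```python
import requests


def download(url, path, chunk_size=8192):
    """Stream the body of url into path; return the number of bytes written."""
    written = 0
    # stream=True defers fetching the body until iter_content is consumed.
    with requests.get(url, stream=True, timeout=10) as resp:
        resp.raise_for_status()  # stop early on 4xx/5xx responses
        with open(path, 'wb') as f:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                f.write(chunk)
                written += len(chunk)
    return written
```

Usage would look like download('https://www.baidu.com/img/bd_logo1.png', 'baidu.png').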

⑤ Adding headers
Requesting the Zhihu site directly, without any custom headers, returns an error page, for example:

import requests

response = requests.get('https://www.zhihu.com/explore')
print(response.text)

Output:

<html><body><h1>500 Server Error</h1>
An internal server error occured.
</body></html>

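Fix:

import requests

# Send a browser-like User-Agent instead of the default python-requests one
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'
}
response = requests.get('https://www.zhihu.com/explore', headers=headers)
print(response.text)

Adding a headers dict is enough for the page to be fetched normally. I found the header values with the developer tools built into Chrome and copied them over.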
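To see why the default request stands out, compare the User-Agent that requests sends by default with a custom one. This sketch only prepares the request locally; the shortened Mozilla string is an illustrative value.

```python
import requests

# What requests sends when no headers are given.
default_ua = requests.utils.default_headers()['User-Agent']
print(default_ua)

# Prepare (but do not send) a request with a custom User-Agent.
session = requests.Session()
req = requests.Request(
    'GET',
    'https://www.zhihu.com/explore',
    headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'},
)
prepared = session.prepare_request(req)

print(prepared.headers['User-Agent'])  # the custom value wins
print(prepared.headers['Accept'])      # the other defaults are still merged in
```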

Basic POST request

import requests

# data is sent form-encoded in the request body
data = {
    'name': 'python', 'age': 18
}
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'
}
response = requests.post('http://httpbin.org/post', data=data, headers=headers)
print(response.json())
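Besides data=, requests.post also accepts json=, which serializes the payload as JSON instead of a form. A short sketch comparing the two, prepared locally without sending anything:

```python
import json

import requests

payload = {'name': 'python', 'age': 18}

# data= produces a form-encoded body, json= produces a JSON body.
form_req = requests.Request('POST', 'http://httpbin.org/post', data=payload).prepare()
json_req = requests.Request('POST', 'http://httpbin.org/post', json=payload).prepare()

print(form_req.headers['Content-Type'], form_req.body)
print(json_req.headers['Content-Type'], json_req.body)

# The JSON body round-trips back to the original dict.
print(json.loads(json_req.body) == payload)
```

Which one to use depends on what the server expects: HTML-form style endpoints usually want data=, JSON APIs usually want json=.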

Example: crawl Python job listings from Lagou and get the data as a dictionary

# -*- coding: utf-8 -*-
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/69.0.3497.100 Safari/537.36',
    'Referer': "https://www.lagou.com/jobs/list_python?labelWords=&fromSearch=true&suginput="
}
# Form fields for the search
data = {
    'first': "True",
    'pn': "",          # page number
    'kd': "python"     # search keyword
}
url = 'https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false'
# Submit the fields as a form POST, in line with this section
response = requests.post(url, headers=headers, data=data)
# response.json() returns the parsed result as a dict
print(response.json())

Responses

import requests
'''
Response attributes
'''
response = requests.get('http://www.baidu.com')
print(response.status_code, type(response.status_code))
print(response.history, type(response.history))
print(response.cookies, type(response.cookies))
print(response.url, type(response.url))
print(response.headers, type(response.headers))

Output:

200 <class 'int'>
[] <class 'list'>
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]> <class 'requests.cookies.RequestsCookieJar'>
http://www.baidu.com/ <class 'str'>
{'Server': 'bfe/1.0.8.18', 'Date': 'Thu, 05 Apr 2018 06:27:33 GMT', 'Content-Type': 'text/html', 'Last-Modified': 'Mon, 23 Jan 2017 13:28:24 GMT', 'Transfer-Encoding': 'chunked', 'Connection': 'Keep-Alive', 'Cache-Control': 'private, no-cache, no-store, proxy-revalidate, no-transform', 'Pragma': 'no-cache', 'Set-Cookie': 'BDORZ=27315; max-age=86400; domain=.baidu.com; path=/', 'Content-Encoding': 'gzip'} <class 'requests.structures.CaseInsensitiveDict'>
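The headers come back as a CaseInsensitiveDict, which means header names can be looked up in any letter case. A small sketch using two of the values from the output above:

```python
from requests.structures import CaseInsensitiveDict

# Two header values copied from the sample output, for illustration.
headers = CaseInsensitiveDict({
    'Content-Type': 'text/html',
    'Server': 'bfe/1.0.8.18',
})

print(headers['content-type'])        # lookup ignores case
print(headers.get('SERVER'))          # so does .get()
print('Content-Encoding' in headers)  # missing keys behave like a normal dict
```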

Checking the status code
Status code reference table: http://www.cnblogs.com/wuzhiming/p/8722422.html

# -*- coding: utf-8 -*-
import requests

response = requests.get('http://www.cnblogs.com/hello.html')
# Exit unless the page is missing (404), which is what this URL is expected to return
exit() if not response.status_code == requests.codes.not_found else print('404 not found')

response1 = requests.get('http://www.baidu.com')
# Exit unless the request succeeded (200)
exit() if not response1.status_code == requests.codes.ok else print('Request succeeded')
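The named constants in requests.codes are plain integers, and a response can also raise an exception for error statuses through raise_for_status(). The sketch below works offline: classify is a hypothetical helper, and the Response object is built by hand purely for illustration (real ones come from requests.get).

```python
import requests


def classify(status_code):
    """Map a status code to a short label using the named constants."""
    if status_code == requests.codes.ok:
        return 'ok'
    if status_code == requests.codes.not_found:
        return 'not found'
    if 500 <= status_code < 600:
        return 'server error'
    return 'other'


print(classify(200), classify(404), classify(500))

# Hand-built response standing in for a real 404 reply.
resp = requests.Response()
resp.status_code = 404
try:
    resp.raise_for_status()
    outcome = 'no error'
except requests.exceptions.HTTPError:
    outcome = 'HTTPError raised'
print(outcome)
```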

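Advanced usage
① File upload

import requests

# Open the file in binary mode; the with block closes it once the upload is done
with open('baidu.png', 'rb') as f:
    file = {'file': f}
    response = requests.post('http://httpbin.org/post', files=file)
print(response.text)

The output is not shown here.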
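Since the output is not shown, here is a way to look at what a file upload puts on the wire without sending it. The file name and contents are made-up sample values passed as a (filename, content, content type) tuple.

```python
import requests

# In-memory sample file instead of baidu.png, so the sketch runs anywhere.
files = {'file': ('hello.txt', b'hello world', 'text/plain')}

prepared = requests.Request('POST', 'http://httpbin.org/post', files=files).prepare()

# files= switches the body to multipart/form-data with a generated boundary.
print(prepared.headers['Content-Type'])
print(prepared.body.decode('utf-8'))
```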

② Getting cookies

import requests

response = requests.get('http://www.baidu.com')
cookies = response.cookies
print(cookies)
# The cookie jar can be iterated like a dict
for key, value in cookies.items():
    print(key + '=' + value)
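The cookie jar can also be built and inspected by hand, which is handy for experimenting offline. A sketch using the BDORZ cookie from the earlier sample output:

```python
import requests
from requests.cookies import RequestsCookieJar

# Recreate the cookie from the sample output by hand.
jar = RequestsCookieJar()
jar.set('BDORZ', '27315', domain='.baidu.com', path='/')

print(jar.get('BDORZ'))

# Convert the jar to a plain dict of name -> value.
as_dict = requests.utils.dict_from_cookiejar(jar)
print(as_dict)

for key, value in jar.items():
    print(key + '=' + value)
```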

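③ Session persistence

import requests

# A Session keeps cookies between requests, much like a browser does
s = requests.Session()
# /cookies/set/<name>/<value> makes httpbin set a cookie
s.get('http://httpbin.org/cookies/set/number/123456789')
# The same session sends the cookie back on the next request
response = s.get('http://httpbin.org/cookies')
print(response.text)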
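What the session adds can be shown offline by comparing a request prepared through a session that already holds a cookie with one prepared on its own. In this sketch the cookie is set by hand to stand in for the one httpbin would set.

```python
import requests

s = requests.Session()
# Stand-in for the cookie the server would have set on an earlier request.
s.cookies.set('number', '123456789', domain='httpbin.org', path='/')

req = requests.Request('GET', 'http://httpbin.org/cookies')

# Prepared through the session: the stored cookie is attached.
with_session = s.prepare_request(req)
# Prepared on its own: no cookie, as with two separate requests.get calls.
without_session = req.prepare()

print(with_session.headers.get('Cookie'))
print(without_session.headers.get('Cookie'))
```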

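④ Certificate verification

import requests

# verify=False skips certificate verification (requests then emits an InsecureRequestWarning)
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)

Specifying a client certificate manually

# cert takes the paths of the certificate file and its private key
response1 = requests.get('https://www.12306.cn', cert=('/path/server.crt', '/path/key'))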

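⑤ Proxy settings

import requests

# Usage example; free proxies can be found with a web search
proxies = {
    'http': 'http://127.0.0.1:port',
    'https': 'https://ip:port',
    # A dict cannot hold two 'http' keys. For a proxy that needs authentication,
    # use this form for the 'http' entry instead:
    # 'http': 'http://username:password@ip:port'
}
response = requests.get('http://www.baidu.com', proxies=proxies)
print(response.status_code)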
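One way to avoid mistakes in the proxies dict is to build it with a small helper. build_proxies is a hypothetical name, and the host, port and credentials below are placeholders; select_proxy from requests.utils is used only to show which entry would be picked for a given URL.

```python
from requests.utils import select_proxy


def build_proxies(host, port, username=None, password=None):
    """Return a proxies dict that routes both http and https through one proxy."""
    auth = f'{username}:{password}@' if username and password else ''
    proxy_url = f'http://{auth}{host}:{port}'
    return {'http': proxy_url, 'https': proxy_url}


# Placeholder address and port, for illustration only.
proxies = build_proxies('127.0.0.1', 8080)
print(proxies)

# Which proxy applies to this URL? The lookup is by URL scheme.
print(select_proxy('http://www.baidu.com', proxies))

print(build_proxies('10.0.0.1', 3128, 'user', 'pw')['https'])
```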

⑥ Timeout settings

import requests

# timeout is in seconds; an exception is raised if the server does not respond in time
response = requests.get('http://httpbin.org/get', timeout=1)
print(response.status_code)
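The timeout can also be given as a (connect, read) tuple to limit the two phases separately. A possible wrapper, with status_or_none as a hypothetical helper name and the default values chosen arbitrarily:

```python
import requests


def status_or_none(url, connect_timeout=3.05, read_timeout=10):
    """Return the status code, or None if connecting or reading timed out."""
    try:
        # First value limits connection setup, second limits waiting for data.
        response = requests.get(url, timeout=(connect_timeout, read_timeout))
    except requests.exceptions.Timeout:
        return None
    return response.status_code
```

Calling status_or_none('http://httpbin.org/get') would then return 200 on success and None when either limit is exceeded.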

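⑦ Authentication

import requests
from requests.auth import HTTPBasicAuth

# A (user, password) tuple is shorthand for HTTP Basic authentication
response = requests.get('http://127.0.0.1:8888', auth=('user', 'password'))
# Equivalent, with the auth class spelled out
response1 = requests.get('http://127.0.0.1:8888', auth=HTTPBasicAuth('user', 'password'))
print(response.status_code)

PS: 127.0.0.1:8888 is only an example address.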
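Both forms end up as the same Authorization header. This can be checked without a server by preparing the requests locally; the address and credentials are the same example values as above.

```python
import requests
from requests.auth import HTTPBasicAuth

url = 'http://127.0.0.1:8888'  # example address only, nothing is sent

short_form = requests.Request('GET', url, auth=('user', 'password')).prepare()
explicit = requests.Request('GET', url, auth=HTTPBasicAuth('user', 'password')).prepare()

# "Basic " followed by base64("user:password")
print(short_form.headers['Authorization'])
print(short_form.headers['Authorization'] == explicit.headers['Authorization'])
```

Since Basic auth is only base64-encoded, not encrypted, it should be used over HTTPS in practice.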

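⑧ Exception handling

import requests
from requests.exceptions import Timeout, HTTPError, RequestException

try:
    # A 0.01 s timeout is deliberately too short, to trigger the timeout branch
    response = requests.get('http://httpbin.org/get', timeout=0.01)
    response.raise_for_status()  # raises HTTPError for 4xx/5xx responses
    print(response.status_code)
except Timeout:  # covers both connect and read timeouts
    print("TIME OUT")
except HTTPError:
    print('HTTP ERROR')
except RequestException:  # base class of all requests exceptions, so it goes last
    print("ERROR")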
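The order of the except clauses matters because the requests exceptions form a hierarchy. The sketch below only inspects the classes, so it runs offline; it shows that a connect timeout is not a ReadTimeout, while Timeout covers both.

```python
from requests.exceptions import (
    ConnectionError,
    ConnectTimeout,
    HTTPError,
    ReadTimeout,
    RequestException,
    Timeout,
)

# Every specific exception derives from RequestException.
for exc in (ConnectTimeout, ReadTimeout, Timeout, HTTPError, ConnectionError):
    print(exc.__name__, issubclass(exc, RequestException))

# Both timeout kinds are caught by `except Timeout`...
print(issubclass(ConnectTimeout, Timeout), issubclass(ReadTimeout, Timeout))
# ...but `except ReadTimeout` alone would miss a connect timeout.
print(issubclass(ConnectTimeout, ReadTimeout))
```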

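Summary:

Learning to write crawlers is a good way to get a firmer grip on basic Python usage, which is exactly my goal here. More to follow in later chapters.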
