Python Web Scraping Series: The Requests Library in Detail

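Requests is built on top of urllib3 and is far more convenient to use than the standard-library urllib. It saves a great deal of work and covers typical HTTP testing needs.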

A First Example

 import requests

 response = requests.get('https://www.baidu.com/')

 print(type(response))

 print(response.status_code)

 print(type(response.text))

 print(response.cookies)

<class 'requests.models.Response'>

200

<class 'str'>

<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>

Request Methods

 import requests

 requests.post('http://httpbin.org/post')

<Response [200]>

 requests.put('http://httpbin.org/put')

<Response [200]>

 requests.delete('http://httpbin.org/delete')

<Response [200]>

 requests.head('http://httpbin.org/gett')

<Response [404]>

 requests.head('http://httpbin.org/get')

<Response [200]>

 requests.options('http://httpbin.org/get')

<Response [200]>

Basic GET Request

 import requests

 response = requests.get('http://httpbin.org/get')

 print(response.text)

{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.20.1"
  },
  "origin": "58.34.235.37",
  "url": "http://httpbin.org/get"
}

GET Request with Parameters

import requests

 response = requests.get('http://httpbin.org/get?name=germey&age=22')

 print(response.text)

{
  "args": {
    "age": "22",
    "name": "germey"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.20.1"
  },
  "origin": "58.34.235.37",
  "url": "http://httpbin.org/get?name=germey&age=22"
}

import requests

 data = { 'name':'germery','age':22 }

 response = requests.get('http://httpbin.org/get',params=data)

 print(response.text)

{
  "args": {
    "age": "22",
    "name": "germery"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.20.1"
  },
  "origin": "58.34.235.37",
  "url": "http://httpbin.org/get?name=germery&age=22"
}
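To see how a `params` dict ends up in the URL without sending anything, the request can be prepared locally. The following is a minimal sketch: `requests.Request(...).prepare()` builds the final URL on the client side, and the variable names are only illustrative.

```python
import requests

# Query parameters passed as a dict instead of being written into the URL by hand.
params = {'name': 'germey', 'age': 22}

# Prepare the request locally; nothing is sent over the network.
prepared = requests.Request('GET', 'http://httpbin.org/get', params=params).prepare()

print(prepared.url)  # http://httpbin.org/get?name=germey&age=22
```

Passing a dict lets Requests handle URL encoding, which matters once values contain spaces or non-ASCII characters.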

Parsing JSON

 import requests

 response = requests.get('http://httpbin.org/get')

 print(type(response.text))

<class 'str'>

 print(response.json())  # gives the same result as json.loads(response.text)

{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.20.1'}, 'origin': '58.34.235.37', 'url': 'http://httpbin.org/get'}

 print(type(response.json()))

<class 'dict'>

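`response.json()` raises an error when the body is not valid JSON, for example when a server returns an HTML error page. The sketch below wraps the call in a helper; the name `safe_json` is hypothetical, and the responses are filled in by hand through the private `_content` attribute purely so the example runs offline. In real code the response would come from `requests.get`.

```python
import requests

def safe_json(response):
    """Return the decoded JSON body, or None if the body is not valid JSON."""
    try:
        return response.json()
    except ValueError:  # the JSON decode errors raised here subclass ValueError
        return None

# Offline demonstration: build Response objects by hand instead of calling requests.get().
# Setting the private _content attribute is a testing shortcut, not normal usage.
good = requests.Response()
good._content = b'{"args": {}, "url": "http://httpbin.org/get"}'

bad = requests.Response()
bad._content = b'<html>not json</html>'

print(safe_json(good))
print(safe_json(bad))
```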
Fetching Binary Data

import requests

 response = requests.get('https://github.com/favicon.ico')

 print(type(response.text),type(response.content))

<class 'str'> <class 'bytes'>

Downloading an Image

import requests

 response = requests.get('https://github.com/favicon.ico')

 with open('favicon.ico','wb') as f:

     f.write(response.content)
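`response.content` loads the whole body into memory, which is fine for a small icon but not for large files. A possible alternative, sketched here with the hypothetical helper names `save_stream` and `download`, is to request the resource with `stream=True` and write it in chunks.

```python
import requests

def save_stream(response, path, chunk_size=8192):
    """Write a streamed response body to disk chunk by chunk."""
    response.raise_for_status()  # stop on 4xx/5xx instead of saving an error page
    with open(path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            f.write(chunk)

def download(url, path, timeout=10):
    """Fetch url with stream=True so the body is not read into memory at once."""
    with requests.get(url, stream=True, timeout=timeout) as response:
        save_stream(response, path)
```

A call such as `download('https://github.com/favicon.ico', 'favicon.ico')` would then replace the two-step version above.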

Adding Headers

import requests

 headers = { 'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36' }

 response = requests.get('https://www.zhihu.com/explore',headers=headers)
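Custom headers are merged with the defaults that Requests adds on its own. As a rough illustration, a `Session` can prepare a request locally so the merged headers can be inspected before anything is sent; the shortened User-Agent string below is a placeholder, not the one from the example above.

```python
import requests

# Abbreviated browser-like User-Agent, used only for illustration.
custom = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64)'}

session = requests.Session()
request = requests.Request('GET', 'https://www.zhihu.com/explore', headers=custom)

# prepare_request merges the session's default headers with the custom ones.
prepared = session.prepare_request(request)

for name, value in prepared.headers.items():
    print(name, ':', value)
```

The custom `User-Agent` replaces the default `python-requests/x.y.z` value, while defaults such as `Accept` and `Accept-Encoding` are kept.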

Basic POST Request

import requests

data = { 'name':'germey','age':22 }

response = requests.post('http://httpbin.org/post',data=data)

print(response.text)

{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "age": "22",
    "name": "germey"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Content-Length": "18",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.20.1"
  },
  "json": null,
  "origin": "58.34.235.37",
  "url": "http://httpbin.org/post"
}
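The `data=` argument sends a form-encoded body, which is why the values appear under `form` in the output. Requests also accepts a `json=` argument, not covered in this article, that sends the same dict as a JSON body. The sketch below prepares both variants locally to compare them without contacting the server.

```python
import json
import requests

payload = {'name': 'germey', 'age': 22}
url = 'http://httpbin.org/post'

# Prepared locally; neither request is actually sent.
form_request = requests.Request('POST', url, data=payload).prepare()
json_request = requests.Request('POST', url, json=payload).prepare()

print(form_request.headers['Content-Type'])  # application/x-www-form-urlencoded
print(form_request.body)                     # name=germey&age=22
print(json_request.headers['Content-Type'])  # application/json
print(json.loads(json_request.body))
```

The form body is 18 bytes long, which matches the `Content-Length` header echoed by httpbin.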

Adding Headers to a POST Request

 import requests

 data = {'name':'germey','age':22}

 headers = { 'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36' }

 response = requests.post('http://httpbin.org/post',data=data,headers=headers)

 print(response.json())

{'args': {}, 'data': '', 'files': {}, 'form': {'age': '22', 'name': 'germey'}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Content-Length': '18', 'Content-Type': 'application/x-www-form-urlencoded', 'Host': 'httpbin.org', 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'}, 'json': None, 'origin': '58.34.235.37', 'url': 'http://httpbin.org/post'}

Responses

Response Attributes

import requests

 response = requests.get('http://www.jianshu.com')

print(type(response.status_code),response.status_code)

<class 'int'> 403

 print(type(response.headers),response.headers)

<class 'requests.structures.CaseInsensitiveDict'> {'Date': 'Tue, 27 Nov 2018 20:03:06 GMT', 'Content-Type': 'text/html', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Server': 'Tengine', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'Content-Encoding': 'gzip', 'X-Via': '1.1 dianxinxiazai180:5 (Cdn Cache Server V2.0), 1.1 PSjsntdx3xf38:1 (Cdn Cache Server V2.0)'}

 print(type(response.cookies),response.cookies)

<class 'requests.cookies.RequestsCookieJar'> <RequestsCookieJar[]>

 print(type(response.url),response.url)

<class 'str'> https://www.jianshu.com/

 print(type(response.history),response.history)

<class 'list'> [<Response [301]>]

Checking the status code:

 import requests

 response = requests.get('http://www.jianshu.com')

 exit() if not response.status_code==requests.codes.ok else print('Request Successful')

 import requests

 response = requests.get('http://www.jianshu.com')

 exit() if not response.status_code==200 else print('Request Successful')
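Instead of comparing the status code by hand, `response.raise_for_status()` raises `HTTPError` for any 4xx or 5xx status. A small sketch follows; the helper name `is_success` is hypothetical, and the `Response` objects are built by hand only so the example runs without network access.

```python
import requests
from requests.exceptions import HTTPError

def is_success(response):
    """Return True unless the response carries a 4xx or 5xx status code."""
    try:
        response.raise_for_status()
    except HTTPError:
        return False
    return True

# Offline demonstration with hand-built responses.
forbidden = requests.Response()
forbidden.status_code = 403

ok = requests.Response()
ok.status_code = requests.codes.ok  # requests.codes.ok is 200

print(is_success(forbidden), is_success(ok))  # False True
```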

Advanced Requests Usage

File Upload

 import requests

 files = {'files':open('favicon.ico','rb')}

 response = requests.post('http://httpbin.org/post',files=files)

 print(response.text)

{
  "args": {},
  "data": "",
  "files": {
    "files": "(content omitted)"
  },
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Content-Length": "148",
    "Content-Type": "multipart/form-data; boundary=6e864227a6fd1cd7a1655802d20d7bd9",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.20.1"
  },
  "json": null,
  "origin": "58.34.235.37",
  "url": "http://httpbin.org/post"
}
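The `files` dict also accepts a `(filename, file object, content type)` tuple, which gives control over the name and MIME type sent in the multipart body. The following sketch prepares such a request locally; an in-memory `io.BytesIO` with placeholder bytes stands in for the real `favicon.ico`, and the field name `file` is arbitrary.

```python
import io
import requests

# Placeholder bytes standing in for open('favicon.ico', 'rb').
fake_file = io.BytesIO(b'\x00\x00\x01\x00')

# (filename, file object, content type) for the form field named 'file'.
files = {'file': ('favicon.ico', fake_file, 'image/x-icon')}

# Prepared locally; nothing is uploaded.
prepared = requests.Request('POST', 'http://httpbin.org/post', files=files).prepare()

print(prepared.headers['Content-Type'])             # multipart/form-data with a random boundary
print(b'filename="favicon.ico"' in prepared.body)   # True
```

With a real file, opening it in a `with` block ensures the handle is closed after the upload.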

Getting Cookie Values

import requests

 response = requests.get('http://www.baidu.com')

 print(response.cookies)

<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>

 for key,value in response.cookies.items():

     print(key+'='+value)

BDORZ=27315

Session Persistence (Simulating a Login)

import requests

 requests.get('http://httpbin.org/cookies/set/number/123456789')

<Response [200]>

 response = requests.get('http://httpbin.org/cookies')

 print(response.text)

{
  "cookies": {}
}

s = requests.Session()

 s.get('http://httpbin.org/cookies/set/number/123456789')

<Response [200]>

 response = s.get('http://httpbin.org/cookies')

 print(response.text)

{
  "cookies": {
    "number": "123456789"
  }
}
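A `Session` keeps cookies and default headers between requests, which is why the second example sees the cookie and the first does not. The sketch below shows the same idea offline: a cookie is placed in the session's jar by hand and the session then attaches it to a prepared request. The User-Agent value `my-crawler/0.1` is a made-up placeholder.

```python
import requests

session = requests.Session()

# Headers set on the session apply to every request made through it.
session.headers.update({'User-Agent': 'my-crawler/0.1'})

# Put a cookie into the session's jar, as a server's Set-Cookie header would.
session.cookies.set('number', '123456789', domain='httpbin.org', path='/')

# Prepare a request locally to see what the session would send.
prepared = session.prepare_request(requests.Request('GET', 'http://httpbin.org/cookies'))

print(prepared.headers['User-Agent'])
print(prepared.headers.get('Cookie'))
```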

Certificate Verification

import requests

 response = requests.get('https://www.12306.cn')

 print(response.status_code)

response = requests.get('https://www.12306.cn',verify=False)

/home/dex/.local/lib/python3.6/site-packages/urllib3/connectionpool.py:847: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings

InsecureRequestWarning)

/home/dex/.local/lib/python3.6/site-packages/urllib3/connectionpool.py:847: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings

InsecureRequestWarning)

Adding a Client Certificate

response = requests.get('https://www.12306.cn',cert=('/path/server.crt','/path/key'))

Proxy Settings

 import requests

 proxies = {
     'http':'http://222.74.61.98:53281',
     # 'http':'http://user:password@222.74.61.98:53281',  # with username and password
     'https':'https://114.119.116.92:61066'
 }

 response = requests.get('https://www.taobao.com',proxies=proxies)

 print(response.status_code)
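The keys of the `proxies` dict are URL schemes, and the proxy is chosen by the scheme of the URL being requested. As an illustration, the helper `requests.utils.select_proxy` shows which entry would be picked for a given URL; the proxy addresses are the ones from the example above and may no longer be reachable.

```python
import requests
from requests.utils import select_proxy

proxies = {
    'http': 'http://222.74.61.98:53281',
    'https': 'https://114.119.116.92:61066',
}

# The URL's scheme decides which proxy entry applies.
print(select_proxy('http://httpbin.org/get', proxies))   # http://222.74.61.98:53281
print(select_proxy('https://www.taobao.com', proxies))   # https://114.119.116.92:61066

# Proxies can also be stored on a session so they need not be passed on every call.
session = requests.Session()
session.proxies.update(proxies)
```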

Using a SOCKS Proxy

First install SOCKS support through the requests[socks] extra:

 pip3 install 'requests[socks]'

import requests

proxies = {

 'http':'socks5://127.0.0.1:1080',

 'https':'socks5://127.0.0.1:1080'

 }

 response = requests.get('https://www.taobao.com',proxies=proxies)

 print(response.status_code)

200

Setting and Catching Timeouts

import requests
from requests.exceptions import Timeout

try:
    response = requests.get("http://www.taobao.com",timeout=0.01)
    print(response.status_code)
except Timeout:  # covers both connect and read timeouts
    print('Timeout')

Timeout
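The `timeout` argument can also be a `(connect, read)` tuple, and a timeout can occur in either phase. The sketch below, with the hypothetical helper `fetch_status` and arbitrary timeout values, catches the base class `Timeout`, which covers both `ConnectTimeout` and `ReadTimeout`.

```python
import requests
from requests.exceptions import ConnectTimeout, ReadTimeout, Timeout

def fetch_status(url, timeout=(3.05, 10)):
    """Return the status code, or None if connecting or reading timed out.

    timeout is (connect seconds, read seconds); the values are only examples.
    """
    try:
        return requests.get(url, timeout=timeout).status_code
    except Timeout:
        return None

# Both specific timeout exceptions derive from Timeout.
print(issubclass(ConnectTimeout, Timeout), issubclass(ReadTimeout, Timeout))  # True True
```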

Authentication

import requests

from requests.auth import HTTPBasicAuth

 r = requests.get('http://120.27.34.24:9001',auth=HTTPBasicAuth('user','123'))

# r = requests.get('http://120.27.34.24:9001',auth=('user','123'))  # shorthand for HTTPBasicAuth
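Both forms result in the same `Authorization` header: the word `Basic` followed by the Base64 encoding of `user:password`. The sketch below prepares the two requests locally and compares the header against a value computed with the standard library. The credentials are the placeholder values from the example.

```python
import base64
import requests
from requests.auth import HTTPBasicAuth

url = 'http://120.27.34.24:9001'

# Prepared locally; no request is sent to the server.
explicit = requests.Request('GET', url, auth=HTTPBasicAuth('user', '123')).prepare()
shorthand = requests.Request('GET', url, auth=('user', '123')).prepare()

# Basic auth is "Basic " + base64("user:password").
expected = 'Basic ' + base64.b64encode(b'user:123').decode('ascii')

print(explicit.headers['Authorization'] == expected)   # True
print(shorthand.headers['Authorization'] == expected)  # True
```

Because Base64 is an encoding rather than encryption, basic authentication should normally be used over HTTPS.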

Exception Handling

import requests
from requests.exceptions import ReadTimeout,ConnectionError,HTTPError,RequestException

try:
    response = requests.get('http://httpbin.org/get',timeout=0.5)
    print(response.status_code)
except ReadTimeout:
    print('Timeout')
except ConnectionError:
    print('Connection Error')
except HTTPError:
    print('Http Error')
except RequestException:
    print('Error')

200
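The order of the `except` clauses matters because the exceptions form a hierarchy with `RequestException` at the root, so specific classes have to be checked before general ones. The sketch below uses a hypothetical helper, `describe_error`, to make that ordering explicit. Note that `HTTPError` is typically only raised when `response.raise_for_status()` is called.

```python
from requests.exceptions import (
    ConnectionError, ConnectTimeout, HTTPError, ReadTimeout,
    RequestException, Timeout,
)

def describe_error(exc):
    """Map a requests exception to a short label, most specific first."""
    if isinstance(exc, Timeout):           # ConnectTimeout and ReadTimeout
        return 'timeout'
    if isinstance(exc, ConnectionError):   # DNS failure, refused connection, ...
        return 'connection error'
    if isinstance(exc, HTTPError):         # raised by response.raise_for_status()
        return 'http error'
    if isinstance(exc, RequestException):  # base class of all of the above
        return 'request error'
    return 'unknown'

for exc in (ReadTimeout(), ConnectTimeout(), ConnectionError(), HTTPError(), RequestException()):
    print(type(exc).__name__, '->', describe_error(exc))
```

`ConnectTimeout` inherits from both `ConnectionError` and `Timeout`, which is why the timeout check comes first here.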
