Python Web Scraping Series: The Requests Library in Detail
Requests is built on top of urllib but is far more convenient to use: it saves us a great deal of boilerplate work and fully covers everyday HTTP needs.
A First Example
import requests
response = requests.get('https://www.baidu.com/')
print(type(response))
print(response.status_code)
print(type(response.text))
print(response.cookies)
<class 'requests.models.Response'>
200
<class 'str'>
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
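One thing worth knowing: response.text is decoded with the encoding Requests guesses from the response headers, so Chinese pages sometimes come out garbled. A minimal sketch of overriding the encoding before reading the text (assuming the page really is UTF-8):
import requests
response = requests.get('https://www.baidu.com/')
response.encoding = 'utf-8'   # override the guessed encoding before touching response.text
print(response.text[:100])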
Various Request Methods
import requests
requests.post('http://httpbin.org/post')
<Response [200]>
requests.put('http://httpbin.org/put')
<Response [200]>
requests.delete('http://httpbin.org/delete')
<Response [200]>
requests.head('http://httpbin.org/gett')   # note the misspelled path, hence the 404 below
<Response [404]>
requests.head('http://httpbin.org/get')
<Response [200]>
requests.options('http://httpbin.org/get')
<Response [200]>
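All of these helpers are thin wrappers; if the HTTP method is only known at runtime, the generic requests.request can be used instead. A minimal sketch:
import requests
# requests.get/post/put/... all delegate to requests.request under the hood
response = requests.request('GET', 'http://httpbin.org/get')
print(response.status_code)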
Basic GET Request
import requests
response = requests.get('http://httpbin.org/get')
print(response.text)
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.20.1"
  },
  "origin": "58.34.235.37",
  "url": "http://httpbin.org/get"
}
GET Request with Parameters
import requests
response = requests.get('http://httpbin.org/get?name=germey&age=22')
print(response.text)
{
  "args": {
    "age": "22",
    "name": "germey"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.20.1"
  },
  "origin": "58.34.235.37",
  "url": "http://httpbin.org/get?name=germey&age=22"
}
The same query string can also be built by passing a dict through the params argument:
import requests
data = {'name': 'germey', 'age': 22}
response = requests.get('http://httpbin.org/get',params=data)
print(response.text)
{
  "args": {
    "age": "22",
    "name": "germey"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.20.1"
  },
  "origin": "58.34.235.37",
  "url": "http://httpbin.org/get?name=germey&age=22"
}
Parsing JSON
import requests
response = requests.get('http://httpbin.org/get')
print(type(response.text))
<class 'str'>
print(response.json()) # identical to json.loads(response.text)
{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.20.1'}, 'origin': '58.34.235.37', 'url': 'http://httpbin.org/get'}
print(type(response.json()))
<class 'dict'>
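To illustrate the comment above, a small sketch comparing the two parsing routes; both should produce the same dict:
import json
import requests
response = requests.get('http://httpbin.org/get')
print(json.loads(response.text) == response.json())   # True: both parse the same JSON body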
Fetching Binary Data
import requests
response = requests.get('https://github.com/favicon.ico')
print(type(response.text),type(response.content))
<class 'str'> <class 'bytes'>
Downloading an Image
import requests
response = requests.get('https://github.com/favicon.ico')
with open('favicon.ico','wb') as f:
    f.write(response.content)
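For large binary files it is usually better not to hold the whole body in memory. A minimal sketch using stream=True and iter_content (the chunk size here is an arbitrary choice):
import requests
response = requests.get('https://github.com/favicon.ico', stream=True)
with open('favicon.ico', 'wb') as f:
    for chunk in response.iter_content(chunk_size=1024):   # write the body piece by piece
        if chunk:
            f.write(chunk)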
Adding Headers
import requests
headers = { 'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36' }
response = requests.get('https://www.zhihu.com/explore',headers=headers)
Basic POST Request
import requests
data = { 'name':'germey','age':22 }
response = requests.post('http://httpbin.org/post',data=data)
print(response.text)
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "age": "22",
    "name": "germey"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Content-Length": "18",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.20.1"
  },
  "json": null,
  "origin": "58.34.235.37",
  "url": "http://httpbin.org/post"
}
Adding Headers
import requests
data = {'name':'germey','age':22}
headers = { 'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36' }
response = requests.post('http://httpbin.org/post',data=data,headers=headers)
print(response.json())
{'args': {}, 'data': '', 'files': {}, 'form': {'age': '22', 'name': 'germey'}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Content-Length': '18', 'Content-Type': 'application/x-www-form-urlencoded', 'Host': 'httpbin.org', 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'}, 'json': None, 'origin': '58.34.235.37', 'url': 'http://httpbin.org/post'}
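Note that data= sends a form-encoded body, which is why the values show up under "form" above. If the server expects a JSON body instead, Requests can serialize the dict for you via the json argument; a minimal sketch:
import requests
payload = {'name': 'germey', 'age': 22}
# json= serializes payload to JSON and sets Content-Type: application/json
response = requests.post('http://httpbin.org/post', json=payload)
print(response.json()['json'])   # {'name': 'germey', 'age': 22}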
Responses
Response attributes
import requests
response = requests.get('http://www.jianshu.com')
print(type(response.status_code),response.status_code)
<class 'int'> 403
print(type(response.headers),response.headers)
<class 'requests.structures.CaseInsensitiveDict'> {'Date': 'Tue, 27 Nov 2018 20:03:06 GMT', 'Content-Type': 'text/html', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Server': 'Tengine', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'Content-Encoding': 'gzip', 'X-Via': '1.1 dianxinxiazai180:5 (Cdn Cache Server V2.0), 1.1 PSjsntdx3xf38:1 (Cdn Cache Server V2.0)'}
print(type(response.cookies),response.cookies)
<class 'requests.cookies.RequestsCookieJar'> <RequestsCookieJar[]>
print(type(response.url),response.url)
<class 'str'> https://www.jianshu.com/
print(type(response.history),response.history)
<class 'list'> [<Response [301]>]
Checking the status code:
import requests
response = requests.get('http://www.jianshu.com')
exit() if response.status_code != requests.codes.ok else print('Request Successfully')
import requests
response = requests.get('http://www.jianshu.com')
exit() if response.status_code != 200 else print('Request Successfully')
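An alternative to comparing status codes by hand is Response.raise_for_status(), which raises an HTTPError for any 4xx/5xx response; a minimal sketch:
import requests
from requests.exceptions import HTTPError
response = requests.get('http://www.jianshu.com')
try:
    response.raise_for_status()   # raises HTTPError on 4xx/5xx responses
    print('Request Successfully')
except HTTPError as e:
    print('Request failed:', e)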
Advanced Requests Usage
File Upload
import requests
files = {'files':open('favicon.ico','rb')}
response = requests.post('http://httpbin.org/post',files=files)
print(response.text)
{
  "args": {},
  "data": "",
  "files": {
    "files": "(content omitted)"
  },
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Content-Length": "148",
    "Content-Type": "multipart/form-data; boundary=6e864227a6fd1cd7a1655802d20d7bd9",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.20.1"
  },
  "json": null,
  "origin": "58.34.235.37",
  "url": "http://httpbin.org/post"
}
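The value in the files dict can also be a tuple, which lets you control the filename and content type recorded in the multipart body; a small sketch:
import requests
# (filename, file object, content type) -- the extra fields are optional
files = {'files': ('favicon.ico', open('favicon.ico', 'rb'), 'image/x-icon')}
response = requests.post('http://httpbin.org/post', files=files)
print(response.status_code)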
Getting Cookie Values
import requests
response = requests.get('http://www.baidu.com')
print(response.cookies)
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
for key,value in response.cookies.items():
    print(key + '=' + value)
BDORZ=27315
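Cookies can also be sent with a request by passing a plain dict to the cookies argument; a minimal sketch against httpbin's cookie echo endpoint:
import requests
cookies = {'number': '123456789'}   # sent as a Cookie header with the request
response = requests.get('http://httpbin.org/cookies', cookies=cookies)
print(response.text)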
Session Persistence (Simulating a Login)
import requests
requests.get('http://httpbin.org/cookies/set/number/123456789')
<Response [200]>
response = requests.get('http://httpbin.org/cookies')
print(response.text)
{
  "cookies": {}
}
Because the two plain requests.get calls above are completely independent requests, the cookie set by the first one is not sent with the second. A Session object keeps cookies across requests:
s = requests.Session()
s.get('http://httpbin.org/cookies/set/number/123456789')
<Response [200]>
response = s.get('http://httpbin.org/cookies')
print(response.text)
{
  "cookies": {
    "number": "123456789"
  }
}
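Besides cookies, a Session also keeps default headers and reuses the underlying connection, which is handy when crawling many pages from one site. A sketch (the User-Agent string here is just a placeholder):
import requests
s = requests.Session()
s.headers.update({'User-Agent': 'my-crawler/0.1'})   # hypothetical UA, applied to every request on this session
print(s.get('http://httpbin.org/get').json()['headers']['User-Agent'])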
Certificate Verification
import requests
response = requests.get('https://www.12306.cn')   # may raise requests.exceptions.SSLError if the certificate cannot be verified
print(response.status_code)
response = requests.get('https://www.12306.cn',verify=False)
/home/dex/.local/lib/python3.6/site-packages/urllib3/connectionpool.py:847: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
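If you deliberately keep verify=False, the warning can be silenced through urllib3 (use with care, since it hides a genuine security warning); a minimal sketch:
import requests
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)   # suppress the warning shown above
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)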
Supplying a Certificate
response = requests.get('https://www.12306.cn',cert=('/path/server.crt','/path/key'))
Proxy Settings
import requests
proxies = {
'http':'http://222.74.61.98:53281',
# 'http':'http://user:password@222.74.61.98:53281' # proxy with username and password
'https':'https://114.119.116.92:61066'
}
response = requests.get('https://www.taobao.com',proxies=proxies)
print(response.status_code)
Using a SOCKS Proxy
First install the requests[socks] extra:
pip3 install 'requests[socks]'
import requests
proxies = {
'http':'socks5://127.0.0.1:1080',
'https':'socks5://127.0.0.1:1080'
}
response = requests.get('https://www.taobao.com',proxies=proxies)
print(response.status_code)
200
Setting and Catching Timeouts
import requests
from requests.exceptions import ReadTimeout
try:
    response = requests.get("http://www.taobao.com", timeout=0.01)
    print(response.status_code)
except ReadTimeout:
    print('Timeout')
Timeout
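timeout can also be a (connect, read) tuple if you want to bound the two phases separately; a sketch with deliberately tiny values so the request times out:
import requests
from requests.exceptions import ConnectTimeout, ReadTimeout
try:
    # first number limits establishing the connection, second limits waiting for data
    response = requests.get('http://www.taobao.com', timeout=(0.01, 0.01))
    print(response.status_code)
except (ConnectTimeout, ReadTimeout):
    print('Timeout')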
Authentication
import requests
from requests.auth import HTTPBasicAuth
r = requests.get('http://120.27.34.24:9001',auth=HTTPBasicAuth('user','123'))
# r = requests.get('http://120.27.34.24:9001',auth=('user','123'))  # a plain tuple is equivalent shorthand
Exception Handling
import requests
from requests.exceptions import ReadTimeout,ConnectionError,HTTPError,RequestException
try:
    response = requests.get('http://httpbin.org/get', timeout=0.5)
    print(response.status_code)
except ReadTimeout:
    print('Timeout')
except ConnectionError:
    print('Connection Error')
except HTTPError:
    print('Http Error')
except RequestException:
    print('Error')
200
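All exceptions raised by Requests inherit from RequestException, so when you do not need to tell them apart a single handler is enough; a minimal sketch:
import requests
from requests.exceptions import RequestException
try:
    response = requests.get('http://httpbin.org/get', timeout=0.5)
    print(response.status_code)
except RequestException as e:   # every requests exception (ReadTimeout, ConnectionError, ...) derives from RequestException
    print('Error:', e)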