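Python Crawler Notes (4): Basic Usage of the Requests Library
Official documentation: http://docs.python-requests.org/en/master
Installation
Run the following on the command line: pip3 install requests. For details, see: https://www.cnblogs.com/cthon/p/9388304.html
1. What is Requests?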

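Requests is an HTTP library written in Python and released under the Apache2 license. It is simpler and more convenient to use than the standard library's urllib.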
A first example
import requests
response = requests.get('https://www.baidu.com')
print(type(response))
print(response.status_code)
print(type(response.text))
print(response.text)
print(response.cookies)
Request methods
import requests
requests.post('http://httpbin.org/post')
requests.put('http://httpbin.org/put')
requests.delete('http://httpbin.org/delete')
requests.get('http://httpbin.org/get')
requests.options('http://httpbin.org/get')
Making requests
Basic GET requests
Basic usage
import requests
response = requests.get('http://httpbin.org/get')
print(response.text)
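GET requests with parameters
import requests
# Option 1: write the query string directly in the URL
response = requests.get('http://httpbin.org/get?name=jack&age=22')
print(response.text)
import requests
# Option 2: pass a dictionary through params and let Requests encode it
data = {
    'name':'jack',
    'age':22
}
response = requests.get('http://httpbin.org/get',params=data)
print(response.text)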
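To see what the params argument does to the URL, the request can be prepared without being sent. This is a minimal offline sketch; it reuses the httpbin URL and the dictionary from the example above, and the variable name prepared is only illustrative.

```python
import requests

# Build the request without sending it, so no network access is needed.
data = {'name': 'jack', 'age': 22}
prepared = requests.Request('GET', 'http://httpbin.org/get', params=data).prepare()

# The dictionary is URL-encoded and appended to the URL as the query string.
print(prepared.url)
```

Both ways of writing the request therefore end up with the same URL; the params form is easier to maintain when the values come from variables.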
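Parsing JSON
import requests
import json
# httpbin.org/get responds with a JSON body
response = requests.get('http://httpbin.org/get')
print(type(response.text))
# response.json() is equivalent to calling json.loads() on response.text
print(response.json())
print(json.loads(response.text))
print(type(response.json()))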
Fetching binary data
import requests
response = requests.get('https://github.com/favicon.ico')
print(type(response.text),type(response.content))
print(response.text)
print(response.content)
import requests
# response.content holds the raw bytes, so write the file in binary mode
response = requests.get('https://github.com/favicon.ico')
with open('favicon.ico','wb') as f:
    f.write(response.content)
Adding headers
import requests
# Without a browser-like User-Agent, the site may reject the request
response = requests.get('https://zhihu.com/explore')
print(response.text)
import requests
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.75 Safari/537.36'}
response = requests.get('https://www.zhihu.com/explore',headers=headers)
print(response.text)
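The reason a custom header helps becomes clearer when the default User-Agent is printed: it identifies the client as python-requests, which some sites block. The sketch below works offline by preparing the request through a Session instead of sending it; the shortened User-Agent string is only an illustrative value.

```python
import requests

# The User-Agent that Requests sends when none is supplied.
default_ua = requests.utils.default_user_agent()
print(default_ua)

# Illustrative browser-like value, shortened for readability.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}

# Preparing through a Session merges the session defaults with our headers;
# the headers passed for the request take precedence.
session = requests.Session()
prepared = session.prepare_request(
    requests.Request('GET', 'https://www.zhihu.com/explore', headers=headers)
)
print(prepared.headers['User-Agent'])
```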
Basic POST requests
import requests
data = {'name':'jack','age':'22'}
response = requests.post('https://httpbin.org/post',data=data)
print(response.text)
print(response.json())
import requests
data = {'name':'jack','age':'22'}
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.75 Safari/537.36'}
response = requests.post('https://httpbin.org/post',data=data,headers=headers)
print(response.text)
print(response.json())
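The data argument used above sends a form-encoded body. Requests also accepts a json argument, which is not used in the original examples and is shown here only for comparison. The sketch prepares both requests offline and prints the body and Content-Type each one would send.

```python
import json
import requests

url = 'https://httpbin.org/post'
payload = {'name': 'jack', 'age': '22'}

# data= encodes the dictionary as an HTML form body.
form = requests.Request('POST', url, data=payload).prepare()
# json= serializes the dictionary as a JSON document instead.
as_json = requests.Request('POST', url, json=payload).prepare()

print(form.headers['Content-Type'], form.body)
print(as_json.headers['Content-Type'], json.loads(as_json.body))
```

Which form to use depends on what the server expects: HTML form handlers usually want data, while JSON APIs usually want json.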
Responses
Response attributes
import requests
response = requests.get('http://www.jianshu.com')
print(type(response.status_code),response.status_code)
print(type(response.headers),response.headers)
print(type(response.cookies),response.cookies)
print(type(response.url),response.url)
print(type(response.history),response.history)
Checking the status code
import requests
response = requests.get('http://www.cnblogs.com/cthon/p/9383778.html')
exit() if not response.status_code == requests.codes.not_found else print('404 Not Found')
import requests
response = requests.get('http://www.cnblogs.com/cthon/p/9383778.html')
exit() if not response.status_code == 200 else print('Request Successfully')
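An alternative to comparing status_code by hand is response.ok together with response.raise_for_status(), which raises HTTPError for 4xx and 5xx responses. To keep this sketch runnable offline, the response object is built by hand with a hypothetical URL; in real code it would come from requests.get().

```python
import requests
from requests.exceptions import HTTPError

# Hand-built response, used only so the sketch runs without a network;
# normally this object is returned by requests.get(...).
response = requests.Response()
response.status_code = 404
response.reason = 'Not Found'
response.url = 'http://example.com/missing'  # hypothetical URL

print(response.ok)  # False for any 4xx/5xx status

try:
    response.raise_for_status()
    result = 'Request Successfully'
except HTTPError as exc:
    result = 'HTTP error: %s' % exc
print(result)
```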
Status codes
100:('continue',),
101:('switching_protocols',),
102:('processing',),
103:('checkpoint',),
122:('uri_too_long','request_uri_too_long'),
200:('ok','okay','all_ok','all_okay','all_good','\\o/','✓',),
201:('created',),
202:('accepted',),
203:('non_authoritative_info','non_authoritative_information'),
204:('no_content',),
205:('reset_content','reset',),
206:('partial_content','partial'),
207:('multi_status','multiple_status','multi_stati','multiple_stati'),
208:('already_reported',),
226:('im_used',),
#Redirection
300:('multiple_choices',),
301:('moved_permanently','moved','\\o-'),
302:('found',),
303:('see_other','other'),
304:('not_modified',),
305:('use_proxy',),
306:('switch_proxy',),
307:('temporary_redirect','temporary_moved','temporary'),
308:('permanent_redirect','resume_incomplete','resume',),#These 2 to be removed in 3.0
#Client Error
400:('bad_request','bad'),
401:('unauthorized',),
402:('payment_required','payment'),
403:('forbidden',),
404:('not_found','-o-'),
405:('method_not_allowed','not_allowed'),
406:('not_acceptable',),
407:('proxy_authentication_required','proxy_auth','proxy_authentication'),
408:('request_timeout','timeout'),
409:('conflict',),
410:('gone',),
411:('length_required',),
412:('precondition_failed','precondition'),
413:('request_entity_too_large',),
414:('request_uri_too_large',),
415:('unsupported_media_type','unsupported_media','media_type'),
416:('requested_range_not_satisfiable','requested_range','range_not_satisfiable'),
417:('expectation_failed',),
418:('im_a_teapot','teapot','i_am_a_teapot'),
421:('misdirected_request',),
422:('unprocessable_entity','unprocessable'),
423:('locked',),
424:('failed_dependency','dependency'),
425:('unordered_collection','unordered'),
426:('upgrade_required','upgrade'),
428:('precondition_required','precondition'),
429:('too_many_requests','too_many'),
431:('header_fields_too_large','fields_too_large'),
444:('no_response','none'),
449:('retry_with','retry'),
450:('blocked_by_windows_parental_controls','parental_controls'),
451:('unavailable_for_legal_reasons','legal_reasons'),
499:('client_closed_request',),
#Server Error
500:('internal_server_error','server_error','/o\\','✗'),
501:('not_implemented',),
502:('bad_gateway',),
503:('service_unavailable','unavailable'),
504:('gateway_timeout',),
505:('http_version_not_supported','http_version'),
506:('variant_also_negotiates',),
507:('insufficient_storage',),
509:('bandwidth_limit_exceeded','bandwidth'),
510:('not_extended',),
511:('network_authentication_required','network_auth','network_authentication'),
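Each name in the table above can be looked up on requests.codes, either as an attribute or with dictionary-style access, and every name maps back to its numeric code. A short sketch of the lookup, needing no network access:

```python
import requests

# Attribute access and item access return the same integer.
print(requests.codes.ok)
print(requests.codes.not_found)
print(requests.codes['teapot'])

# Upper-case spellings of the alphabetic names are registered as well.
print(requests.codes.NOT_FOUND)
```

Using the names instead of bare numbers makes comparisons such as response.status_code == requests.codes.not_found easier to read.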
Advanced usage: file upload
import requests
# The with-block closes the file once the upload has finished
with open('favicon.ico','rb') as f:
    files = {'file':f}
    response = requests.post('http://httpbin.org/post',files=files)
print(response.text)
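Passing files makes Requests build a multipart/form-data body. The sketch below prepares such a request offline to show this; an in-memory io.BytesIO object stands in for the opened favicon.ico, and the placeholder bytes and the explicit content type are assumptions made for illustration.

```python
import io
import requests

# In-memory stand-in for open('favicon.ico', 'rb'), so no file is needed.
fake_icon = io.BytesIO(b'\x00\x00\x01\x00')  # placeholder bytes

# A (filename, file object, content type) tuple gives explicit control
# over the metadata sent with the file.
files = {'file': ('favicon.ico', fake_icon, 'image/x-icon')}

prepared = requests.Request('POST', 'http://httpbin.org/post', files=files).prepare()

# The body is multipart-encoded and carries the file name of each part.
print(prepared.headers['Content-Type'])
print(b'filename="favicon.ico"' in prepared.body)
```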
Getting cookies
import requests
response = requests.get('http://www.baidu.com')
print(response.cookies)
# Iterate over the cookie jar as key/value pairs
for key,value in response.cookies.items():
    print(key+'='+value)
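Maintaining a session
import requests
# Two independent requests do not share cookies, so the second response shows none
requests.get('http://httpbin.org/cookies/set/number/123456789')
response=requests.get('http://httpbin.org/cookies')
print(response.text)
import requests
# A Session keeps cookies between requests, much like a single browser
s = requests.Session()
s.get('http://httpbin.org/cookies/set/number/123456789')
response=s.get('http://httpbin.org/cookies')
print(response.text)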
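What a Session carries over can be inspected without any network access by preparing a request through it. In this sketch the cookie is placed in the session's jar by hand rather than set by httpbin, and the User-Agent value is a hypothetical example.

```python
import requests

s = requests.Session()

# Put a cookie into the session's jar directly (normally a server sets it).
s.cookies.set('number', '123456789')
# Headers stored on the session are sent with every request as well.
s.headers.update({'User-Agent': 'my-crawler/0.1'})  # hypothetical value

# prepare_request() merges the session state into the outgoing request.
prepared = s.prepare_request(requests.Request('GET', 'http://httpbin.org/cookies'))
print(prepared.headers['Cookie'])
print(prepared.headers['User-Agent'])
```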
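Certificate verification
# 12306 used an invalid certificate, so this request fails with an SSLError
import requests
response = requests.get('https://www.12306.cn/')
print(response.status_code)
import requests
from requests.packages import urllib3
# Skip verification and silence the resulting InsecureRequestWarning
urllib3.disable_warnings()
response = requests.get('https://www.12306.cn',verify = False)
print(response.status_code)
import requests
# Supply a client certificate as a (certificate, key) pair of file paths
response = requests.get('https://www.12306.cn',cert=('/path/server.crt','/path/key'))
print(response.status_code)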
Proxy settings
HTTP proxy
import requests
# Map each URL scheme to the proxy that should handle it
proxies = {
    'http':'http://127.0.0.1:9743',
    'https':'https://127.0.0.1:9743'
}
response = requests.get('https://www.taobao.com',proxies=proxies)
print(response.status_code)
import requests
# A proxy that needs credentials takes them in the form user:password@host:port
proxies = {
    'http':'http://user:password@127.0.0.1:9743'
}
response = requests.get('https://www.taobao.com',proxies=proxies)
print(response.status_code)
SOCKS proxy
import requests
# SOCKS support needs an extra dependency: pip3 install 'requests[socks]'
proxies = {
    'http':'socks5://127.0.0.1:9742',
    'https':'socks5://127.0.0.1:9742'
}
response = requests.get('https://www.taobao.com',proxies=proxies)
print(response.status_code)
Timeout settings
import requests
from requests.exceptions import Timeout
# Timeout is the base class of both ConnectTimeout and ReadTimeout
try:
    response = requests.get('http://www.baidu.com',timeout = 0.01)
    print(response.status_code)
except Timeout:
    print('Timeout')
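The timeout argument also accepts a (connect, read) tuple, so the two phases can be limited separately. The sketch below demonstrates a read timeout against a deliberately slow local server, so it does not depend on any external site. SlowHandler, the 0.5 s delay and the timeout values are all assumptions chosen for the demonstration.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests
from requests.exceptions import ConnectTimeout, ReadTimeout


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical local handler that answers only after 0.5 s."""

    def do_GET(self):
        time.sleep(0.5)
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b'ok')
        except OSError:
            pass  # the client has already given up

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(('127.0.0.1', 0), SlowHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/' % server.server_port

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for the local demo

try:
    # Allow 3 s to connect but only 0.1 s to wait for the response.
    session.get(url, timeout=(3, 0.1))
    outcome = 'ok'
except ConnectTimeout:
    outcome = 'connect timeout'
except ReadTimeout:
    outcome = 'read timeout'
print(outcome)

server.shutdown()
server.server_close()
```

Because the connection to the local server succeeds immediately and the reply is delayed, the read limit is the one that triggers here.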
Authentication
import requests
from requests.auth import HTTPBasicAuth
r = requests.get('http://120.27.34.24:9001',auth=HTTPBasicAuth('user','123'))
print(r.status_code)
import requests
r = requests.get('http://120.27.34.24:9001',auth=('user','123'))
print(r.status_code)
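Passing a (user, password) tuple is shorthand for HTTPBasicAuth. Either way, the credentials end up Base64-encoded in the Authorization header, as this offline sketch shows by preparing the request from the example above and comparing the header with a value computed by the standard base64 module.

```python
import base64
import requests

# Prepare the request without sending it.
prepared = requests.Request(
    'GET', 'http://120.27.34.24:9001', auth=('user', '123')
).prepare()
header = prepared.headers['Authorization']
print(header)

# Basic auth is "Basic " followed by base64("user:password").
expected = 'Basic ' + base64.b64encode(b'user:123').decode('ascii')
print(header == expected)
```

Base64 is an encoding, not encryption, so basic authentication should only be sent over HTTPS when the credentials matter.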
Exception handling
import requests
from requests.exceptions import ReadTimeout,HTTPError,ConnectionError,RequestException
# Catch the most specific exceptions first; RequestException is the base class and goes last
try:
    response = requests.get('http://www.baidu.com',timeout=0.1)
    print(response.status_code)
except ReadTimeout:
    print('Timeout')
except HTTPError:
    print('Http error')
except ConnectionError:
    print('Connection Error')
except RequestException:
    print('Error')
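The order of the except clauses matters because the Requests exceptions form a class hierarchy. The sketch below checks a few of these relationships with issubclass; it also shows that the ConnectionError defined by Requests is a different class from Python's built-in ConnectionError, so it has to be imported from requests.exceptions to be caught.

```python
import builtins
from requests import exceptions as exc

# Every Requests exception derives from RequestException.
for cls in (exc.ReadTimeout, exc.ConnectTimeout, exc.HTTPError, exc.ConnectionError):
    print(cls.__name__, issubclass(cls, exc.RequestException))

# ReadTimeout and ConnectTimeout share the Timeout base class.
timeouts_share_base = (issubclass(exc.ReadTimeout, exc.Timeout)
                       and issubclass(exc.ConnectTimeout, exc.Timeout))

# The built-in ConnectionError is unrelated to the Requests one.
same_as_builtin = issubclass(exc.ConnectionError, builtins.ConnectionError)
print(timeouts_share_base, same_as_builtin)
```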