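Using the requests library

Introduction:

requests is a third-party library for sending HTTP requests. Compared with urllib, which we used earlier, its API is far more convenient (under the hood it is a wrapper around urllib3).

Installation: pip3 install requests

Before learning requests, it helps to be familiar with the HTTP protocol:

http://www.cnblogs.com/linhaifeng/p/6266327.html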

GET requests:

import requests
from urllib import parse

param = {'wd': '中国'}

# Pass query parameters through params; requests URL-encodes them
response = requests.get('http://www.baidu.com/s', params=param)
print(response.url)

# URL-decode: percent-encoded UTF-8 bytes -> readable text
print(parse.unquote(response.url))

>> Output

http://www.baidu.com/s?wd=%E4%B8%AD%E5%9B%BD
http://www.baidu.com/s?wd=中国
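To inspect how params is encoded without touching the network, the request can be prepared but not sent. The following is a minimal sketch, assuming requests is installed; it reuses the URL and parameter from the example, and nothing is actually fetched.

```python
import requests
from urllib import parse

# Build the request but do not send it; prepare() applies the same
# URL encoding that requests.get() would apply.
prepared = requests.Request(
    'GET', 'http://www.baidu.com/s', params={'wd': '中国'}
).prepare()

encoded_url = prepared.url
decoded_url = parse.unquote(encoded_url)

print(encoded_url)
print(decoded_url)
```

Preparing a request this way is handy for debugging, because the exact URL can be checked before any traffic is generated.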

GET requests -> headers

Requests usually need to carry request headers; the User-Agent header in particular is what lets a script present itself as a browser.

# Add headers (servers inspect the request headers and may refuse requests
# that lack them, e.g. https://www.zhihu.com/explore)
import requests

response = requests.get('https://www.zhihu.com/explore')
print(response.status_code)  # returned a 500 error when this was written

# Supply custom headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36"
}

response = requests.get('https://www.zhihu.com/explore', headers=headers)
print(response.status_code)  # returned 200
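The reason a bare request may be rejected is visible in the default User-Agent that requests sends. The sketch below, which assumes requests is installed and sends nothing over the network, prints that default and then shows one way to set a browser-like User-Agent once on a Session so that every request prepared through it carries the header. The variable names are illustrative only.

```python
import requests

# Without custom headers, requests identifies itself as python-requests/<version>,
# which is easy for a server to recognise as a script.
default_ua = requests.utils.default_user_agent()
print(default_ua)

BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36"
)

# Headers set on the session apply to every request made through it.
session = requests.Session()
session.headers.update({'User-Agent': BROWSER_UA})

# Prepare (but do not send) a request to check the header that would go out.
prepared = session.prepare_request(
    requests.Request('GET', 'https://www.zhihu.com/explore')
)
print(prepared.headers['User-Agent'])
```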

GET requests -> cookies

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36",
}

loginUrl = 'https://github.com/login'

# Send the request, then read the cookies the server set
response = requests.get(loginUrl, headers=headers)
cookies = response.cookies
print('cookies=>', cookies)
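Cookies can also be sent, not only read: the cookies parameter accepts a plain dict. Here is a small sketch, assuming requests is installed; the cookie name and value are made up for illustration, and the request is only prepared, not sent.

```python
import requests

# Hypothetical cookie name and value, used only for illustration.
cookies = {'session_id': 'abc123'}

# Prepare the request without sending it, to see the header it would carry.
prepared = requests.Request(
    'GET', 'http://httpbin.org/cookies', cookies=cookies
).prepare()

cookie_header = prepared.headers['Cookie']
print(cookie_header)
```

In a real call the same dict would be passed as requests.get(url, cookies=cookies).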

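GET requests -> proxies

import requests

# Proxy URLs should include the scheme
proxies = {
    'http': 'http://111.47.220.67:8080',
    'https': 'http://111.47.220.67:8080',
}

# headers is the dict defined in the headers example;
# verify=False skips certificate verification
response = requests.get('https://www.zhihu.com/explore',
                        proxies=proxies, headers=headers, verify=False)
print(response.status_code)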

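GET requests -> timeout

import requests

# timeout is in seconds; a single value applies both to establishing the
# connection and to waiting for the server to send data (not to the whole download)
response = requests.get('https://www.baidu.com', timeout=1)
print(response.status_code)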
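The timeout can also be given as a (connect, read) tuple, and a request that exceeds it raises an exception rather than returning. The helper below is a sketch under those assumptions; fetch_status is a hypothetical name, not part of requests, and the default values are arbitrary.

```python
import requests


def fetch_status(url, timeout=(3.05, 10)):
    """Return the HTTP status code, or None if the request failed.

    timeout is (connect timeout, read timeout) in seconds.
    """
    try:
        return requests.get(url, timeout=timeout).status_code
    except requests.exceptions.Timeout:
        # Raised when connecting or reading takes longer than allowed.
        print('timed out:', url)
    except requests.exceptions.ConnectionError:
        # Raised for DNS failures, refused connections and similar problems.
        print('could not connect:', url)
    return None
```

A call such as fetch_status('https://www.baidu.com') returns the status code on success, and None when the server cannot be reached in time.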

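The response object

Response attributes

import requests

response = requests.get('http://www.jianshu.com')

# Response attributes
print(response.text)  # body as str, decoded using response.encoding
print(response.content)  # raw body as bytes
print(response.status_code)  # status code, e.g. 200
print(response.headers)  # response headers
print(response.cookies)  # cookies set by the server (a cookie jar)
print(response.cookies.get_dict())  # cookies as a dict
print(response.cookies.items())  # cookies as (name, value) pairs
print(response.url)  # final URL after any redirects
print(response.history)  # list of redirect responses
print(response.encoding)  # encoding used to decode response.text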
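Besides reading status_code directly, a response offers ok and raise_for_status() for checking whether the request succeeded. The sketch below builds a Response by hand so it runs offline; that construction is an assumption made purely for illustration, since in real code the object comes from requests.get().

```python
import requests

# Built by hand purely for offline illustration; normally this object
# is returned by requests.get(...).
response = requests.Response()
response.status_code = 404

print(response.ok)  # False for 4xx/5xx status codes

try:
    # Raises HTTPError for 4xx/5xx, does nothing otherwise.
    response.raise_for_status()
    raised = False
except requests.exceptions.HTTPError as exc:
    raised = True
    print('request failed:', exc)
```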

Encoding issues

# Encoding issues
import requests

response = requests.get('http://www.autohome.com/news')
print(response.headers['Content-Type'])  # returned text/html

# When a text/* Content-Type has no charset parameter, requests falls back to ISO-8859-1
print(response.encoding)  # ISO-8859-1, so text would be decoded as ISO-8859-1

# The Autohome page is actually encoded in gb2312; unless the encoding is set
# to GBK (a superset of gb2312), the Chinese text comes out garbled
response.encoding = 'GBK'
print(response.text)

response = requests.get('https://www.jianshu.com')
print(response.headers['Content-Type'])  # returned text/html; charset=utf-8

# When the header includes a charset parameter, requests uses that encoding
print(response.encoding)  # utf-8, so text is decoded as UTF-8
print(response.text)  # the Jianshu page is UTF-8, so there is no need to set response.encoding = 'utf-8'
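Why the wrong encoding produces garbled text can be reproduced with plain bytes, no network needed. This sketch uses only the standard library; the sample string is chosen for illustration.

```python
# Bytes as a server would send them for a GBK-encoded page.
raw = '汽车之家'.encode('gbk')

# Decoding with the ISO-8859-1 fallback never fails, but yields mojibake:
# every byte becomes one unrelated Latin-1 character.
garbled = raw.decode('iso-8859-1')

# Decoding with the page's real encoding restores the text.
correct = raw.decode('gbk')

print(repr(garbled))
print(correct)
```

When the real encoding is not known in advance, requests also offers response.apparent_encoding, a guess derived from the body, which can be assigned to response.encoding before reading response.text.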

Parsing JSON

# Parsing JSON
import requests
import json

response = requests.get('http://httpbin.org/get')

res1 = json.loads(response.text)  # the manual way is more cumbersome
res2 = response.json()  # parse the JSON body directly
print(res2)
print(res1 == res2)  # True

Fetching binary data

import requests

response = requests.get('http://pic-bucket.nosdn.127.net/photo/0005/2018-02-26/DBIGGI954TM10005NOS.jpg')
with open('a.jpg', 'wb') as f:
    f.write(response.content)

# stream parameter: fetch the body piece by piece. For a large download, say a
# 100 GB video, reading response.content and writing it in one go is unreasonable
response = requests.get('https://gss3.baidu.com/6LZ0ej3k1Qd3ote6lo7D0j9wehsv/tieba-smallvideo-transcode/1767502_56ec685f9c7ec542eeaf6eac93a65dc7_6fe25cd1347c_3.mp4',
                        stream=True)
with open('b.mp4', 'wb') as f:
    # Read the binary stream in chunks (iter_content defaults to 1 byte per chunk)
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)

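POST requests

1. Introduction

# GET requests

GET is the default HTTP request method.

     * Normally has no request body
     * The data travels in the URL, so its size is limited in practice (the HTTP specification sets no limit, but browsers and servers cap URL length)
     * The data is exposed in the browser's address bar

Common ways a GET request is issued:

       1. Entering a URL directly in the browser's address bar always produces a GET request
       2. Clicking a hyperlink on a page also produces a GET request
       3. Submitting a form uses GET by default, but the form can be set to POST

# POST requests

(1). The data does not appear in the address bar
(2). The protocol sets no upper limit on the data size
(3). Has a request body
(4). Non-ASCII characters (such as Chinese) in a form-encoded body are URL-encoded

# Note: requests.post() is used the same way as requests.get(); the parameter that matters for POST is data, which carries the request body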
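How the request body is built can be checked offline by preparing a POST without sending it. The sketch below, assuming requests is installed, contrasts data= (form-encoded, with Chinese characters URL-encoded) and json= (a JSON body); the payload is made up for illustration.

```python
import json
import requests

payload = {'name': '中国'}
url = 'http://httpbin.org/post'

# data=dict -> form-encoded body; non-ASCII characters are URL-encoded
form = requests.Request('POST', url, data=payload).prepare()
print(form.headers['Content-Type'])
print(form.body)

# json=dict -> JSON body with a JSON Content-Type
as_json = requests.Request('POST', url, json=payload).prepare()
print(as_json.headers['Content-Type'])
print(json.loads(as_json.body))
```

Which one to use depends on what the server expects: HTML login forms generally want data=, while many web APIs want json=.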

2. Sending a POST request to log in to GitHub

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2018/2/27 20:42
# @Author  : hyang
# @Site    :
# @File    : request_github.py
# @Software: PyCharm

import re
import requests
import urllib3

'''
I. Analyzing the target site
    Open https://github.com/login in a browser,
    enter a wrong username and password, and capture the traffic with Fiddler.
    The login is a POST submitted to https://github.com/session
    The request headers include a cookie
    The request body contains:
        commit:Sign in
        utf8:✓
        authenticity_token:lbI8IJCwGslZS8qJPnof5e7ZkCoSoMn6jmDTsL1r/m06NLyIbw7vCrpwrFAPzHMep3Tmf/TSJVoXWrvDZaVwxQ==
        login:908099665@qq.com
        password:123

II. Workflow
    First GET https://github.com/login to obtain the initial cookie and the authenticity_token
    Then POST to https://github.com/session with the initial cookie and the request body (authenticity_token, username, password, etc.)
    Finally, obtain the logged-in cookie

    ps: If a site sends the password encrypted, enter a wrong username with the correct password and copy the encrypted password from the browser; GitHub sends the password in plain text
'''

# verify=False (used below) skips certificate verification, e.g. when a capture
# proxy such as Fiddler intercepts HTTPS; this silences the resulting warning
urllib3.disable_warnings()

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36",
}

loginUrl = 'https://github.com/login'
postUrl = 'https://github.com/session'

response = requests.get(loginUrl, headers=headers, verify=False)

# Extract the authenticity_token (the pattern matches the login page as it was
# when this was written; the markup may have changed since)
match = re.search(r'<input name="authenticity_token" type="hidden" value="(.*?)" />', response.text)
authenticity_token = match.group(1) if match else ''

# Get the cookies
cookies = response.cookies
print('cookies=>', cookies)
print('authenticity_token=>', authenticity_token)

email = '908099665@qq.com'
password = 'yanghaoXXXX'

post_data = {
    "commit": "Sign in",
    "utf8": "✓",
    "authenticity_token": authenticity_token,
    "login": email,
    "password": password,
}

response2 = requests.post(postUrl, data=post_data, headers=headers, verify=False, cookies=cookies)
print(response2.status_code)
print(response2.history)  # status codes of the redirects that were followed
print(response2.text)
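A regular expression over HTML breaks as soon as the attribute order or spacing changes. As an alternative sketch, the token can be pulled out with the standard library's html.parser; TokenParser and extract_token are hypothetical names, and the sample markup is invented for illustration rather than copied from GitHub.

```python
from html.parser import HTMLParser


class TokenParser(HTMLParser):
    """Collect the value of <input name="authenticity_token">."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; order does not matter here.
        attributes = dict(attrs)
        if tag == 'input' and attributes.get('name') == 'authenticity_token':
            self.token = attributes.get('value')


def extract_token(html):
    """Return the token found in html, or None if there is none."""
    parser = TokenParser()
    parser.feed(html)
    return parser.token


# Hypothetical markup with the attributes in a different order.
sample = '<form><input type="hidden" value="abc123==" name="authenticity_token" /></form>'
print(extract_token(sample))
```

Returning None when nothing is found makes a missing token easy to detect before the login POST is attempted.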


3. Using a session

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2018/2/26 23:31
# @Author  : hyang
# @Site    :
# @File    : request-github.py
# @Software: PyCharm

import re
import requests
import urllib3
import http.cookiejar as cookielib

urllib3.disable_warnings()  # silence the warning caused by verify=False

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36",
}

loginUrl = 'https://github.com/login'
postUrl = 'https://github.com/session'
profileUrl = 'https://github.com/settings/emails'

session = requests.Session()  # a session keeps cookies across requests

# Store the cookies in the file github_cookie
session.cookies = cookielib.LWPCookieJar(filename='github_cookie')

# Get the authenticity_token
def get_token():
    response = session.get(loginUrl, headers=headers, verify=False)
    html = response.text
    match = re.search(r'<input name="authenticity_token" type="hidden" value="(.*?)" />', html)
    authenticity_token = match.group(1) if match else ''
    print(authenticity_token)
    return authenticity_token

# Submit the login form
def post_account(email, password):
    post_data = {
        'commit': 'Sign in',
        'utf8': '✓',
        'authenticity_token': get_token(),
        'login': email,
        'password': password
    }
    response = session.post(postUrl, data=post_data, headers=headers)
    print(response.status_code)
    # Save the cookies (ignore_discard=True also keeps session cookies)
    session.cookies.save(ignore_discard=True)

def load_cookie():
    try:
        session.cookies.load(ignore_discard=True)
        print('cookies loaded')
    except (OSError, cookielib.LoadError):
        print('failed to load cookies')

# Check whether the login succeeded
def isLogin():
    load_cookie()
    response = session.get(profileUrl, headers=headers)
    return '908099665@qq.com' in response.text

if __name__ == "__main__":
    # Enter your own email and password
    post_account(email='908099665@qq.com', password='yanghaoXXXX')
    # Verify the login
    print(isLogin())
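What a session adds over standalone calls can be shown without logging in anywhere. In this sketch, which assumes requests is installed and sends nothing, a cookie is placed in the session's jar by hand (standing in for one a login response would set) and then appears on a request prepared through the session, but not on a bare request. The cookie name and value are made up.

```python
import requests

session = requests.Session()

# Pretend an earlier response set this cookie (hypothetical name and value).
session.cookies.set('logged_in', 'yes')

# A request prepared through the session carries the cookie automatically.
prepared = session.prepare_request(
    requests.Request('GET', 'https://github.com/settings/emails')
)
print(prepared.headers.get('Cookie'))

# A bare request made outside the session does not.
bare = requests.Request('GET', 'https://github.com/settings/emails').prepare()
print(bare.headers.get('Cookie'))
```

This is why the login script only needs to authenticate once: subsequent session.get() calls reuse the cookies collected so far.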

Redirects

First, the explanation from the official documentation:

By default Requests will perform location redirection for all verbs except HEAD.

We can use the history property of the Response object to track redirection.

The Response.history list contains the Response objects that were created in order to complete the request. The list is sorted from the oldest to the most recent response.

For example, GitHub redirects all HTTP requests to HTTPS:

>>> r = requests.get('http://github.com')
>>> r.url
'https://github.com/'
>>> r.status_code
200
>>> r.history
[<Response [301]>]

If you're using GET, OPTIONS, POST, PUT, PATCH or DELETE, you can disable redirection handling with the allow_redirects parameter:

>>> r = requests.get('http://github.com', allow_redirects=False)
>>> r.status_code
301
>>> r.history
[]

If you're using HEAD, you can enable redirection as well:

>>> r = requests.head('http://github.com', allow_redirects=True)
>>> r.url
'https://github.com/'
>>> r.history
[<Response [301]>]

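Advanced authentication

# Certificate verification (most sites use HTTPS)
import requests

# For an SSL request the certificate is checked first; an invalid one raises an error
response = requests.get('https://www.12306.cn')

# Improvement 1: suppress the error, though a warning is still emitted
import requests

response = requests.get('https://www.12306.cn', verify=False)  # certificate not verified; warns, returns 200
print(response.status_code)

# Improvement 2: suppress both the error and the warning
import requests
import urllib3

urllib3.disable_warnings()  # silence the warning
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)

# Improvement 3: supply a certificate
# Many sites use HTTPS yet can be accessed without a client certificate; in most cases it is optional
# Zhihu, Baidu, etc. work either way
# Where it is mandatory, one must be supplied, e.g. a site restricted to specific users who are granted access only after obtaining the certificate
import requests

response = requests.get('https://www.12306.cn',
                        cert=('/path/server.crt',
                              '/path/key'))
print(response.status_code)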
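When a site's certificate is signed by a CA that is not in the default trust store, a safer alternative to verify=False is to point verify at a file containing that CA's certificate. The sketch below assumes such a PEM file exists; the path and the make_session helper are hypothetical.

```python
import requests

# Hypothetical path to a PEM file containing the CA certificate(s) to trust.
CA_BUNDLE = '/path/to/ca-bundle.pem'


def make_session(ca_bundle=CA_BUNDLE):
    """Return a session that verifies servers against ca_bundle."""
    session = requests.Session()
    # Same meaning as passing verify=... on a single request.
    session.verify = ca_bundle
    return session


session = make_session()
print(session.verify)
```

With this in place the connection stays verified, so a forged certificate is still rejected, which verify=False would not do.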

File upload

import requests

# Open the file in binary mode; the with block closes it after the upload
with open('a.pptx', 'rb') as f:
    response = requests.post('http://httpbin.org/post', files={'file': f})
print(response.status_code)
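The files parameter also accepts a (filename, content, content type) tuple, which gives control over the name and MIME type the server sees. This sketch, assuming requests is installed, prepares such an upload without sending it so the resulting multipart body can be inspected; the file name and content are placeholders.

```python
import requests

# (filename, content, content type); all three values are placeholders.
files = {'file': ('report.txt', b'hello world', 'text/plain')}

# Prepare the upload without sending it.
prepared = requests.Request(
    'POST', 'http://httpbin.org/post', files=files
).prepare()

# requests builds a multipart/form-data body and picks the boundary itself.
content_type = prepared.headers['Content-Type']
print(content_type)
print(b'filename="report.txt"' in prepared.body)
```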

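Exception handling

# Exception handling
import requests
# See requests.exceptions for the full list of exception types; importing by
# name avoids a wildcard import
from requests.exceptions import ReadTimeout, ConnectionError, RequestException

try:
    r = requests.get('http://www.baiduxxx.com', timeout=1)
except ReadTimeout:
    print('ReadTimeout')
except ConnectionError:  # network unreachable
    print('ConnectionError')
# except Timeout:
#     print('aaaaa')
except RequestException:
    print('Error')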
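The order of the except clauses matters because the exception classes form a hierarchy. The checks below, assuming requests is installed, print how the classes used here relate to one another.

```python
from requests import exceptions as exc

checks = [
    # Every requests exception derives from RequestException.
    issubclass(exc.ConnectionError, exc.RequestException),
    issubclass(exc.Timeout, exc.RequestException),
    # Both timeout flavours derive from Timeout.
    issubclass(exc.ReadTimeout, exc.Timeout),
    issubclass(exc.ConnectTimeout, exc.Timeout),
    # ConnectTimeout is also a ConnectionError.
    issubclass(exc.ConnectTimeout, exc.ConnectionError),
]
print(checks)
```

Since RequestException is the common base class, it belongs in the last except clause; placed first, it would swallow the more specific cases.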
