人性化的Requests模块(响应与编码、header处理、cookie处理、重定向与历史记录、代理设置)

Requests库是第三方模块，需要额外进行安装。Requests是一个开源库

pip install requests
去GitHub下载回来，进入解压文件，运行setup.py

比urllib2实现方式的代码量少，下面是POST请求：

import requests

postdata= {'key':'value'}

r = requests.post('http://www.cnblogs.com/login',data=postdata)

print(r.content)

下面是get请求，但有些get请求url包含参数，如：www.xxx.com?keyword=bolg;guguobao&pageindex=1,怎么简化url，requests提供其他方法：

payload = {'opt':1}

r = requests.get('https://i.cnblogs.com/EditPosts.aspx',params=payload)

print r.url

响应与编码

import requests

r = requests.get('http://www.baidu.com')

print 'content -- >'+ r.content

print 'text -- >'+ r.text

print 'encoding -- >'+ r.encoding

r.encoding='utf-8'

print 'new text-- >'+r.text

uploading-image-85581.png

r.content 返回是字节，text返回文本形式

如果输出结果为encoding -->encoding -- >ISO-8859-1，则说明实际的编码格式是UTF-8，由于Requests猜测错误，导致解析文本出现乱码。Requests提供解决方案，可以自行设置编码格式，r.encoding='utf-8'设置成UTF-8之后，“new text -->”就不会出现乱码。但这种方法笨拙。因此就有了：chardet,优秀的字符串/文件编码检测模块。
安装pip install chardet
安装完成后，使用chardet.detect()返回字典，其中confidence是检测精确度，encoding是编码方式

import requests,chardet

r = requests.get('http://www.baidu.com')

print chardet.detect(r.content)

r.encoding = chardet.detect(r.content)['encoding']

print(r.text)

运行结果

C:\Python27\python.exe F:/python_scrapy/ch03/3.2.3_3.py

{'confidence': 0.99, 'language': '', 'encoding': 'utf-8'}

<!DOCTYPE html>

<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css><title>百度一下，你就知道</title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div ..

.................

Process finished with exit code 0

3 请求头header处理

Requests对headers的处理和urllib2非常相似，在Requests的get函数添加headers参数

import requests

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'

headers={'User-Agent':user_agent}

r = requests.get('http://www.baidu.com',headers=headers)

print r.content

响应码code和响应头header处理

import requests

r = requests.get('http://www.baidu.com')

if r.status_code == requests.codes.ok:

    print r.status_code#响应码

    print r.headers#响应头

    print r.headers.get('content-type')#推荐使用这种获取方式，获取其中的某个字段

    print r.headers['content-type']#不推荐使用这种获取方式,因为不存在会抛出异常

else:

    r.raise_for_status()

cookie处理

如果响应包含cookie的值,可以如下方式取出：

import requests

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'

headers={'User-Agent':user_agent}

r = requests.get('http://www.baidu.com',headers=headers)

#遍历出所有的cookie字段的值

for cookie in r.cookies.keys():

    print cookie+':'+r.cookies.get(cookie)

如果想自定义cookie发出去，使用以下方式：

import requests

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'

headers={'User-Agent':user_agent}

cookies = dict(name='guguobao',age='10')

r = requests.get('http://www.baidu.com',headers=headers,cookies=cookies)

print r.text

还有一种更加高级，且能自动处理Cookie的方式，有时候我们不需要关心Cookie值是多少，只是希望每次访问的时候，程序都会自动把cookie带上。Requests提供一个session的概念，在连续登录网页，处理登录跳转特别方便，不需要关注具体细节

import requests

loginUrl= 'http://www.xxx.com/login'

s = requests.Session()

#首先访问登录界面，作为游客，服务器会先分配一个cookie

r = s.get(loginUrl,allow_redirects=True)

datas={'name':'guguobao','passwd':'guguobao'}

#向登录链接发送post请求，验证成功，游客权限转为会员权限

r =s.post(loginUrl,data=datas,allow_redirects=True)

print r.text

这种使用Session函数处理Cookie的方式很常见

重定向与历史记录

r =requests.get('http://www.baidu.com/',allow_redirects=True)，将allow_redirects设置为True，允许重定向，FALSE不允许。可以通过r.hisory字段查看历史成功访问请求跳转信息：

import requests

r = requests.get('http://github.com')

print r.url

print r.status_code

print r.history

7 超时设置

r = requests.get('http://github.com',timeout=2)

8 代理设置

import requests

proxies = {

  "http": "http://127.0.0.1:1080",

  "https": "http://127.0.0.1:1080",

}

r = requests.get("http://www.google.com", proxies=proxies)

print r.text

也可以通过环境变量HTTP_PROXY和HTTPS_PROXY来配置代理，但不常用。你的代理需要使用HTTP Basic Auth，可以使用http://user:password@host:端口

proxies = {

  "http": "http://user:password@127.0.0.1:1080",

]

人性化的Requests模块(响应与编码、header处理、cookie处理、重定向与历史记录、代理设置)的更多相关文章

03爬虫-requests模块基础(1)
requests模块基础什么是requests模块 requests模块是python中原生基于网络模拟浏览器发送请求模块.功能强大,用法简洁高效. 为什么要是用requests模块用以前的url ...
python3使用requests模块完成get/post/代理/自定义header/自定义Cookie
一.背景说明 http请求的难易对一门语言来说是很重要的而且是越来越重要,但对于python一是urllib一些写法不太符合人的思维习惯文档也相当难看,二是在python2.x和python3.x中写 ...
python基础-requests模块、异常处理、Django部署、内置函数、网络编程
网络编程 urllib的request模块可以非常方便地抓取URL内容,也就是发送一个GET请求到指定的页面,然后返回HTTP的响应. 校验返回值,进行接口测试: 编码:把一个Python对象编码转 ...
Python—requests模块详解
1.模块说明 requests是使用Apache2 licensed 许可证的HTTP库. 用python编写. 比urllib2模块更简洁. Request支持HTTP连接保持和连接池,支持使用co ...
爬虫requests模块 1
让我们从一些简单的示例开始吧. 发送请求¶ 使用 Requests 发送网络请求非常简单. 一开始要导入 Requests 模块: >>> import requests 然后,尝试 ...
Python requests模块学习笔记
目录 Requests模块说明 Requests模块安装 Requests模块简单入门 Requests示例参考文档 1.Requests模块说明 Requests 是使用 Apache2 Li ...
Python高手之路【八】python基础之requests模块
1.Requests模块说明 Requests 是使用 Apache2 Licensed 许可证的 HTTP 库.用 Python 编写,真正的为人类着想. Python 标准库中的 urllib2 ...
爬虫之requests模块
requests模块什么是requests模块 requests模块是python中原生的基于网络请求的模块,其主要作用是用来模拟浏览器发起请求.功能强大,用法简洁高效.在爬虫领域中占据着半壁江山的 ...
Python 爬虫二 requests模块
requests模块 Requests模块 get方法请求整体演示一下: import requests response = requests.get("https://www.baid ...

随机推荐

ajax 向php发送请求
<html> <head> <script src="clienthint.js"></script> </head> ...
ubuntu学习笔记-tar 解压缩命令详解(转)
tar 解压缩命令详解 -c: 建立压缩档案 -x:解压-t:查看内容-r:向压缩归档文件末尾追加文件-u:更新原压缩包中的文件这五个是独立的命令,压缩解压都要用到其中一个,可以和别的命令连用但只能 ...
SpringBoot使用JPA来做数据查询
Spring-Data-JPA在做数据存储方面真的很方便,它的目的就是写更少的代码,更多的事情,但是也有其力有未逮或者说处理起来比较闹心的地方. 1.先来感受一下使用JPA做数据查询时,代码的简化程度 ...
使用gpg来加密数据
一.数据的加密方式数据加密有三种方式: 1.对称加密(算法有:DES.AES.3DES.)加密和解密使用同一个密钥 2.非对称加密(RSA.DSA.ELGamal等等)一共四把钥匙,用公钥加密数据, ...
c++关于字符串的读入和截取
#include<iostream>#include<string>#include<vector>using namespace std;vector<st ...
【环境配置】出现：Microsoft Visual C++ 14.0 is required 的解决方案
参考blog https://download.csdn.net/download/amoscn/10399046 https://blog.csdn.net/weixin_42057852/arti ...
Codeforces 884E E. Binary Matrix
题 OvO http://codeforces.com/contest/884/problem/E 884e 解考虑并查集,每个点向上方和左方的点合并,答案即为1的总数减去需要合并的次数由于只有1 ...
SpringBoot AOP介绍
说起spring,我们知道其最核心的两个功能就是AOP(面向切面)和IOC(控制反转),这边文章来总结一下SpringBoot如何整合使用AOP. 一.示例应用场景:对所有的web请求做切面来记录日志 ...
[Linux]Ubuntu安装Java详细教程
环境:Ubuntu16.04 桌面版虚拟机 1.下载安装包:jdk-8u231-linux-x64.tar.gz 链接: https://pan.baidu.com/s/1mmtzKejL1Fd_RQ ...
hadoop安装后运行一个单实例（测试MapReduce程序）
1.安装hadoop 解压hadoop-1.2.1-bin.tar.gz包 tar -zxvf hadoop-1.2.1-bin.tar.gz /opt/modules/ 解压后在/opt/mo ...

人性化的Requests模块(响应与编码、header处理、cookie处理、重定向与历史记录、代理设置)

Requests库是第三方模块，需要额外进行安装。Requests是一个开源库

比urllib2实现方式的代码量少，下面是POST请求：

响应与编码

3 请求头header处理

响应码code和响应头header处理

cookie处理

重定向与历史记录

7 超时设置

8 代理设置

人性化的Requests模块(响应与编码、header处理、cookie处理、重定向与历史记录、代理设置)的更多相关文章

随机推荐

热门专题