Some pitfalls in how the Python Requests library handles responses
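Python's Requests library (http://docs.python-requests.org/en/latest/) makes HTTP/HTTPS requests convenient and is widely used.
Its handling of responses, however, has a few spots that need particular care. In short, the issue is the difference between the content and text properties of the Response object (both are properties, not methods). The relevant source code is: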
@property
def content(self):
    """Content of the response, in bytes."""

    if self._content is False:
        # Read the contents.
        try:
            if self._content_consumed:
                raise RuntimeError(
                    'The content for this response was already consumed')

            if self.status_code == 0:
                self._content = None
            else:
                self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()

        except AttributeError:
            self._content = None

    self._content_consumed = True
    # don't need to release the connection; that's been handled by urllib3
    # since we exhausted the data.
    return self._content

@property
def text(self):
    """Content of the response, in unicode.

    if Response.encoding is None and chardet module is available, encoding
    will be guessed.
    """

    # Try charset from content-type
    content = None
    encoding = self.encoding

    if not self.content:
        return str('')

    # Fallback to auto-detected encoding.
    if self.encoding is None:
        encoding = self.apparent_encoding

    # Decode unicode from given encoding.
    try:
        content = str(self.content, encoding, errors='replace')
    except (LookupError, TypeError):
        # A LookupError is raised if the encoding was not found which could
        # indicate a misspelling or similar mistake.
        #
        # A TypeError can be raised if encoding is None
        #
        # So we try blindly encoding.
        content = str(self.content, errors='replace')

    return content

@property
def apparent_encoding(self):
    """The apparent encoding, provided by the lovely Charade library
    (Thanks, Ian!)."""
    return chardet.detect(self.content)['encoding']
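As the code shows, content returns the raw bytes unchanged, while text decodes those bytes into a string. It uses the encoding attribute, and falls back to a chardet guess (apparent_encoding) when encoding is None.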
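To make the decode fallback concrete, here is a minimal standalone sketch that mirrors the steps of text on a plain bytes value. It is not Requests code: decode_like_text and its guess parameter are hypothetical names used only for illustration, with guess standing in for the chardet result from apparent_encoding.

```python
def decode_like_text(raw, encoding=None, guess=None):
    """Mirror the decode steps of Response.text on a bytes payload.

    Hypothetical helper for illustration; `guess` stands in for the
    value that apparent_encoding would obtain from chardet.
    """
    if not raw:
        return ''
    # Fall back to the guessed encoding when none was declared.
    if encoding is None:
        encoding = guess
    try:
        return str(raw, encoding, errors='replace')
    except (LookupError, TypeError):
        # LookupError: unknown codec name. TypeError: encoding is None.
        # Decode with Python's default (UTF-8), replacing bad bytes.
        return str(raw, errors='replace')


raw = '中文'.encode('gbk')
print(repr(raw))                              # bytes, as content would return
print(decode_like_text(raw, 'gbk'))           # decoded with the right codec
print(decode_like_text(raw, 'ISO-8859-1'))    # wrong codec gives mojibake
```

The last call shows why a wrong encoding value is a real problem: the decode does not fail, it silently produces the wrong characters.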
The encoding attribute of the response is assigned in build_response of HTTPAdapter, in adapters.py:
def build_response(self, req, resp):
    """Builds a :class:`Response <requests.Response>` object from a urllib3
    response. This should not be called from user code, and is only exposed
    for use when subclassing the
    :class:`HTTPAdapter <requests.adapters.HTTPAdapter>`

    :param req: The :class:`PreparedRequest <PreparedRequest>` used to generate the response.
    :param resp: The urllib3 response object.
    """
    response = Response()

    # Fallback to None if there's no status_code, for whatever reason.
    response.status_code = getattr(resp, 'status', None)

    # Make headers case-insensitive.
    response.headers = CaseInsensitiveDict(getattr(resp, 'headers', {}))

    # Set encoding.
    response.encoding = get_encoding_from_headers(response.headers)
    response.raw = resp
    response.reason = response.raw.reason

    if isinstance(req.url, bytes):
        response.url = req.url.decode('utf-8')
    else:
        response.url = req.url

    # Add new cookies from the server.
    extract_cookies_to_jar(response.cookies, req, resp)

    # Give the Response some context.
    response.request = req
    response.connection = self

    return response
The line response.encoding = get_encoding_from_headers(response.headers) shows that the encoding is obtained by parsing the response headers:
def get_encoding_from_headers(headers):
    """Returns encodings from given HTTP Header Dict.

    :param headers: dictionary to extract encoding from.
    """
    content_type = headers.get('content-type')

    if not content_type:
        return None

    content_type, params = cgi.parse_header(content_type)

    if 'charset' in params:
        return params['charset'].strip("'\"")

    if 'text' in content_type:
        return 'ISO-8859-1'
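The header rule above can be tried without Requests. The following is a rough standard-library sketch of the same three outcomes (declared charset, text type without charset, anything else). The name charset_from_content_type is hypothetical, and the sketch uses email.message.Message to parse the parameters because the cgi module was removed from the standard library in Python 3.13. It is an approximation of the quoted function, not the code Requests ships.

```python
from email.message import Message


def charset_from_content_type(content_type):
    """Approximate the quoted get_encoding_from_headers for one header value.

    Hypothetical helper for illustration only.
    """
    if not content_type:
        return None

    # Let the email package split "type/subtype; key=value" parameters.
    msg = Message()
    msg['content-type'] = content_type
    charset = msg.get_param('charset')
    if isinstance(charset, str) and charset:
        return charset.strip("'\"")

    # No charset declared: text/* falls back to ISO-8859-1.
    media_type = content_type.split(';', 1)[0].strip()
    if 'text' in media_type:
        return 'ISO-8859-1'
    return None


for value in ('text/html; charset=utf-8', 'text/html', 'application/json', None):
    print(repr(value), '->', charset_from_content_type(value))
```

Note the middle case: a text/html response without a charset is treated as ISO-8859-1, so encoding is not None and no guess is made. The chardet guess only happens when this function returns None.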
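To keep Requests from using chardet to guess the response encoding, use the text property with caution: read the content property directly and decode the bytes yourself as needed.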
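One way to follow this advice is to decode the bytes from content against a short list of encodings you expect from the site. This is a minimal sketch under that assumption: decode_body and the default candidate list are hypothetical choices for illustration, and the right candidates depend on the servers you talk to.

```python
def decode_body(raw, candidates=('utf-8', 'gb18030')):
    """Decode bytes by trying known encodings in order, without guessing.

    Hypothetical helper; `candidates` is an assumed list of encodings.
    """
    for enc in candidates:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Nothing matched cleanly: keep going with replacement characters.
    return raw.decode(candidates[0], errors='replace')


print(decode_body('新浪'.encode('utf-8')))
print(decode_body('新浪'.encode('gbk')))
```

With a Requests response r, the call would be decode_body(r.content). Strict decoding stops at the first invalid byte, so a failed candidate is usually rejected quickly, whereas a statistical guess has to examine the body.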
For a response whose server does not explicitly declare a charset, the difference between using text and using content is shown below.
Code:
import time

import requests

print(time.time())
print('begin request')
r = requests.get(r'http://www.sina.com.cn')
# erase the response encoding so that text has to guess it
r.encoding = None
r.text
# r.content
print('request end')
print(time.time())
Elapsed time when using text: [screenshot of the timing output; image not recoverable]
Elapsed time when using content: [screenshot of the timing output; image not recoverable]
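The measurement above includes the network round trip, which can vary between runs. To time only the property access, a small helper like the one below may be enough. timed is a hypothetical name, and the sketch uses only the standard library.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed seconds).

    Hypothetical helper for illustration.
    """
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workload so the sketch runs without a network connection.
payload = ('新浪' * 100000).encode('gbk')
text, seconds = timed(payload.decode, 'gbk')
print(len(text), 'characters decoded in', seconds, 'seconds')
```

With a response r already fetched and r.encoding set to None, timed(lambda: r.content) and timed(lambda: r.text) would isolate the cost of the two properties. The properties should be timed on separate responses or in that order, because text reads content internally.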