Web Scraping First Steps (2): requests

For making network requests, requests is a third-party library that crawlers use all the time.

import requests

url = 'http://www.baidu.com'

# get() requests the page here; there are other methods too, such as post()

data = requests.get(url)

print(data)

#<Response [200]>

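print(data.text)  # .text does the job of read() from the previous post, except that read() returns raw bytes while .text returns a decoded str

# This likewise prints the page source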
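To check the comparison with urllib's read(), here is a small sketch, assuming requests is installed. It starts a throwaway local HTTP server (a hypothetical stand-in for www.baidu.com, so nothing leaves the machine) and fetches the same page once with urllib and once with requests: read() and data.content are both raw bytes, while data.text is the decoded str.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

# Hypothetical page body served by the local stand-in server.
PAGE = '<html><title>你好</title></html>'.encode('utf-8')


class PageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-Type', 'text/html; charset=utf-8')
        self.send_header('Content-Length', str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, *args):
        pass  # silence the default per-request logging


server = HTTPServer(('127.0.0.1', 0), PageHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f'http://127.0.0.1:{server.server_address[1]}/'

# urllib, as in the previous post; ProxyHandler({}) ignores proxy environment variables.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(url) as resp:
    raw = resp.read()

# requests; trust_env = False likewise ignores proxy environment variables.
session = requests.Session()
session.trust_env = False
data = session.get(url)

server.shutdown()
server.server_close()

print(data)  # <Response [200]>
print(type(raw).__name__, type(data.content).__name__, type(data.text).__name__)  # bytes bytes str
print(data.content == raw)  # True
```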

I'll dig into the other features later; for now, here is a list of common usages:

# HTTP request types

# GET

r = requests.get('https://github.com/timeline.json')

# POST

r = requests.post("http://m.ctrip.com/post")

# PUT

r = requests.put("http://m.ctrip.com/put")

# DELETE

r = requests.delete("http://m.ctrip.com/delete")

# HEAD

r = requests.head("http://m.ctrip.com/head")

# OPTIONS

r = requests.options("http://m.ctrip.com/get")
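Each helper above is a thin wrapper around requests.request(method, url, **kwargs); only the HTTP verb differs. A minimal offline sketch, assuming requests is installed: requests.Request(...).prepare() builds the request that would be sent without sending it, so we can see which verb each helper uses.

```python
import requests

url = 'http://m.ctrip.com/'  # only used to build the requests; nothing is sent

sent_methods = []
for name in ['get', 'post', 'put', 'delete', 'head', 'options']:
    # Same verb that requests.<name>(url) would send; prepare() upper-cases it.
    prepared = requests.Request(name, url).prepare()
    sent_methods.append(prepared.method)
    print(prepared.method, prepared.url)
```

One difference the sketch does not show: requests.head() does not follow redirects by default, while the other helpers do.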

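# Reading the response body

print(r.content)  # raw bytes; non-ASCII text such as Chinese shows up as \x.. escape sequences

print(r.text)  # the body decoded to a str using r.encoding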
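In Python 3 the difference between the two is simply bytes versus str. A tiny standard-library sketch, using a hand-written byte string as a hypothetical stand-in for r.content:

```python
# Hypothetical stand-in for r.content: UTF-8 bytes that contain Chinese text.
content = '日本'.encode('utf-8')
text = content.decode('utf-8')  # roughly what r.text does when r.encoding is 'utf-8'

print(content)  # b'\xe6\x97\xa5\xe6\x9c\xac'
print(text)     # 日本
```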

# Passing URL parameters

payload = {'keyword': '日本', 'salecityid': '2'}

r = requests.get("http://m.ctrip.com/webapp/tourvisa/visa_list", params=payload)

print(r.url)  # e.g. http://m.ctrip.com/webapp/tourvisa/visa_list?keyword=%E6%97%A5%E6%9C%AC&salecityid=2 (日本 is percent-encoded)
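requests turns the params dict into the query string for you, percent-encoding non-ASCII values as UTF-8. A minimal offline sketch, assuming requests is installed: prepare() builds the final URL without sending anything.

```python
import requests

BASE = 'http://m.ctrip.com/webapp/tourvisa/visa_list'
payload = {'keyword': '日本', 'salecityid': '2'}

# Build (but do not send) the GET request to see the URL requests would use.
prepared = requests.Request('GET', BASE, params=payload).prepare()
print(prepared.url)
# http://m.ctrip.com/webapp/tourvisa/visa_list?keyword=%E6%97%A5%E6%9C%AC&salecityid=2

# Keys whose value is None are left out of the query string entirely.
dropped = requests.Request('GET', BASE, params={'keyword': 'x', 'salecityid': None}).prepare()
print(dropped.url)
# http://m.ctrip.com/webapp/tourvisa/visa_list?keyword=x
```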

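# Getting / changing the response encoding

r = requests.get('https://github.com/timeline.json')

print(r.encoding)  # the encoding requests inferred from the Content-Type header

r.encoding = 'utf-8'  # later reads of r.text are decoded with this encoding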
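A common trap with Chinese sites: when the Content-Type header is text/... but names no charset, requests falls back to ISO-8859-1, so r.text comes out garbled until r.encoding is set by hand. A sketch of that, assuming requests is installed; a throwaway local server (hypothetical, standing in for a real site) sends UTF-8 bytes with a bare text/html header.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

BODY = '<p>日本</p>'.encode('utf-8')  # hypothetical UTF-8 page body


class NoCharsetHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-Type', 'text/html')  # note: no charset=...
        self.send_header('Content-Length', str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, *args):
        pass  # silence the default per-request logging


server = HTTPServer(('127.0.0.1', 0), NoCharsetHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local demo
r = session.get(f'http://127.0.0.1:{server.server_address[1]}/')
server.shutdown()
server.server_close()

guessed = r.encoding  # fallback for text/* responses without a charset
garbled = r.text      # UTF-8 bytes decoded as Latin-1: mojibake
r.encoding = 'utf-8'  # override, as in the snippet above
print(guessed)        # ISO-8859-1
print(r.text)         # <p>日本</p>
```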

# JSON handling

r = requests.get('https://github.com/timeline.json')

print(r.json())  # parses the JSON body; no separate import json is needed

# Custom request headers

url = 'http://m.ctrip.com'

headers = {'User-Agent' : 'Mozilla/5.0 (Linux; Android 4.2.1; en-us; Nexus 4 Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.166 Mobile Safari/535.19'}

r = requests.post(url, headers=headers)

print(r.request.headers)  # the headers that were actually sent
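Without a custom header, requests identifies itself as python-requests/<version>, which some sites treat differently from a browser. A minimal offline sketch, assuming requests is installed: Session.prepare_request() merges the session's default headers with the ones you pass, so the User-Agent can be compared with and without the override. The UA string below is a hypothetical placeholder.

```python
import requests

url = 'http://m.ctrip.com'
custom = {'User-Agent': 'Mozilla/5.0 (hypothetical mobile browser UA)'}

session = requests.Session()
# prepare_request() merges the session's default headers with the per-request ones.
default_req = session.prepare_request(requests.Request('POST', url))
custom_req = session.prepare_request(requests.Request('POST', url, headers=custom))

print(default_req.headers['User-Agent'])  # starts with python-requests/
print(custom_req.headers['User-Agent'])   # Mozilla/5.0 (hypothetical mobile browser UA)
```

Only the header you pass is replaced; the other defaults (Accept, Accept-Encoding, and so on) are still sent.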

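# More complex POST requests

import json

url = 'http://m.ctrip.com'

payload = {'some': 'data'}

r = requests.post(url, data=json.dumps(payload))  # a dict passed to data= is form-encoded; to send a JSON body, serialize it with json.dumps() first (newer requests versions also accept json=payload)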
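The three ways of passing a body produce different payloads and Content-Type headers. A minimal offline sketch, assuming requests is installed, that uses prepare() to inspect each request without sending it:

```python
import json

import requests

url = 'http://m.ctrip.com'
payload = {'some': 'data'}

as_form = requests.Request('POST', url, data=payload).prepare()
as_string = requests.Request('POST', url, data=json.dumps(payload)).prepare()
as_json = requests.Request('POST', url, json=payload).prepare()

# dict via data=: form-encoded body, Content-Type application/x-www-form-urlencoded
print(as_form.body, as_form.headers.get('Content-Type'))
# JSON string via data=: body sent as-is, but requests sets no Content-Type
print(as_string.body, as_string.headers.get('Content-Type'))
# dict via json=: serialized to JSON, Content-Type application/json
print(as_json.body, as_json.headers.get('Content-Type'))
```

If the server expects JSON, json=payload is the simpler choice; with data=json.dumps(payload) you would add the Content-Type header yourself.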

# POST a multipart-encoded file

url = 'http://m.ctrip.com'

with open('report.xls', 'rb') as f:  # the with block closes the file after the upload
    files = {'file': f}
    r = requests.post(url, files=files)
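files= switches the body to multipart/form-data, with the file name carried in the part headers. A minimal offline sketch, assuming requests is installed; an in-memory io.BytesIO with made-up bytes stands in for report.xls, and the (filename, fileobj) tuple supplies the name explicitly (with a real file object, requests takes the name from the file's name attribute).

```python
import io

import requests

# In-memory stand-in for open('report.xls', 'rb'); the bytes are made up.
fake_file = io.BytesIO(b'fake spreadsheet bytes')

prepared = requests.Request(
    'POST', 'http://m.ctrip.com', files={'file': ('report.xls', fake_file)}
).prepare()

print(prepared.headers['Content-Type'].split(';')[0])  # multipart/form-data
print(b'filename="report.xls"' in prepared.body)       # True
```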

# Response status code

r = requests.get('http://m.ctrip.com')

print(r.status_code)
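Besides the raw number, requests offers named status constants and a raise_for_status() helper. A minimal offline sketch, assuming requests is installed; a bare Response with a hand-set status_code stands in for a real reply.

```python
import requests

# A bare Response standing in for a real reply; status_code is set by hand.
r = requests.Response()
r.status_code = 404

print(r.status_code == requests.codes.not_found)  # True
print(r.ok)                                       # False, since 404 is a client error

try:
    r.raise_for_status()  # raises for any 4xx or 5xx status
    raised = False
except requests.exceptions.HTTPError:
    raised = True
print(raised)  # True
```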

# Response headers

r = requests.get('http://m.ctrip.com')

print(r.headers)

print(r.headers['Content-Type'])

print(r.headers.get('content-type'))  # two ways to read a single header; header names are case-insensitive
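r.headers is a requests.structures.CaseInsensitiveDict, which is why both spellings above work. A minimal offline sketch, assuming requests is installed, with one hypothetical header standing in for a real response's headers:

```python
from requests.structures import CaseInsensitiveDict

# Stand-in for r.headers, holding one hypothetical header.
headers = CaseInsensitiveDict({'Content-Type': 'text/html; charset=utf-8'})

print(headers['Content-Type'])     # text/html; charset=utf-8
print(headers['content-type'])     # text/html; charset=utf-8 (case does not matter)
print(headers.get('X-Not-There'))  # None: .get() avoids a KeyError for missing headers
```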

# Cookies

url = 'http://example.com/some/cookie/setting/url'

r = requests.get(url)

print(r.cookies['example_cookie_name'])  # read a cookie set by the server

url = 'http://m.ctrip.com/cookies'

cookies = dict(cookies_are='working')

r = requests.get(url, cookies=cookies)  # send cookies
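Cookies passed with cookies= end up in the request's Cookie header, and r.cookies is a RequestsCookieJar that behaves much like a dict. A minimal offline sketch, assuming requests is installed; the cookie name and value in the jar are hypothetical.

```python
import requests
from requests.cookies import RequestsCookieJar

# Sending: build (but do not send) the request and look at the Cookie header.
prepared = requests.Request(
    'GET', 'http://m.ctrip.com/cookies', cookies={'cookies_are': 'working'}
).prepare()
print(prepared.headers['Cookie'])  # cookies_are=working

# Reading: r.cookies is a RequestsCookieJar; a hand-filled jar stands in for it here.
jar = RequestsCookieJar()
jar.set('example_cookie_name', 'example_value', domain='example.com', path='/')
print(jar['example_cookie_name'])  # example_value
print(jar.get('missing_cookie'))   # None
```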

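# Setting a timeout

r = requests.get('http://m.ctrip.com', timeout=0.001)  # timeout in seconds; a value this small is likely to raise requests.exceptions.Timeout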
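timeout= caps how long requests waits to connect and how long it waits without receiving any bytes; it is not a limit on the whole download. When it is exceeded, a subclass of requests.exceptions.Timeout is raised, so crawlers usually catch it. A sketch, assuming requests is installed; fetch_or_none is a hypothetical helper, and a local socket that listens but never replies stands in for a very slow server.

```python
import socket

import requests


def fetch_or_none(session, url, timeout):
    """Hypothetical helper: return the Response, or None if the request timed out."""
    try:
        return session.get(url, timeout=timeout)
    except requests.exceptions.Timeout:  # covers both ConnectTimeout and ReadTimeout
        return None


# A listening socket that never accepts or replies: a stand-in for a very slow server.
slow = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
slow.bind(('127.0.0.1', 0))
slow.listen(1)

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local demo
result = fetch_or_none(session, f'http://127.0.0.1:{slow.getsockname()[1]}/', timeout=0.5)
print(result)  # None: no response arrived within 0.5 s
slow.close()
```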

# Setting proxies

proxies = {
    "http": "http://10.10.10.10:8888",
    "https": "http://10.10.10.100:4444",
}

r = requests.get('http://m.ctrip.com', proxies=proxies)  # keys are URL schemes: http:// targets use the first proxy, https:// targets the second
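The keys of proxies are matched against the target URL's scheme, or against scheme plus host for a per-site override. A minimal offline sketch, assuming requests is installed, that asks requests.utils.select_proxy which proxy would be used; note that select_proxy is an internal helper used by requests' HTTP adapter, not documented API, so it may change between versions. The 10.10.10.20 address is made up for illustration.

```python
from requests.utils import select_proxy  # internal helper, not part of the documented API

proxies = {
    "http": "http://10.10.10.10:8888",
    "https": "http://10.10.10.100:4444",
}
print(select_proxy('http://m.ctrip.com', proxies))   # http://10.10.10.10:8888
print(select_proxy('https://m.ctrip.com', proxies))  # http://10.10.10.100:4444

# A "scheme://host" key overrides the plain scheme key for that host only.
per_host = {**proxies, "http://m.ctrip.com": "http://10.10.10.20:8888"}
print(select_proxy('http://m.ctrip.com/page', per_host))  # http://10.10.10.20:8888
print(select_proxy('http://example.com', per_host))       # http://10.10.10.10:8888
```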
