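Python Web Scraping: The requests Module

I. The requests Module

requests is a third-party Python library for sending HTTP requests. It is not part of the standard library and is installed with pip install requests. In a crawler its main job is to simulate a browser sending requests. It is powerful, and its API is concise and efficient.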

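1.1 Module overview and request flow

The requests module simulates a browser sending requests.

Request flow: specify the URL --> send the request --> read the data stored in the response object --> persist it to storage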
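
The four steps above can be sketched as one small function. This is a minimal illustration, not the only way to structure it: the name fetch_and_save, the timeout default, and the raise_for_status() check are assumptions added here and are not part of the original flow.

```python
import requests


def fetch_and_save(url, path, timeout=10):
    """Run the four-step flow for one page; return the number of bytes written."""
    # Steps 1 and 2: the caller specifies the URL, then we send the GET request.
    # The timeout is an addition so a stalled server cannot hang the script.
    response = requests.get(url, timeout=timeout)
    # Addition: raise on 4xx/5xx instead of silently saving an error page.
    response.raise_for_status()
    # Step 3: read the raw body from the response object.
    body = response.content
    # Step 4: persist it to disk as bytes.
    with open(path, 'wb') as f:
        f.write(body)
    return len(body)
```

Calling fetch_and_save('https://www.baidu.com/', './baidu.html') would run the whole flow for the page used in the next section.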

1.2 Fetching the Baidu home page

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import requests

# Send a browser-like User-Agent so the site serves the normal page
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'
}
url = 'https://www.baidu.com/'
response = requests.get(url=url, headers=headers)   # headers must be passed, or the default User-Agent is sent
response.encoding = 'utf-8'                         # set the encoding used to decode response.text
page_text = response.text                           # decoded body, <class 'str'>
with open('./baidu.html', mode='w', encoding='utf-8') as f:
    f.write(page_text)

# Other useful members of the response object:
# page_text = response.content        # raw body, <class 'bytes'>
# response.status_code                # HTTP status code
# response.headers['Content-Type']    # if this is a JSON type such as 'application/json', response.json() can be used
# response.json()                     # a method: parses a JSON response body into Python objects
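
The commented notes above suggest checking Content-Type before calling response.json(). A minimal sketch of that check follows; the helper name read_body is hypothetical, and matching on the substring 'json' is a simplifying assumption that covers 'application/json' as well as the non-standard 'text/json'.

```python
import requests


def read_body(response):
    """Return parsed JSON for JSON responses, decoded text otherwise."""
    # Header lookup on a requests response is case-insensitive.
    content_type = response.headers.get('Content-Type', '')
    if 'json' in content_type.lower():
        return response.json()   # dict or list parsed from the body
    return response.text         # body decoded to str
```

response.json() raises an exception when the body is not valid JSON, so checking the header first avoids calling it on an HTML page.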

1.3 Fetching the Baidu search results page for a given keyword

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'
}
# Fixed part of the query string; requests appends params to it (no trailing '&' needed)
url = 'https://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&tn=baidu'
kw = input('Enter a search term: ')
param = {'wd': kw}                                  # requests URL-encodes the keyword
response = requests.get(url=url, params=param, headers=headers)
page_text = response.content                        # raw bytes, so the file is opened in 'wb' mode
fileName = kw + '.html'
with open(fileName, 'wb') as fp:
    fp.write(page_text)
print(fileName + ' saved.')
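
To see what params actually does to the URL, the request can be prepared without being sent. The sketch below uses requests.Request(...).prepare(), which builds the final URL locally; the function name build_search_url and the shortened base URL are assumptions made for illustration.

```python
import requests


def build_search_url(keyword):
    """Return the URL requests would use for this search, without sending it."""
    request = requests.Request(
        'GET',
        'https://www.baidu.com/s',
        params={'ie': 'utf-8', 'wd': keyword},
    )
    # prepare() merges params into the query string and percent-encodes them.
    return request.prepare().url
```

A non-ASCII keyword comes out percent-encoded as UTF-8, which is why the script never has to encode the search term by hand.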

1.4 Getting Baidu Translate results with a POST request

The page sends its requests via AJAX. Capturing the traffic in the browser's developer tools reveals the request URL and the form data that is submitted.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'
}
url = 'https://fanyi.baidu.com/sug'
kw = input('Enter text to translate: ')
# Form fields captured from the browser; data= sends them as a form-encoded body
data = {
    'kw': kw
}
response = requests.post(url=url, data=data, headers=headers)
dic = response.json()                               # the endpoint returns JSON
print(dic['data'])

----------------------------------- Output --------------------------------------

Enter text to translate: 美女

[{'k': '美女', 'v': '[měi nǚ] beauty; belle; beautiful woman; femme fat'}, {'k': '美女与野兽', 'v': '名 Beauty and the Beast;'}, {'k': '美女蛇', 'v': 'merino;'}]

--------------------------------------------------------------------------------
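
To check that data= really produces the same form submission the browser showed, the POST can be prepared locally and inspected. This is a sketch under that assumption; preview_form_post is a hypothetical helper name and nothing is sent over the network.

```python
import requests


def preview_form_post(url, form):
    """Return (Content-Type, body) that requests would send for a form POST."""
    prepared = requests.Request('POST', url, data=form).prepare()
    # A dict passed as data= is form-encoded, and the matching
    # Content-Type header is set automatically.
    return prepared.headers['Content-Type'], prepared.body
```

Comparing these two values with the request captured in the developer tools is a quick way to confirm the script mimics the page's AJAX call.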

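1.5 Fetching the Douban movie ranking

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import requests

headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'
}
url = 'https://movie.douban.com/j/chart/top_list'
# Query parameters of the AJAX request. The empty values are left blank here;
# copy the real ones from the request captured in the browser's developer tools.
param = {
    'type': '',
    'interval_id': '100:90',
    'action': '',
    'start': '',
    'limit': ''
}
# The endpoint returns JSON, so parse it directly
json_data = requests.get(url=url, headers=headers, params=param).json()
print(json_data)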
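
The parameter names suggest that start is an offset and limit is a page size, but that reading is an assumption and should be confirmed against the captured request. Under that assumption, the parameters for successive pages could be generated as follows; top_list_params, page_starts, and the default values are hypothetical.

```python
def top_list_params(type_id, start=0, limit=20, interval_id='100:90'):
    """Build the query parameters for one page of the ranking (assumed meaning)."""
    return {
        'type': str(type_id),        # category id taken from the captured request
        'interval_id': interval_id,
        'action': '',
        'start': str(start),         # assumed: offset of the first item
        'limit': str(limit),         # assumed: number of items per page
    }


def page_starts(total, limit):
    """Offsets needed to cover `total` items in pages of `limit` items."""
    return range(0, total, limit)
```

Each dictionary can be passed as params= to requests.get exactly as in the script above.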
