requests

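What is the requests module

A third-party Python package that wraps HTTP network requests behind a simple API

Purpose

Simulating the requests a browser sends

Environment setup

pip install requests

Coding workflow

Specify the URL
Send the request
Get the response data
Persist the data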

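Scraping the page source of the Sogou home page

# Scrape the page source of the Sogou home page
import requests

# 1. Specify the URL
url = 'https://www.sogou.com/'

# 2. Send a GET request; get() returns a response object
response = requests.get(url=url)

# 3. Get the response data
page_text = response.text  # the response body as a string

# 4. Persist the data
with open('sogou.html', 'w', encoding='utf-8') as fp:
    fp.write(page_text)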

# A simple web page collector
# The parameters carried by the URL need to be dynamic
url = 'https://www.sogou.com/web'

# Make the parameter dynamic
wd = input('enter a key:')
params = {
    'query': wd
}

# Pass the dict of request parameters to the params argument of get()
response = requests.get(url=url, params=params)
page_text = response.text

fileName = wd + '.html'
with open(fileName, 'w', encoding='utf-8') as fp:
    fp.write(page_text)
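To see what the params argument does, the sketch below builds the final URL by hand with the standard library's urllib.parse.urlencode. It only approximates what requests does internally, and the helper name build_url is made up for this illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_url(base: str, params: dict) -> str:
    """Append params to base as a percent-encoded query string (illustrative helper)."""
    return base + '?' + urlencode(params)


full_url = build_url('https://www.sogou.com/web', {'query': '人民币'})
print(full_url)  # the keyword appears percent-encoded after "?query="

# Decoding the query string recovers the original keyword
decoded = parse_qs(urlsplit(full_url).query)
print(decoded['query'] == ['人民币'])
```

The keyword never has to be encoded by hand: passing the dict is enough.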

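Running the web page collector above reveals two problems:
- 1. The saved page contains garbled text
- 2. The amount of data returned is wrong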
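Garbled text of this kind is usually an encoding mismatch: bytes written in one encoding are decoded with another. requests falls back to ISO-8859-1 when a text response's Content-Type header names no charset; that this is what happens with this particular page is an assumption. The sketch reproduces the effect with plain bytes and no network access.

```python
# Bytes as a server might send them: UTF-8 encoded Chinese text (sample string)
raw = '搜狗搜索'.encode('utf-8')

# Decoding with the wrong codec does not fail, it silently yields mojibake
garbled = raw.decode('iso-8859-1')

# Decoding with the right codec recovers the text
fixed = raw.decode('utf-8')

print(len(fixed), len(garbled))  # 4 characters become 12 junk characters
print(garbled == fixed)          # False

# The damage is reversible as long as the garbled string was not altered
recovered = garbled.encode('iso-8859-1').decode('utf-8')
print(recovered == fixed)        # True
```

With requests, the equivalent fix is to set response.encoding before reading response.text.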

# Fix the garbled text
url = 'https://www.sogou.com/web'

# Make the parameter dynamic
wd = input('enter a key:')
params = {
    'query': wd
}

# Pass the dict of request parameters to the params argument of get()
response = requests.get(url=url, params=params)
response.encoding = 'utf-8'  # override the encoding used to decode the response
page_text = response.text

fileName = wd + '.html'
with open(fileName, 'w', encoding='utf-8') as fp:
    fp.write(page_text)

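UA detection: a website inspects the identity of the request carrier (the User-Agent request header) to decide whether the request was sent by a crawler.
UA spoofing: send a browser's User-Agent with the request, for example Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36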
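The sketch below imitates UA detection from the server's side. The marker list and the function name are hypothetical; real sites apply their own, usually more elaborate, rules. By default requests identifies itself with a User-Agent of the form python-requests/<version>, which is exactly what such a check would catch.

```python
def looks_like_crawler(user_agent: str) -> bool:
    """Hypothetical server-side rule: flag User-Agents that name an HTTP library."""
    markers = ('python-requests', 'curl', 'scrapy')  # illustrative list only
    ua = user_agent.lower()
    return any(marker in ua for marker in markers)


default_ua = 'python-requests/2.31.0'  # shape of the default requests User-Agent
browser_ua = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36')

print(looks_like_crawler(default_ua))  # True
print(looks_like_crawler(browser_ua))  # False
```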

# Get past UA detection
url = 'https://www.sogou.com/web'

# Make the parameter dynamic
wd = input('enter a key:')
params = {
    'query': wd
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36'
}

# Pass the parameter dict to params and the spoofed header dict to headers
response = requests.get(url=url, params=params, headers=headers)
response.encoding = 'utf-8'  # override the encoding used to decode the response
page_text = response.text

fileName = wd + '.html'
with open(fileName, 'w', encoding='utf-8') as fp:
    fp.write(page_text)

Scraping movie details from Douban Movies

https://movie.douban.com/typerank?type_name=爱情&type=13&interval_id=100:90&action=
Analysis: when the scrollbar reaches the bottom of the page, the page is partially refreshed (an ajax request)

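url = 'https://movie.douban.com/j/chart/top_list'

start = input('Index of the first movie to fetch: ')
limit = input('Number of movies to fetch: ')
dic = {
    'type': '13',
    'interval_id': '100:90',
    'action': '',
    'start': start,
    'limit': limit,
}

# headers is the User-Agent dict defined in the previous example
response = requests.get(url=url, params=dic, headers=headers)
movies = response.json()  # json() parses the JSON body into Python objects

# Use a separate loop variable so the params dict dic is not shadowed
for movie in movies:
    print(movie['title'] + ':' + movie['score'])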
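input() always returns a string, so a typo such as "ten" would be sent to the server unchanged. Below is a minimal sketch of validating the two values before building the query dict. The helper names are assumptions, and the fixed fields simply mirror the dict above.

```python
def to_count(name: str, value: str) -> str:
    """Return value as a normalized non-negative integer string, or raise ValueError."""
    try:
        number = int(value)
    except ValueError:
        raise ValueError(f'{name} must be an integer, got {value!r}') from None
    if number < 0:
        raise ValueError(f'{name} must not be negative, got {number}')
    return str(number)


def build_query(start: str, limit: str) -> dict:
    """Build the query dict for the top_list request from raw user input."""
    return {
        'type': '13',
        'interval_id': '100:90',
        'action': '',
        'start': to_count('start', start),
        'limit': to_count('limit', limit),
    }


print(build_query('0', '20'))
```

Calling build_query(input(...), input(...)) in place of the literal dict keeps the rest of the listing unchanged.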

KFC restaurant lookup

# KFC restaurant lookup: http://www.kfc.com.cn/kfccda/storelist/index.aspx
url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword'

# Request result pages 1 to 4
for page in range(1, 5):
    data = {
        'cname': '',
        'pid': '',
        'keyword': '西安',  # search keyword: Xi'an
        'pageIndex': str(page),
        'pageSize': '10',
    }
    # POST parameters go in data, not params
    response = requests.post(url=url, headers=headers, data=data)
    print(response.json())
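Unlike params, the data dict is not appended to the URL: requests sends it in the request body, form-encoded. The sketch approximates that body with urllib.parse.urlencode for each page of the loop above; page_form is a hypothetical helper, not part of requests.

```python
from urllib.parse import urlencode, parse_qs


def page_form(keyword: str, page: int, page_size: int = 10) -> dict:
    """Form fields for one page of the store search (illustrative helper)."""
    return {
        'cname': '',
        'pid': '',
        'keyword': keyword,
        'pageIndex': str(page),
        'pageSize': str(page_size),
    }


# One form-encoded body per page, roughly what data=... puts in the POST body
bodies = [urlencode(page_form('西安', page)) for page in range(1, 5)]
for body in bodies:
    print(body)

# Parsing a body back shows the fields survive the round trip
first = parse_qs(bodies[0], keep_blank_values=True)
print(first['pageIndex'])
```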
