Web scraping: requests

Naming conventions

module_name, module
package_name, package
ClassName, class
method_name, method
ExceptionName, exception
function_name, function
GLOBAL_VAR_NAME, global variable
instance_var_name, instance variable
function_parameter_name, function parameter
local_var_name, local variable
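The list above can be seen in one place in a small sketch. Every name here (`MAX_RETRIES`, `FetchError`, `PageFetcher`, `normalize_path`) is made up for illustration and does not come from any library.

```python
# Hypothetical module, e.g. crawler_utils.py (module_name)

MAX_RETRIES = 3  # GLOBAL_VAR_NAME


class FetchError(Exception):  # ExceptionName
    """Raised when a path cannot be turned into a URL."""


def normalize_path(raw_path):  # function_name, function_parameter_name
    clean_path = raw_path.strip()  # local_var_name
    if not clean_path:
        raise FetchError("empty path")
    return "/" + clean_path.lstrip("/")


class PageFetcher:  # ClassName
    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")  # instance_var_name

    def build_url(self, page_path):  # method_name
        full_url = self.base_url + normalize_path(page_path)
        return full_url
```

For example, `PageFetcher("http://example.com/").build_url("index")` returns `"http://example.com/index"`.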

Downloading an image

Send a GET request straight to the image URL and write the response body to a file.

 import requests

 photo_url = 'https://wallpapers.wallhaven.cc/wallpapers/full/wallhaven-685513.jpg'
 response_get = requests.get(photo_url)
 # response_get.content is the raw bytes, so open the file in binary mode.
 with open('wallhaven-685513.jpg', 'wb') as f:
     f.write(response_get.content)
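For large files it can help to stream the body instead of holding it all in memory. The following is a minimal sketch under that assumption; `filename_from_url` and `download_file` are hypothetical helper names, and the chunk size and timeout are arbitrary choices.

```python
import os
from urllib.parse import urlparse

import requests


def filename_from_url(url, default="download.bin"):
    """Use the last path segment of the URL as the file name."""
    name = os.path.basename(urlparse(url).path)
    return name or default


def download_file(url, dest_dir=".", chunk_size=8192, timeout=10):
    """Stream the response body to disk and return the saved path."""
    path = os.path.join(dest_dir, filename_from_url(url))
    with requests.get(url, stream=True, timeout=timeout) as response:
        # Fail loudly on 4xx/5xx instead of saving an error page as an image.
        response.raise_for_status()
        with open(path, "wb") as f:
            for chunk in response.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return path
```

Calling `download_file(photo_url)` would save the wallpaper under its original file name in the current directory.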

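Baidu Translate

The Baidu Translate suggestion endpoint expects the word in a form field named kw. Send the request headers and the kw word to the endpoint with a POST request, and decode the response as UTF-8.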

 import json

 import requests

 headers = {
     'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0'
 }
 kw = {
     'kw': 'wolf'
 }

 response_post = requests.post('http://fanyi.baidu.com/sug', headers=headers, data=kw)
 response_post.encoding = 'utf-8'
 # print(response_post.text)

 data = response_post.text
 info = json.loads(data)
 print(info)
 # print(info['data'][0]['v'])
 # The 'v' field of the first suggestion holds the senses, separated by '; '.
 for i in info['data'][0]['v'].split('; '):
     print(i)
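The parsing step above raises an exception if `data` is empty, for example when the word is not found. A slightly more defensive sketch follows; `extract_meanings` is a hypothetical helper, and the sample payload is invented to mirror the `data[0]['v']` shape used above, not a recorded response.

```python
import json


def extract_meanings(raw_text):
    """Return the senses of the first suggestion, or [] if there is none."""
    info = json.loads(raw_text)
    suggestions = info.get("data") or []
    if not suggestions:
        return []
    senses = suggestions[0].get("v", "")
    # Split on ';' and trim whitespace so '; ' and ';' both work.
    return [part.strip() for part in senses.split(";") if part.strip()]


# Invented payload for illustration only.
sample = '{"data": [{"v": "n. first sense; v. second sense"}]}'
for meaning in extract_meanings(sample):
    print(meaning)
```

With a live response, `extract_meanings(response_post.text)` would take the place of the `json.loads` and `split` lines.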

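Scraping behind a login

To scrape a page that is only visible after logging in, copy the cookie from the logged-in session (the Set-Cookie value in the response, or the Cookie value in the browser's request) into your request headers. The site may rate-limit you.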

 import requests

 headers = {
     'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0',
     # 'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Mobile Safari/537.36',
     'Cookie': 'session_id_places=True; session_data_places=""'
 }

 r = requests.get('http://example.webscraping.com', headers=headers)
 # print(r.text)
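When several pages need the same cookie, a `requests.Session` can carry it across requests, and a pause between requests is one simple way to stay under a rate limit. This is a sketch under those assumptions; `build_session` and `fetch_pages` are hypothetical helpers and the one-second delay is an arbitrary default.

```python
import time

import requests


def build_session(cookie_header, user_agent):
    """Create a Session that sends the given cookies and User-Agent every time."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    # Split "name=value; name2=value2" into individual cookies.
    for pair in cookie_header.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep:
            session.cookies.set(name, value)
    return session


def fetch_pages(session, urls, delay_seconds=1.0, timeout=10):
    """Fetch each URL in turn, sleeping between requests."""
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay_seconds)
        response = session.get(url, timeout=timeout)
        response.raise_for_status()
        pages[url] = response.text
    return pages
```

A call such as `fetch_pages(build_session(headers['Cookie'], headers['User-Agent']), ['http://example.webscraping.com'])` would reuse the values from the snippet above.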

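Proxies

Fetch the Baidu home page through a proxy server (the proxy address must include the protocol and port). Pass the proxies and the request headers to a GET request.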

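 import requests

 # Replace ip and port with the address of a real proxy server.
 proxies = {'http': 'http://ip:port'}
 headers = {
     'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0',
 }
 r = requests.get('http://www.baidu.com', proxies=proxies, headers=headers)
 # print(r.status_code)          # HTTP status code
 # print(r.text)                 # response body decoded to str
 # print(r.content)              # raw response bytes; use this if text has encoding problems
 # print(r.headers)              # response headers
 # print(r.url)                  # final URL of the request
 # print(r.cookies)              # cookies set by the server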
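Proxies often fail or time out, so it is worth wrapping the call. The sketch below assumes one proxy serves both HTTP and HTTPS traffic; `make_proxies` and `fetch_via_proxy` are hypothetical helpers, and the address in the usage note is a placeholder.

```python
import requests


def make_proxies(host, port, scheme="http"):
    """Build the proxies mapping that requests expects, with protocol and port."""
    proxy_url = f"{scheme}://{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


def fetch_via_proxy(url, proxies, headers=None, timeout=10):
    """Return the response, or None if the proxy or the site fails."""
    try:
        response = requests.get(url, proxies=proxies, headers=headers, timeout=timeout)
        response.raise_for_status()
    except requests.RequestException as exc:
        # Covers connection errors, proxy errors, timeouts and HTTP error codes.
        print(f"request failed: {exc}")
        return None
    return response
```

For instance, `fetch_via_proxy('http://www.baidu.com', make_proxies('127.0.0.1', 8080))` routes the request through a proxy listening locally on port 8080, if one exists.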

