optim.py cs231n
n如果有错误,欢迎指出,不胜感激
import numpy as np """
This file implements various first-order update rules that are commonly used for
training neural networks. Each update rule accepts current weights and the
gradient of the loss with respect to those weights and produces the next set of
weights. Each update rule has the same interface: def update(w, dw, config=None): Inputs:
- w: A numpy array giving the current weights.
- dw: A numpy array of the same shape as w giving the gradient of the
loss with respect to w.
- config: A dictionary containing hyperparameter values such as learning rate,
momentum, etc. If the update rule requires caching values over many
iterations, then config will also hold these cached values. Returns:
- next_w: The next point after the update.
- config: The config dictionary to be passed to the next iteration of the
update rule. NOTE: For most update rules, the default learning rate will probably not perform
well; however the default values of the other hyperparameters should work well
for a variety of different problems. For efficiency, update rules may perform in-place updates, mutating w and
setting next_w equal to w.
""" def sgd(w, dw, config=None):
"""
Performs vanilla stochastic gradient descent. config format:
- learning_rate: Scalar learning rate.
"""
if config is None: config = {}
config.setdefault('learning_rate', 1e-2)
w -= config['learning_rate'] * dw
return w, config def sgd_momentum(w, dw, config=None):
"""
Performs stochastic gradient descent with momentum. config format:
- learning_rate: Scalar learning rate.
- momentum: Scalar between 0 and 1 giving the momentum value.
Setting momentum = 0 reduces to sgd.
- velocity: A numpy array of the same shape as w and dw used to store a moving
average of the gradients.
"""
if config is None: config = {}
config.setdefault('learning_rate', 1e-2)
config.setdefault('momentum', 0.9)
v = config.get('velocity', np.zeros_like(w)) next_w = None
v=v*config['momentum']-config['learning_rate']*dw
next_w=w+v
config['velocity'] = v return next_w, config def rmsprop(x, dx, config=None):
"""
Uses the RMSProp update rule, which uses a moving average of squared gradient
values to set adaptive per-parameter learning rates. config format:
- learning_rate: Scalar learning rate.
- decay_rate: Scalar between 0 and 1 giving the decay rate for the squared
gradient cache.
- epsilon: Small scalar used for smoothing to avoid dividing by zero.
- cache: Moving average of second moments of gradients.
"""
if config is None: config = {}
config.setdefault('learning_rate', 1e-2)
config.setdefault('decay_rate', 0.99)
config.setdefault('epsilon', 1e-8)
config.setdefault('cache', np.zeros_like(x)) next_x = None cache=config['cache']*config['decay_rate']+(1-config['decay_rate'])*dx**2
next_x=x-config['learning_rate']*dx/np.sqrt(cache+config['epsilon'])
config['cache']=cache return next_x, config def adam(x, dx, config=None):
"""
Uses the Adam update rule, which incorporates moving averages of both the
gradient and its square and a bias correction term. config format:
- learning_rate: Scalar learning rate.
- beta1: Decay rate for moving average of first moment of gradient.
- beta2: Decay rate for moving average of second moment of gradient.
- epsilon: Small scalar used for smoothing to avoid dividing by zero.
- m: Moving average of gradient.
- v: Moving average of squared gradient.
- t: Iteration number.
"""
if config is None: config = {}
config.setdefault('learning_rate', 1e-3)
config.setdefault('beta1', 0.9)
config.setdefault('beta2', 0.999)
config.setdefault('epsilon', 1e-8)
config.setdefault('m', np.zeros_like(x))
config.setdefault('v', np.zeros_like(x))
config.setdefault('t', 0)
config['t']+=1
这个方法比较综合,各种方法的好处吧
m=config['beta1']*config['m']+(1-config['beta1'])*dx # now to change by acc
v=config['beta2']*config['v']+(1-config['beta2'])*dx**2
config['m']=m
config['v']=v
m=m/(1-config['beta1']**config['t'])
v=v/(1-config['beta2']**config['t']) next_x=x-config['learning_rate']*m/np.sqrt(v+config['epsilon']) return next_x, config
n
optim.py cs231n的更多相关文章
- cnn.py cs231n
n import numpy as np from cs231n.layers import * from cs231n.fast_layers import * from cs231n.layer_ ...
- fc_net.py cs231n
n如果有错误,欢迎指出,不胜感激 import numpy as np from cs231n.layers import * from cs231n.layer_utils import * cla ...
- layers.py cs231n
如果有错误,欢迎指出,不胜感激. import numpy as np def affine_forward(x, w, b): 第一个最简单的 affine_forward简单的前向传递,返回 ou ...
- 笔记:CS231n+assignment2(作业二)(一)
第二个作业难度很高,但做(抄)完之后收获还是很大的.... 一.Fully-Connected Neural Nets 首先是对之前的神经网络的程序进行重构,目的是可以构建任意大小的全连接的neura ...
- 深度学习原理与框架-神经网络-cifar10分类(代码) 1.np.concatenate(进行数据串接) 2.np.hstack(将数据横着排列) 3.hasattr(判断.py文件的函数是否存在) 4.reshape(维度重构) 5.tanspose(维度位置变化) 6.pickle.load(f文件读入) 7.np.argmax(获得最大值索引) 8.np.maximum(阈值比较)
横1. np.concatenate(list, axis=0) 将数据进行串接,这里主要是可以将列表进行x轴获得y轴的串接 参数说明:list表示需要串接的列表,axis=0,表示从上到下进行串接 ...
- optim.py-使用tensorflow实现一般优化算法
optim.py Project URL:https://github.com/Codsir/optim.git Based on: tensorflow, numpy, copy, inspect ...
- 深度学习之卷积神经网络(CNN)详解与代码实现(一)
卷积神经网络(CNN)详解与代码实现 本文系作者原创,转载请注明出处:https://www.cnblogs.com/further-further-further/p/10430073.html 目 ...
- 深度学习原理与框架-卷积神经网络-cifar10分类(图片分类代码) 1.数据读入 2.模型构建 3.模型参数训练
卷积神经网络:下面要说的这个网络,由下面三层所组成 卷积网络:卷积层 + 激活层relu+ 池化层max_pool组成 神经网络:线性变化 + 激活层relu 神经网络: 线性变化(获得得分值) 代码 ...
- Pytorch1.3源码解析-第一篇
pytorch$ tree -L 1 . ├── android ├── aten ├── benchmarks ├── binaries ├── c10 ├── caffe2 ├── CITATIO ...
随机推荐
- spark编程入门-idea环境搭建
原文引自:http://blog.csdn.net/huanbia/article/details/69084895 1.环境准备 idea采用2017.3.1版本. 创建一个文件a.txt 2.构建 ...
- 8种nosql数据库对比
1. CouchDB 所用语言: Erlang 特点:DB一致性,易于使用 使用许可: Apache 协议: HTTP/REST 双向数据复制, 持续进行或临时处理, 处理时带冲突检查, 因此,采用的 ...
- Linux产生coredump文件(core)
1.可以使用命令 ulimit -c unlimited 来开启 core dump 功能,并且不限制 core dump 文件的大小: 如果需要限制文件的大小,将 unlimited 改成你想生成 ...
- 威胁快报|Solr dataimport成挖矿团伙新型利用方式
概述 近日,阿里云安全团队监测到挖矿团伙利用solr dataimport RCE(CVE-2019-0193)作为新的攻击方式对云上主机进行攻击,攻击成功后下载门罗币挖矿程序进行牟利.该团伙使用的恶 ...
- c++ 读取8, 10, 16进制数
c++基础知识都快忘了..记一下 dec-十进制(默认) oct-八进制 hex-十六进制
- springboot核心技术(二)-----自动配置原理、日志
自动配置原理 配置文件能配置的属性参照 1.自动配置原理: 1).SpringBoot启动的时候加载主配置类,开启了自动配置功能 @EnableAutoConfiguration 2).@Enable ...
- Shell 工具之 gawk
gawk 程序是 Unix 中原 awk 程序的 GNU 版本.awk 程序在流编辑方面比 sed 编辑器更先进的是:它提供了一种编程语言而不仅仅是编辑器命令行. gawk 格式 gawk optio ...
- JasperReport编译报表设计5
我们在前面的章节中产生的JasperReport模板(JRXML文件).这个文件不能直接用于生成报告.它必须被编译成JasperReport的“本地二进制"格式,称为Jasperfile.在 ...
- 使用alibaba的json工具将String类型转为JSONArray类型
转化流程:先将输入流转为String类型,再使用alibaba的json转换工具,将字符串转化为json数组 SensorDevices sensorDevices = new SensorDevic ...
- 跟我一起做一个vue的小项目(二)
这个vue项目是紧跟着之前的项目跟我一起做一个vue的小项目(一)来的. 我继续后面的开发(写的比较粗糙,边学边记录) 下图是header头部的样式 header组件内容如下 //header.vue ...