DTU DeepLearning: exercise 6
- Batch Gradient Descent. Batch Size = Size of Training Set
- Stochastic Gradient Descent. Batch Size = 1
- Mini-Batch Gradient Descent. 1 < Batch Size < Size of Training Set
The three variants above are from here. The "sample size" you're talking about is referred to as the batch size, B. The batch size is just one of the hyper-parameters you'll be tuning when you train a neural network with mini-batch Stochastic Gradient Descent (SGD), and it is data dependent. The most basic method of hyper-parameter search is a grid search over the learning rate and batch size to find a pair that makes the network converge.
To understand what the batch size should be, it's important to see the relationship between batch gradient descent, online SGD, and mini-batch SGD. Here's the general formula for the weight update step in mini-batch SGD, which is a generalization of all three types [2]:

θ_{t+1} = θ_t − ε(t) · (1/B) · Σ_{b=1}^{B} ∇L(θ_t, m_b)

where m_b is the b-th example of a mini-batch m drawn from the training set x, B = |m| is the batch size, and ε(t) is the learning rate at step t.
- Batch gradient descent: B=|x|
- Online stochastic gradient descent: B=1
- Mini-batch stochastic gradient descent: B>1 but B<|x|.
Note that with batch gradient descent (B=|x|), the loss function is no longer a random variable and is not a stochastic approximation.
SGD converges faster than normal "batch" gradient descent because it updates the weights after looking at a randomly selected subset of the training set. Let x be our training set and let m⊂x. The batch size B is just the cardinality of m: B=|m|.
Batch gradient descent updates the weights θ using the gradients of the entire dataset x; whereas SGD updates the weights using an average of the gradients for a mini-batch m. (Using the average as opposed to a sum prevents the algorithm from taking steps that are too large if the dataset is very large. Otherwise, you would need to adjust your learning rate based on the size of the dataset.) The expected value of this stochastic approximation of the gradient used in SGD is equal to the deterministic gradient used in batch gradient descent: E[∇L_SGD(θ, m)] = ∇L(θ, x).
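As a quick numerical sanity check of that equality, here is a minimal sketch, assuming a toy quadratic loss, a toy dataset, and a batch size of 32 (all of these choices are mine, purely for illustration):

```python
# Numerically checking E[grad_SGD(theta, m)] = grad_L(theta, x) on a toy loss.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 5))        # toy training set: T=1000 examples in R^5
theta = rng.normal(size=5)            # current weights

def grad(theta, batch):
    # Gradient, averaged over `batch`, of the loss 0.5 * mean((x_i . theta)^2).
    return batch.T @ (batch @ theta) / len(batch)

full = grad(theta, x)                 # deterministic full-batch gradient
est = np.mean([grad(theta, x[rng.choice(len(x), 32)])   # B = 32 mini-batches
               for _ in range(10_000)], axis=0)
print(np.abs(est - full).max())       # close to 0: the estimator is unbiased
```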
Each sample we draw to make a weight update is called a mini-batch. Each full pass through the entire dataset is called an epoch.
Let's say that we have some data vector x ∈ R^D, an initial weight vector that parameterizes our neural network, θ_0 ∈ R^S, and a loss function L(θ, x): R^S × R^D → R that we are trying to minimize. If we have T training examples and a batch size of B, then we can split those training examples into C mini-batches:

C = ⌈T/B⌉
For simplicity we can assume that T is evenly divisible by B. When it is not, which is often the case, each mini-batch should be weighted in proportion to its size.
An iterative algorithm for SGD with M epochs is given below:
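A minimal sketch of that loop in Python/NumPy, assuming a per-batch gradient function grad(theta, batch) that returns the gradient averaged over the batch (as in the previous snippet):

```python
# A sketch of mini-batch SGD for M epochs over a training set x.
import numpy as np

def sgd(x, theta0, grad, epsilon, B, M):
    """epsilon is the learning-rate schedule: a function epoch -> step size."""
    rng = np.random.default_rng(0)
    theta = theta0.copy()
    T = len(x)
    C = int(np.ceil(T / B))                  # number of mini-batches per epoch
    for epoch in range(M):
        perm = rng.permutation(T)            # shuffle once per epoch,
        for idx in np.array_split(perm, C):  # then read the batches in order
            # array_split balances the batch sizes when B doesn't divide T
            theta = theta - epsilon(epoch) * grad(theta, x[idx])
    return theta
```

Setting B=1 here gives online SGD, and B=T recovers batch gradient descent.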
Note: in real life we're reading these training examples from memory and, due to cache prefetching and other memory tricks done by your computer, your algorithm will run faster if the memory accesses are coalesced, i.e. when you read the memory in order and don't jump around randomly. So, most SGD implementations shuffle the dataset and then load the examples into memory in the order in which they'll be read.
The major parameters for the vanilla (no momentum) SGD described above are:
- Learning Rate: ε
I like to think of epsilon as a function from the epoch count to a learning rate. This function is called the learning rate schedule.
If you want the learning rate fixed, just define epsilon as a constant function; both options are sketched after this list.
- Batch Size
Batch size determines how many examples you look at before making a weight update. The lower it is, the noisier the training signal is going to be; the higher it is, the longer it will take to compute the gradient for each step (see the second sketch below).
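As a sketch of the schedule idea from the learning rate item above (the function names and the decay values are mine, purely for illustration):

```python
# Learning-rate schedules as functions from the epoch count to a step size.
def constant(eps0):
    return lambda epoch: eps0                             # fixed learning rate

def step_decay(eps0, drop=0.5, every=10):
    return lambda epoch: eps0 * drop ** (epoch // every)  # decay every 10 epochs

epsilon = step_decay(0.1)
print([round(epsilon(e), 4) for e in (0, 9, 10, 20)])     # [0.1, 0.1, 0.05, 0.025]
```

The sgd sketch above accepts exactly such a function as its epsilon argument.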
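And a rough illustration of the batch-size trade-off, reusing the toy loss from the first sketch: the spread of the mini-batch gradient estimate shrinks roughly like 1/√B as the batch size grows.

```python
# The mini-batch gradient gets less noisy as B grows (toy loss from above).
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(1000, 5))                  # toy training set
theta = rng.normal(size=5)

def grad(theta, batch):
    return batch.T @ (batch @ theta) / len(batch)

for B in (1, 32, 1000):
    g = np.array([grad(theta, x[rng.choice(len(x), B)]) for _ in range(2000)])
    print(B, round(float(g.std(axis=0).mean()), 3))   # noise falls ~ 1/sqrt(B)
```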