DTU DeepLearning: exercise 6

Does batch_size have any effect on the quality of the results? How should the batch size and the number of iterations be set?

1. The batch size, a: the number of training examples fed into the neural network at once.

2. The number of iterations, b: how many batches are fed into the network during one pass over the data.

a*b = the number of training examples (for one epoch).

Batch Gradient Descent. Batch Size = Size of Training Set
Stochastic Gradient Descent. Batch Size = 1
Mini-Batch Gradient Descent. 1 < Batch Size < Size of Training Set

From here: The "sample size" you're talking about is referred to as the batch size, B. The batch size is just one of the hyper-parameters you'll be tuning when you train a neural network with mini-batch Stochastic Gradient Descent (SGD), and it is data dependent. The most basic method of hyper-parameter search is a grid search over the learning rate and batch size to find a pair that makes the network converge.

To understand what the batch size should be, it's important to see the relationship between batch gradient descent, online SGD, and mini-batch SGD. Here's the general formula for the weight update step in mini-batch SGD, which is a generalization of all three types. [2]
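A standard way to write this update, assuming the symbols used later in these notes (weights θ, learning rate schedule ϵ(t), a mini-batch m of size B with elements m_b, and loss L), is sketched below. The indexing convention is an assumption; other texts write the same rule slightly differently.

```latex
\theta_{t+1} \leftarrow \theta_t - \epsilon(t)\,\frac{1}{B}\sum_{b=0}^{B-1} \nabla_{\theta} L(\theta_t, m_b)
```

Only the choice of B changes between the three variants.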

1. Batch gradient descent: B=|x|
2. Online stochastic gradient descent: B=1
3. Mini-batch stochastic gradient descent: B>1 but B<|x|

Note that in case 1 (batch gradient descent), the loss function is no longer a random variable and is not a stochastic approximation.

SGD converges faster than normal "batch" gradient descent because it updates the weights after looking at a randomly selected subset of the training set. Let x be our training set and let m⊂x. The batch size B is just the cardinality of m: B=|m|.

Batch gradient descent updates the weights θ using the gradients of the entire dataset x, whereas SGD updates the weights using an average of the gradients for a mini-batch m. (Using the average as opposed to a sum prevents the algorithm from taking steps that are too large if the dataset is very large. Otherwise, you would need to adjust your learning rate based on the size of the dataset.) The expected value of this stochastic approximation of the gradient used in SGD is equal to the deterministic gradient used in batch gradient descent: E[∇L_SGD(θ,m)] = ∇L(θ,x).
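The claim about the expected value can be checked numerically on a toy problem. The sketch below assumes a one-parameter least-squares loss, 0.5 * (theta * x - y)**2, and a dataset that splits into equally sized mini-batches. The loss, the data, and the name `grad_mean` are illustrative choices, not part of the original notes.

```python
import random

def grad_mean(theta, batch):
    """Average gradient of 0.5 * (theta * x - y)**2 over a batch of (x, y) pairs."""
    return sum((theta * x - y) * x for x, y in batch) / len(batch)

random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(12)]
data = [(x, 3.0 * x + random.gauss(0.0, 0.1)) for x in xs]
theta = 0.5
B = 4  # batch size; 12 examples give 3 equally sized mini-batches

# Deterministic gradient used by batch gradient descent.
full_gradient = grad_mean(theta, data)

# Gradients of each mini-batch after a random shuffle.
shuffled = data[:]
random.shuffle(shuffled)
batches = [shuffled[i:i + B] for i in range(0, len(shuffled), B)]
batch_gradients = [grad_mean(theta, m) for m in batches]

# Picking one of these mini-batches uniformly at random gives this expectation.
expected_gradient = sum(batch_gradients) / len(batch_gradients)

print(full_gradient, expected_gradient)
```

Individual mini-batch gradients differ from one another, but their average matches the full-dataset gradient up to floating-point error.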

Each time we take a sample and update our weights it is called a mini-batch. Each time we run through the entire dataset, it's called an epoch.

Let's say that we have some data vector x ∈ R^D, an initial weight vector that parameterizes our neural network, θ_0 ∈ R^S, and a loss function L(θ,x): R^S → R^D → R^S that we are trying to minimize. If we have T training examples and a batch size of B, then we can split those training examples into C mini-batches:

C=⌈T/B⌉

For simplicity we can assume that T is evenly divisible by B. Although, when this is not the case, as it often is not, proper weight should be assigned to each mini-batch as a function of its size.
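As a small worked example, the sketch below splits T = 10 examples with B = 4, so the last mini-batch is short. The notes do not say how the weighting should be done; scaling each mini-batch by its size relative to B is one plausible assumption, and `split_into_minibatches` is a hypothetical helper name.

```python
import math

def split_into_minibatches(examples, batch_size):
    """Cut a list of examples into consecutive mini-batches; the last may be shorter."""
    return [examples[i:i + batch_size] for i in range(0, len(examples), batch_size)]

examples = list(range(10))  # T = 10 training examples (placeholders)
B = 4                       # batch size

batches = split_into_minibatches(examples, B)
C = math.ceil(len(examples) / B)  # C = ceil(T / B)

# Assumed weighting: proportional to the mini-batch size, so a half-full
# mini-batch contributes half as much as a full one.
weights = [len(m) / B for m in batches]

print(C, [len(m) for m in batches], weights)
```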

An iterative algorithm for SGD with M epochs is given below:
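A minimal Python sketch of such a loop, assuming a one-parameter least-squares model y ≈ θ·x. The model, the toy data, and the function name `sgd` are illustrative assumptions rather than the original pseudocode.

```python
import random

def sgd(data, theta0, epsilon, batch_size, epochs, seed=0):
    """Vanilla mini-batch SGD (no momentum) for the loss 0.5 * (theta * x - y)**2.

    epsilon is the learning rate schedule: a function from the epoch index t
    to a learning rate.
    """
    rng = random.Random(seed)
    theta = theta0
    data = list(data)
    for t in range(epochs):
        rng.shuffle(data)  # visit the examples in a new random order each epoch
        for start in range(0, len(data), batch_size):
            m = data[start:start + batch_size]  # current mini-batch
            # Average gradient over the mini-batch.
            grad = sum((theta * x - y) * x for x, y in m) / len(m)
            theta -= epsilon(t) * grad  # weight update step
    return theta

# Toy data generated from y = 2x, so the minimizer is theta = 2.
xs = [i / 10 for i in range(-10, 11)]
data = [(x, 2.0 * x) for x in xs]
theta = sgd(data, theta0=0.0, epsilon=lambda t: 0.5, batch_size=4, epochs=50)
print(theta)
```

Here the last mini-batch of each epoch holds a single example, because 21 is not divisible by 4; dividing by `len(m)` keeps its gradient an average.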

Note: in real life we're reading these training example data from memory and, due to cache pre-fetching and other memory tricks done by your computer, your algorithm will run faster if the memory accesses are coalesced, i.e. when you read the memory in order and don't jump around randomly. So, most SGD implementations shuffle the dataset and then load the examples into memory in the order that they'll be read.

The major parameters for the vanilla (no momentum) SGD described above are:

Learning Rate: ϵ

I like to think of epsilon as a function from the epoch count to a learning rate. This function is called the learning rate schedule.

ϵ(t): ℕ → ℝ

If you want to have the learning rate fixed, just define epsilon as a constant function.
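In code, a schedule is simply a function of the epoch index. The sketch below shows the constant case described above, plus a step decay as one possible non-constant schedule. The step decay and both function names are illustrative assumptions, not something the notes prescribe.

```python
def constant_schedule(rate):
    """Fixed learning rate: epsilon(t) ignores the epoch index t."""
    def epsilon(t):
        return rate
    return epsilon

def step_decay_schedule(initial_rate, drop, epochs_per_drop):
    """Multiply the learning rate by `drop` once every `epochs_per_drop` epochs."""
    def epsilon(t):
        return initial_rate * (drop ** (t // epochs_per_drop))
    return epsilon

fixed = constant_schedule(0.1)
decayed = step_decay_schedule(initial_rate=0.1, drop=0.5, epochs_per_drop=10)

for t in (0, 10, 25):
    print(t, fixed(t), decayed(t))
```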

Batch Size

Batch size determines how many examples you look at before making a weight update. The lower it is, the noisier the training signal will be; the higher it is, the longer it takes to compute the gradient for each step.

Citations & Further Reading:

https://medium.com/@BorisAKnyazev/tutorial-on-graph-neural-networks-for-computer-vision-and-beyond-part-1-3d9fada3b80d

From here: When using a larger batch, there is a significant degradation in the quality of the model, as measured by its ability to generalize. When you put m examples in a mini-batch, you need to do O(m) computation and use O(m) memory, but you reduce the amount of uncertainty in the gradient by a factor of only O(sqrt(m)).
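The O(sqrt(m)) factor can be illustrated with a small simulation. The sketch assumes that per-example gradients behave like independent draws from a unit-variance normal distribution, which is a simplification; `std_of_batch_mean` is a hypothetical helper.

```python
import random

def std_of_batch_mean(m, trials, rng):
    """Empirical standard deviation of the mean of m noisy per-example gradients."""
    means = [sum(rng.gauss(0.0, 1.0) for _ in range(m)) / m for _ in range(trials)]
    center = sum(means) / trials
    return (sum((v - center) ** 2 for v in means) / trials) ** 0.5

rng = random.Random(0)
noise_1 = std_of_batch_mean(1, 2000, rng)      # batch size 1
noise_100 = std_of_batch_mean(100, 2000, rng)  # batch size 100

# 100 times more computation per step, but the noise shrinks only by about
# sqrt(100) = 10.
ratio = noise_1 / noise_100
print(noise_1, noise_100, ratio)
```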
