https://stats.stackexchange.com/questions/164876/tradeoff-batch-size-vs-number-of-iterations-to-train-a-neural-network

It has been observed in practice that when using a larger batch there is a significant degradation in the quality of the model, as measured by its ability to generalize.

https://stackoverflow.com/questions/4752626/epoch-vs-iteration-when-training-neural-networks/31842945

In neural network terminology:

  • one epoch = one forward pass and one backward pass of all the training examples
  • batch size = the number of training examples in one forward/backward pass. The higher the batch size, the more memory space you'll need.
  • number of iterations = number of passes, each pass using [batch size] examples. To be clear, one pass = one forward pass + one backward pass (the forward and backward passes are not counted as two different passes).

Example: if you have 1000 training examples, and your batch size is 500, then it will take 2 iterations to complete 1 epoch.
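
As a minimal sketch of that arithmetic (the variable names are illustrative, not from the answer above):

```python
import math

num_examples = 1000  # total training examples, as in the example above
batch_size = 500     # examples per forward/backward pass

# One epoch must cover every training example once, so the number of
# iterations per epoch is the number of batches needed to do that.
iterations_per_epoch = math.ceil(num_examples / batch_size)
print(iterations_per_epoch)  # -> 2
```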

http://ufldl.stanford.edu/tutorial/supervised/OptimizationStochasticGradientDescent/

Stochastic Gradient Descent (SGD) simply does away with the expectation in the update and computes the gradient of the parameters using only a single or a few training examples.

Overview

Batch methods, such as limited-memory BFGS, which use the full training set to compute the next parameter update at each iteration, tend to converge very well to local optima. They are also straightforward to get working given a good off-the-shelf implementation (e.g. minFunc), because they have very few hyperparameters to tune. However, in practice computing the cost and gradient for the entire training set can be very slow, and sometimes intractable on a single machine if the dataset is too big to fit in main memory. Another issue with batch optimization methods is that they don't give an easy way to incorporate new data in an 'online' setting. Stochastic Gradient Descent (SGD) addresses both of these issues by following the negative gradient of the objective after seeing only a single or a few training examples. The use of SGD in the neural network setting is motivated by the high cost of running backpropagation over the full training set. SGD can overcome this cost and still lead to fast convergence.
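
To make the contrast concrete, here is a sketch in Python/NumPy of one full-batch update versus one stochastic update; the toy least-squares objective and all names are illustrative assumptions, not code from the tutorial:

```python
import numpy as np

# Toy least-squares objective J(theta) = (1/2n) * ||X @ theta - y||^2,
# used only to make the batch-vs-stochastic contrast concrete.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=1000)

theta = np.zeros(5)
alpha = 0.1

# Batch update: gradient over the FULL training set each step.
# Accurate direction, but costs a whole pass through the data per update.
grad_full = X.T @ (X @ theta - y) / len(y)
theta_after_batch_step = theta - alpha * grad_full

# Stochastic update: gradient from a single example i.
# Cheap and usable online, at the price of a noisier direction.
i = rng.integers(len(y))
grad_single = (X[i] @ theta - y[i]) * X[i]
theta_after_sgd_step = theta - alpha * grad_single
```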

Stochastic Gradient Descent

The standard gradient descent algorithm updates the parameters θ of the objective J(θ) as,

θ = θ − α∇θ E[J(θ)]

where the expectation in the above equation is approximated by evaluating the cost and gradient over the full training set. Stochastic Gradient Descent (SGD) simply does away with the expectation in the update and computes the gradient of the parameters using only a single or a few training examples. The new update is given by,

θ = θ − α∇θ J(θ; x^(i), y^(i))

with a pair (x^(i), y^(i)) from the training set.
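
A minimal SGD loop implementing this update might look as follows. This is a sketch under stated assumptions (NumPy, a least-squares gradient as the example objective, and the common practice of shuffling each epoch), not the tutorial's own code:

```python
import numpy as np

def sgd(X, y, grad_fn, alpha=0.01, num_epochs=10, batch_size=1, seed=0):
    """Repeatedly apply theta <- theta - alpha * grad J(theta; x^(i), y^(i))."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for _ in range(num_epochs):
        order = rng.permutation(len(y))        # visit examples in random order
        for start in range(0, len(y), batch_size):
            idx = order[start:start + batch_size]
            theta -= alpha * grad_fn(theta, X[idx], y[idx])
    return theta

# Example gradient for a least-squares objective on one (mini-)batch.
def lsq_grad(theta, Xb, yb):
    return Xb.T @ (Xb @ theta - yb) / len(yb)
```

With batch_size=1 this performs the single-example update in the equation above; raising batch_size toward the dataset size recovers the full-batch gradient descent of the previous section.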
