Optimization Algorithms
1. Stochastic Gradient Descent

2. SGD With Momentum
Stochastic gradient descent with momentum remembers the update Δ w at each iteration, and determines the next update as a linear combination of the gradient and the previous update:



Unlike in classical stochastic gradient descent, it tends to keep traveling in the same direction, preventing oscillations.
3. RMSProp
RMSProp (for Root Mean Square Propagation) is also a method in which the learning rate is adapted for each of the parameters. The idea is to divide the learning rate for a weight by a running average of the magnitudes of recent gradients for that weight. So, first the running average is calculated in terms of means square,

where, is the forgetting factor.
And the parameters are updated as,

RMSProp has shown excellent adaptation of learning rate in different applications. RMSProp can be seen as a generalization of Rprop and is capable to work with mini-batches as well opposed to only full-batches.
4. The Adam Algorithm
Adam (short for Adaptive Moment Estimation) is an update to the RMSProp optimizer. In this optimization algorithm, running averages of both the gradients and the second moments of the gradients are used. Given parameters and a loss function
, where
indexes the current training iteration (indexed at
), Adam's parameter update is given by:





where is a small number used to prevent division by 0, and
and
are the forgetting factors for gradients and second moments of gradients, respectively.
参考链接:Wikipedia。
Optimization Algorithms的更多相关文章
- (转) An overview of gradient descent optimization algorithms
An overview of gradient descent optimization algorithms Table of contents: Gradient descent variants ...
- An overview of gradient descent optimization algorithms
原文地址:An overview of gradient descent optimization algorithms An overview of gradient descent optimiz ...
- 课程二(Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization),第二周(Optimization algorithms) —— 2.Programming assignments:Optimization
Optimization Welcome to the optimization's programming assignment of the hyper-parameters tuning spe ...
- 优化算法动画演示Alec Radford's animations for optimization algorithms
Alec Radford has created some great animations comparing optimization algorithms SGD, Momentum, NAG, ...
- [C2W2] Improving Deep Neural Networks : Optimization algorithms
第二周:优化算法(Optimization algorithms) Mini-batch 梯度下降(Mini-batch gradient descent) 本周将学习优化算法,这能让你的神经网络运行 ...
- 【论文翻译】An overiview of gradient descent optimization algorithms
这篇论文最早是一篇2016年1月16日发表在Sebastian Ruder的博客.本文主要工作是对这篇论文与李宏毅课程相关的核心部分进行翻译. 论文全文翻译: An overview of gradi ...
- Coursera Deep Learning 2 Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization - week2, Optimization algorithms
Gradient descent Batch Gradient Decent, Mini-batch gradient descent, Stochastic gradient descent 还有很 ...
- An overview of gradient descent optimization algorithms (更新到Adam)
Momentum:解快了收敛速度,同时也减弱了SGD的波动 NAG: 减速了Momentum更新参数太快 Adagrad: 出现频率较低参数采用较大的更新,对于出现频率较高的参数采用较小的,不共用一个 ...
- 最佳化常用测试函数 Optimization Test functions
http://www.sfu.ca/~ssurjano/optimization.html The functions listed below are some of the common func ...
随机推荐
- Django-DRF(视图相关)
drf除了在数据序列化部分简写代码以外,还在视图中提供了简写操作.所以在django原有的django.views.View类基础上,drf封装了多个子类出来提供给我们使用. Django REST ...
- Leetcode之动态规划(DP)专题-188. 买卖股票的最佳时机 IV(Best Time to Buy and Sell Stock IV)
Leetcode之动态规划(DP)专题-188. 买卖股票的最佳时机 IV(Best Time to Buy and Sell Stock IV) 股票问题: 121. 买卖股票的最佳时机 122. ...
- linux-32bit-内存管理
一.进程与内存 进程如何使用内存? 毫无疑问所有进程(执行的程序)都必须占用一定数量的内存,它或是用来存放从磁盘载入的程序代码,或是存放取自用户输入的数据等等.不过进程对这些内存的管理方式因内存用途不 ...
- [转帖]字符编码笔记:ASCII,Unicode 和 UTF-8
字符编码笔记:ASCII,Unicode 和 UTF-8 http://www.ruanyifeng.com/blog/2007/10/ascii_unicode_and_utf-8.html 转帖 ...
- css动画(transition/transform/animation)
在开发中,一个好的用户操作界面,总会夹杂着一些动画.css用对少的代码,来给用户最佳的体验感,下面我总结了一些css动画属性的使用方法及用例代码供大家参考,在不对的地方,希望大佬直接拍砖评论. 1 t ...
- MySQL线程池(THREAD POOL)的原理
MySQL常用(目前线上使用)的线程调度方式是one-thread-per-connection(每连接一个线程),server为每一个连接创建一个线程来服务,连接断开后,这个线程进入thread_c ...
- POJ2367(拓扑排序裸题
#include<iostream> #include<vector> #include<queue> using namespace std; typedef l ...
- LinqToSQL3
Lambda Lambda表达式和匿名方法很相似,但Lambda表达式比匿名方法更灵活,并且语法比匿名方法更简洁. 在LINQ中可以使用Lambda表达式创建委托,这些委托将稍后执行查询时被调用. L ...
- JS基础_一元运算符
<!DOCTYPE html> <html> <head> <meta charset="UTF-8"> <title> ...
- centos 配置rsync+inotify数据实时同步2
一.Rsync服务简介 1. 什么是Rsync 它是一个远程数据同步工具,它在同步文件的同时,可通过LAN/WAN快速同步多台主机间的文件.Rsync使用所谓的“rsync算法”来使本地和远程两个主机 ...