This instability is a fundamental problem for gradient-based learning in deep neural networks. vanishing exploding gradient problem
The unstable gradient problem: The fundamental problem here isn't so much the vanishing gradient problem or the exploding gradient problem. It's that the gradient in early layers is the product of terms from all the later layers. When there are many layers, that's an intrinsically unstable situation. The only way all layers can learn at close to the same speed is if all those products of terms come close to balancing out. Without some mechanism or underlying reason for that balancing to occur, it's highly unlikely to happen simply by chance. In short, the real problem here is that neural networks suffer from an unstable gradient problem. As a result, if we use standard gradient-based learning techniques, different layers in the network will tend to learn at wildly different speeds.

Again, early hidden layers learn much more slowly than later hidden layers. In this case, the first hidden layer is learning roughly 100 times slower than the final hidden layer. No wonder we were having trouble training these networks earlier!
We have here an important observation: in at least some deep neural networks, the gradient tends to get smaller as we move backward through the hidden layers. This means that neurons in the earlier layers learn much more slowly than neurons in later layers. And while we've seen this in just a single network, there are fundamental reasons why this happens in many neural networks. The phenomenon is known as the vanishing gradient problem**See Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, by Sepp Hochreiter, Yoshua Bengio, Paolo Frasconi, and Jürgen Schmidhuber (2001). This paper studied recurrent neural nets, but the essential phenomenon is the same as in the feedforward networks we are studying. See also Sepp Hochreiter's earlier Diploma Thesis,Untersuchungen zu dynamischen neuronalen Netzen (1991, in German)..
Why does the vanishing gradient problem occur? Are there ways we can avoid it? And how should we deal with it in training deep neural networks? In fact, we'll learn shortly that it's not inevitable, although the alternative is not very attractive, either: sometimes the gradient gets much larger in earlier layers! This is the exploding gradient problem, and it's not much better news than the vanishing gradient problem. More generally, it turns out that the gradient in deep neural networks is unstable, tending to either explode or vanish in earlier layers. This instability is a fundamental problem for gradient-based learning in deep neural networks. It's something we need to understand, and, if possible, take steps to address.
One response to vanishing (or unstable) gradients is to wonder if they're really such a problem. Momentarily stepping away from neural nets, imagine we were trying to numerically minimize a function f(x)f(x) of a single variable. Wouldn't it be good news if the derivative f′(x)f′(x) was small? Wouldn't that mean we were already near an extremum? In a similar way, might the small gradient in early layers of a deep network mean that we don't need to do much adjustment of the weights and biases?
http://neuralnetworksanddeeplearning.com/chap5.html
Design a Gradient Logo Illustrator Tutorial - YouTube

This instability is a fundamental problem for gradient-based learning in deep neural networks. vanishing exploding gradient problem的更多相关文章
- Exploring Adversarial Attack in Spiking Neural Networks with Spike-Compatible Gradient
郑重声明:原文参见标题,如有侵权,请联系作者,将会撤销发布! arXiv:2001.01587v1 [cs.NE] 1 Jan 2020 Abstract 脉冲神经网络(SNN)被广泛应用于神经形态设 ...
- Coursera Deep Learning 2 Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization - week1, Assignment(Gradient Checking)
声明:所有内容来自coursera,作为个人学习笔记记录在这里. Gradient Checking Welcome to the final assignment for this week! In ...
- 课程二(Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization),第一周(Practical aspects of Deep Learning) —— 4.Programming assignments:Gradient Checking
Gradient Checking Welcome to this week's third programming assignment! You will be implementing grad ...
- [CS231n-CNN] Training Neural Networks Part 1 : activation functions, weight initialization, gradient flow, batch normalization | babysitting the learning process, hyperparameter optimization
课程主页:http://cs231n.stanford.edu/ Introduction to neural networks -Training Neural Network ________ ...
- 梯度消失(vanishing gradient)与梯度爆炸(exploding gradient)问题
(1)梯度不稳定问题: 什么是梯度不稳定问题:深度神经网络中的梯度不稳定性,前面层中的梯度或会消失,或会爆炸. 原因:前面层上的梯度是来自于后面层上梯度的乘乘积.当存在过多的层次时,就出现了内在本质上 ...
- 覆盖问题:最大覆盖问题(Maximum Covering Location Problem,MCLP)和集覆盖问题(Location Set Covering Problem,LSCP)
集覆盖问题研究满足覆盖所有需求点顾客的前提下,服务站总的建站个数或建 设费用最小的问题.集覆盖问题最早是由 Roth和 Toregas等提出的,用于解决消防中心和救护车等的应急服务设施的选址问题,他们 ...
- Codeforces Round #602 (Div. 2, based on Technocup 2020 Elimination Round 3) A. Math Problem 水题
A. Math Problem Your math teacher gave you the following problem: There are n segments on the x-axis ...
- Deep Learning专栏--强化学习之从 Policy Gradient 到 A3C(3)
在之前的强化学习文章里,我们讲到了经典的MDP模型来描述强化学习,其解法包括value iteration和policy iteration,这类经典解法基于已知的转移概率矩阵P,而在实际应用中,我们 ...
- 随机梯度下降(Stochastic gradient descent)和 批量梯度下降(Batch gradient descent )的公式对比、实现对比[转]
梯度下降(GD)是最小化风险函数.损失函数的一种常用方法,随机梯度下降和批量梯度下降是两种迭代求解思路,下面从公式和实现的角度对两者进行分析,如有哪个方面写的不对,希望网友纠正. 下面的h(x)是要拟 ...
随机推荐
- ES,ZK,Mysql相关参数优化
1.ES 内存调优: vi config/jvm.options -Xms16g -Xmx16g 2.Zookeeper参数配置调优 2.1\在conf目录下 vi java.env export J ...
- 通过脚本发送zabbix邮件报警
zabbix原生的报警媒介类型中,邮件报警是我们常用的方式.当我们在CentOS6上面安装zabbix3.0并配置邮件报警的时候,在邮件配置正确的前提下,不管触发器如何触发,邮件总是发送不出去,但是在 ...
- P6 EPPM 16 R1安装和配置文档
白桃花心木P6企业项目组合管理文档库 描述 链接 下载 零件号 16 R1用户和集成文档 查看库 下载 E68199-01 16 R1安装和配置文档 查看库 下载 E68198-01 描述 链接 ...
- MockServer的测试思想与实现
转载:http://blog.csdn.net/shen1936/article/details/50298901 背景 什么是MOCK Mock的定义 Mock框架简介 Mock在单测中的应用 De ...
- 关于ng-router嵌套使用和总结
那是某个下午的review代码的过程.js中有一段html,像是这样. var html = '<div>...此处还有很多html代码....</div>' 我的同事想我提出 ...
- Solidworks如何为装配体绘制剖面视图
1 如图所示的工程图来自装配体 2 点击剖面视图,随后绘制一条线(我从正中劈开),弹出对话框,勾选自动打剖面线,确定 3 剖面视图绘制完毕 三个剖视图如下 关于半剖视图,可以这样做.先 ...
- C++ 11 可变模板参数的两种展开方式
#include <iostream> #include <string> #include <stdint.h> template<typename T&g ...
- 单点登录cas常见问题(四) - ticket有哪些存储方式?
配置文件ticketRegistry.xml负责配置ticket的存储方式,registry是注冊表,登记薄的意思 经常使用的存储方式包含 1.DefaultTicketRegistry:默认的.存储 ...
- Unity3d修炼之路:游戏开发中,3d数学知识的练习【1】(不断更新.......)
#pragma strict public var m_pA : Vector3 = new Vector3(2.0f, 4.0f, 0.0f); public var m_pB : Vector3 ...
- 【BZOJ4008】【HNOI2015】亚瑟王 概率DP
链接: #include <stdio.h> int main() { puts("转载请注明出处[辗转山河弋流歌 by 空灰冰魂]谢谢"); puts("网 ...