Training Very Deep Networks

Rupesh Kumar Srivastava
Klaus Greff
̈
J urgen
Schmidhuber
The Swiss AI Lab IDSIA / USI / SUPSI
{rupesh, klaus, juergen}@idsia.ch

Abstract
Theoretical and empirical evidence indicates that the depth of neural networks
is crucial for their success. However, training becomes more difficult as depth
increases, and training of very deep networks remains an open problem. Here we
introduce a new architecture designed to overcome this. Our so-called highway
networks allow unimpeded information flow across many layers on information
highways. They are inspired by Long Short-Term Memory recurrent networks and
use adaptive gating units to regulate the information flow. Even with hundreds of
layers, highway networks can be trained directly through simple gradient descent.
This enables the study of extremely deep and efficient architectures.

1
Introduction & Previous Work
Many recent empirical breakthroughs in supervised machine learning have been achieved through
large and deep neural networks. Network depth (the number of successive computational layers) has
played perhaps the most important role in these successes. For instance, within just a few years, the
top-5 image classification accuracy on the 1000-class ImageNet dataset has increased from ∼84%
[1] to ∼95% [2, 3] using deeper networks with rather small receptive fields [4, 5]. Other results on
practical machine learning problems have also underscored the superiority of deeper networks [6]
in terms of accuracy and/or performance.
In fact, deep networks can represent certain function classes far more efficiently than shallow ones.
This is perhaps most obvious for recurrent nets, the deepest of them all. For example, the n bit
parity problem can in principle be learned by a large feedforward net with n binary input units, 1
output unit, and a single but large hidden layer. But the natural solution for arbitrary n is a recurrent
net with only 3 units and 5 weights, reading the input bit string one bit at a time, making a single
recurrent hidden unit flip its state whenever a new 1 is observed [7]. Related observations hold for
Boolean circuits [8, 9] and modern neural networks [10, 11, 12].

Training Very Deep Networks的更多相关文章

【论文笔记】Training Very Deep Networks - Highway Networks
目标: 怎么训练很深的神经网络然而过深的神经网络会造成各种问题,梯度消失之类的,导致很难训练作者利用了类似LSTM的方法,通过增加gate来控制transform前和transform后的数据的比 ...
Deep Learning 8_深度学习UFLDL教程：Stacked Autocoders and Implement deep networks for digit classification_Exercise（斯坦福大学深度学习教程）
前言 1.理论知识:UFLDL教程.Deep learning:十六(deep networks) 2.实验环境:win7, matlab2015b,16G内存,2T硬盘 3.实验内容:Exercis ...
Initialization of deep networks
Initialization of deep networks 24 Feb 2015Gustav Larsson As we all know, the solution to a non-conv ...
论文笔记：Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks ICML 2017 Paper:https://arxiv.org/ ...
【DeepLearning】Exercise: Implement deep networks for digit classification
Exercise: Implement deep networks for digit classification 习题链接:Exercise: Implement deep networks fo ...
深度学习材料：从感知机到深度网络A Deep Learning Tutorial: From Perceptrons to Deep Networks
In recent years, there’s been a resurgence in the field of Artificial Intelligence. It’s spread beyo ...
Deep Networks : Overview
Overview In the previous sections, you constructed a 3-layer neural network comprising an input, hid ...
Quantization aware training 量化背后的技术——Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
1,概述模型量化属于模型压缩的范畴,模型压缩的目的旨在降低模型的内存大小,加速模型的推断速度(除了压缩之外,一些模型推断框架也可以通过内存,io,计算等优化来加速推断). 常见的模型压缩算法有:量化 ...
Communication-Efficient Learning of Deep Networks from Decentralized Data
郑重声明:原文参见标题,如有侵权,请联系作者,将会撤销发布! Proceedings of the 20th International Conference on Artificial Intell ...

随机推荐

普及组2008NOIP 排座椅(贪心+排序）
排座椅时间限制: 1 Sec 内存限制: 50 MB提交: 4 解决: 3[提交][状态][讨论版][命题人:外部导入] 题目描述上课的时候总有一些同学和前后左右的人交头接耳,这是令小学班主任 ...
Vue.js：目标结构
ylbtech-Vue.js:目标结构 1.返回顶部 1. Vue.js 目录结构上一章节中我们使用了 npm 安装项目,我们在 IDE(Eclipse.Atom等) 中打开该目录,结构如下所示: ...
LT3756/LT3756-1/LT3756-2 - 100VIN、100VOUT LED 控制器
LT3756/LT3756-1/LT3756-2 - 100VIN.100VOUT LED 控制器特点 3000:1 True Color PWMTM调光宽输入电压范围:6V至 100V 输出电压 ...
TIMEQUEST学习之黑金动力（三）
不知不觉,学到的第四章.但是对于TQ的内部模型和外部模型的完整分析还是没有很好的理解.接着学习......... 我们也了解静态时序分析的第一步骤,亦即时钟方面的约束.此外,也稍微对 Report T ...
vue-cli脚手架config目录下index.js配置文件详解
此文章介绍vue-cli脚手架config目录下index.js配置文件此配置文件是用来定义开发环境和生产环境中所需要的参数关于注释当涉及到较复杂的解释我将通过标识的方式(如(1))将解释写到单 ...
List<T>直接充当Combox控件DataSource并扩展自定义记录的方法
一般认为List只有转换为DataTable后才能充当CombBox的数据源,其实不然: List<SYS_COMMANDS> comdList = _menuMan.Load(c =&g ...
List转Datatable 新方法
方法1,最简单的转换 DataTable dt = new DataTable(); dt.Columns.Add("id"); dt.Columns.Add("name ...
leetcode696
本题先寻找字符串中0变1,或者1变0的位置作为分隔位置.然后从这个分隔位置同时向左.右两侧搜索. 找到的左连续串和右连续串,都进行累计. public class Solution { public ...
SDL_AudioSpec（转）
转贴地址:http://www.cnblogs.com/nanguabing/archive/2012/04/12/2444084.html ============================= ...
angularJS学习（三）——搭建学习环境
1.安装Node.js 和Testacular 1.1. 安装Node.js及配置部分,在另一篇博文:node.js的安装里面讲到了,地址是:http://www.cnblogs.com/tianxu ...

Training Very Deep Networks

Training Very Deep Networks的更多相关文章

随机推荐

热门专题