Training Very Deep Networks

Rupesh Kumar Srivastava
Klaus Greff
Jürgen Schmidhuber
The Swiss AI Lab IDSIA / USI / SUPSI
{rupesh, klaus, juergen}@idsia.ch

Abstract
Theoretical and empirical evidence indicates that the depth of neural networks
is crucial for their success. However, training becomes more difficult as depth
increases, and training of very deep networks remains an open problem. Here we
introduce a new architecture designed to overcome this. Our so-called highway
networks allow unimpeded information flow across many layers on information
highways. They are inspired by Long Short-Term Memory recurrent networks and
use adaptive gating units to regulate the information flow. Even with hundreds of
layers, highway networks can be trained directly through simple gradient descent.
This enables the study of extremely deep and efficient architectures.

1 Introduction & Previous Work
Many recent empirical breakthroughs in supervised machine learning have been achieved through
large and deep neural networks. Network depth (the number of successive computational layers) has
played perhaps the most important role in these successes. For instance, within just a few years, the
top-5 image classification accuracy on the 1000-class ImageNet dataset has increased from ∼84%
[1] to ∼95% [2, 3] using deeper networks with rather small receptive fields [4, 5]. Other results on
practical machine learning problems have also underscored the superiority of deeper networks [6]
in terms of accuracy and/or performance.
In fact, deep networks can represent certain function classes far more efficiently than shallow ones.
This is perhaps most obvious for recurrent nets, the deepest of them all. For example, the n-bit
parity problem can in principle be learned by a large feedforward net with n binary input units, 1
output unit, and a single but large hidden layer. But the natural solution for arbitrary n is a recurrent
net with only 3 units and 5 weights, reading the input bit string one bit at a time, making a single
recurrent hidden unit flip its state whenever a new 1 is observed [7]. Related observations hold for
Boolean circuits [8, 9] and modern neural networks [10, 11, 12].
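
The recurrent parity solution can be illustrated with a minimal sketch. This is not the exact 3-unit, 5-weight network of [7]: the threshold units, the particular weights and biases, and the names `step` and `recurrent_parity` are assumptions chosen for readability. What it shows is the key idea of the text: a single recurrent state that flips whenever a new 1 is read, using a fixed number of units regardless of n.

```python
from itertools import product


def step(z):
    """Heaviside threshold unit: outputs 1 if z > 0, else 0."""
    return 1 if z > 0 else 0


def recurrent_parity(bits):
    """Read a bit string one bit at a time and return its parity.

    Illustrative construction (an assumption, not the network from [7]):
    the next state is XOR(input, state), built from threshold units.
    """
    state = 0  # recurrent unit: current parity of the bits seen so far
    for x in bits:
        h_or = step(x + state - 0.5)    # fires if input or state is 1
        h_and = step(x + state - 1.5)   # fires only if both are 1
        # XOR = OR and not AND, so the state flips exactly when x == 1
        state = step(h_or - h_and - 0.5)
    return state


# The same fixed set of weights handles every input length.
for n in range(1, 7):
    for bits in product([0, 1], repeat=n):
        assert recurrent_parity(bits) == sum(bits) % 2
```

By contrast, a feedforward net with a single hidden layer needs a hidden layer whose size grows with n, which is the efficiency gap the paragraph above describes.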
