Rupesh Kumar Srivastava
Klaus Greff
 ̈
J urgen
Schmidhuber
The Swiss AI Lab IDSIA / USI / SUPSI
{rupesh, klaus, juergen}@idsia.ch

Abstract
Theoretical and empirical evidence indicates that the depth of neural networks
is crucial for their success. However, training becomes more difficult as depth
increases, and training of very deep networks remains an open problem. Here we
introduce a new architecture designed to overcome this. Our so-called highway
networks allow unimpeded information flow across many layers on information
highways. They are inspired by Long Short-Term Memory recurrent networks and
use adaptive gating units to regulate the information flow. Even with hundreds of
layers, highway networks can be trained directly through simple gradient descent.
This enables the study of extremely deep and efficient architectures.

1
Introduction & Previous Work
Many recent empirical breakthroughs in supervised machine learning have been achieved through
large and deep neural networks. Network depth (the number of successive computational layers) has
played perhaps the most important role in these successes. For instance, within just a few years, the
top-5 image classification accuracy on the 1000-class ImageNet dataset has increased from ∼84%
[1] to ∼95% [2, 3] using deeper networks with rather small receptive fields [4, 5]. Other results on
practical machine learning problems have also underscored the superiority of deeper networks [6]
in terms of accuracy and/or performance.
In fact, deep networks can represent certain function classes far more efficiently than shallow ones.
This is perhaps most obvious for recurrent nets, the deepest of them all. For example, the n bit
parity problem can in principle be learned by a large feedforward net with n binary input units, 1
output unit, and a single but large hidden layer. But the natural solution for arbitrary n is a recurrent
net with only 3 units and 5 weights, reading the input bit string one bit at a time, making a single
recurrent hidden unit flip its state whenever a new 1 is observed [7]. Related observations hold for
Boolean circuits [8, 9] and modern neural networks [10, 11, 12].

Training Very Deep Networks的更多相关文章

  1. 【论文笔记】Training Very Deep Networks - Highway Networks

    目标: 怎么训练很深的神经网络 然而过深的神经网络会造成各种问题,梯度消失之类的,导致很难训练 作者利用了类似LSTM的方法,通过增加gate来控制transform前和transform后的数据的比 ...

  2. Deep Learning 8_深度学习UFLDL教程:Stacked Autocoders and Implement deep networks for digit classification_Exercise(斯坦福大学深度学习教程)

    前言 1.理论知识:UFLDL教程.Deep learning:十六(deep networks) 2.实验环境:win7, matlab2015b,16G内存,2T硬盘 3.实验内容:Exercis ...

  3. Initialization of deep networks

    Initialization of deep networks 24 Feb 2015Gustav Larsson As we all know, the solution to a non-conv ...

  4. 论文笔记:Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

    Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks ICML 2017 Paper:https://arxiv.org/ ...

  5. 【DeepLearning】Exercise: Implement deep networks for digit classification

    Exercise: Implement deep networks for digit classification 习题链接:Exercise: Implement deep networks fo ...

  6. 深度学习材料:从感知机到深度网络A Deep Learning Tutorial: From Perceptrons to Deep Networks

    In recent years, there’s been a resurgence in the field of Artificial Intelligence. It’s spread beyo ...

  7. Deep Networks : Overview

    Overview In the previous sections, you constructed a 3-layer neural network comprising an input, hid ...

  8. Quantization aware training 量化背后的技术——Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

    1,概述 模型量化属于模型压缩的范畴,模型压缩的目的旨在降低模型的内存大小,加速模型的推断速度(除了压缩之外,一些模型推断框架也可以通过内存,io,计算等优化来加速推断). 常见的模型压缩算法有:量化 ...

  9. Communication-Efficient Learning of Deep Networks from Decentralized Data

    郑重声明:原文参见标题,如有侵权,请联系作者,将会撤销发布! Proceedings of the 20th International Conference on Artificial Intell ...

随机推荐

  1. 自定义标签2.x

    2.x只需要继承SimpleTagSupport 1.x 输出流  JspWriter out = pageContext.getOut(); 2.x 输出流  JspWriter out = get ...

  2. Mac eclipse 连接 手机调试

    Mac eclipse 连接 手机调试 更新:2014-11-10 20:13 1 2 3 4 5 6 分步阅读 很多Android程序员 用Mac 来开发.但是Mac下eclipse连接 手机存在一 ...

  3. (转)Android 中LocalBroadcastManager的使用方式

    发表于2个月前(2014-11-03 22:05)   阅读(37) | 评论(0) 0人收藏此文章, 我要收藏 赞0 1月10日 #长沙# OSC 源创会第32期开始报名 摘要 android中广播 ...

  4. webstorm设置修改文件后自动编译并刷新浏览器页面

    转载:http://www.cnblogs.com/ssrsblogs/p/6155747.html 重装了 webstorm ,从10升级到了2016 一升不要紧,打开老项目,开启webpakc-d ...

  5. POJ2478(欧拉函数)

    Farey Sequence Time Limit: 1000MS   Memory Limit: 65536K Total Submissions: 15242   Accepted: 6054 D ...

  6. (转)pipe row的用法, Oracle split 函数写法.

    本文转载自:http://www.cnblogs.com/newsea/archive/2010/12/14/1905482.html 关于 pipe row的用法2009/12/30 14:53 = ...

  7. Oracle常见的表连接的方法

    1 排序合并连接SMJ Sort merge join 排序合并总结: 1 通常情况下,排序合并连接的效率远不如hash join,前者适用范围更广,hj只使用于等值连接,smj范围更广(<,& ...

  8. MVC中Ajax post 和Ajax Get——提交对象Model

    HTTP 请求:GET vs. POST两种在客户端和服务器端进行请求-响应的常用方法是:GET 和 POST.GET - 从指定的资源请求数据POST - 向指定的资源提交要处理的数据GET 基本上 ...

  9. timequest学习之黑金动力(一)

    黑金动力的资料还是非常有价值的.通过建模篇,对于给定的时序关系,我总能实现.但是,这总是很初级的能力.也只是为后面的建模服务.所以,现阶段我的能力还是非常有限.我相信我一定会成为牛人,能够独挡一面.借 ...

  10. 【转】Context.getExternalFilesDir()和Context.getExternalCacheDir()方法

    应用程序在运行的过程中如果需要向手机上保存数据,一般是把数据保存在SDcard中的.大部分应用是直接在SDCard的根目录下创建一个文件夹,然后把数据保存在该文件夹中.这样当该应用被卸载后,这些数据还 ...