Rupesh Kumar Srivastava
Klaus Greff
 ̈
J urgen
Schmidhuber
The Swiss AI Lab IDSIA / USI / SUPSI
{rupesh, klaus, juergen}@idsia.ch

Abstract
Theoretical and empirical evidence indicates that the depth of neural networks
is crucial for their success. However, training becomes more difficult as depth
increases, and training of very deep networks remains an open problem. Here we
introduce a new architecture designed to overcome this. Our so-called highway
networks allow unimpeded information flow across many layers on information
highways. They are inspired by Long Short-Term Memory recurrent networks and
use adaptive gating units to regulate the information flow. Even with hundreds of
layers, highway networks can be trained directly through simple gradient descent.
This enables the study of extremely deep and efficient architectures.

1
Introduction & Previous Work
Many recent empirical breakthroughs in supervised machine learning have been achieved through
large and deep neural networks. Network depth (the number of successive computational layers) has
played perhaps the most important role in these successes. For instance, within just a few years, the
top-5 image classification accuracy on the 1000-class ImageNet dataset has increased from ∼84%
[1] to ∼95% [2, 3] using deeper networks with rather small receptive fields [4, 5]. Other results on
practical machine learning problems have also underscored the superiority of deeper networks [6]
in terms of accuracy and/or performance.
In fact, deep networks can represent certain function classes far more efficiently than shallow ones.
This is perhaps most obvious for recurrent nets, the deepest of them all. For example, the n bit
parity problem can in principle be learned by a large feedforward net with n binary input units, 1
output unit, and a single but large hidden layer. But the natural solution for arbitrary n is a recurrent
net with only 3 units and 5 weights, reading the input bit string one bit at a time, making a single
recurrent hidden unit flip its state whenever a new 1 is observed [7]. Related observations hold for
Boolean circuits [8, 9] and modern neural networks [10, 11, 12].

Training Very Deep Networks的更多相关文章

  1. 【论文笔记】Training Very Deep Networks - Highway Networks

    目标: 怎么训练很深的神经网络 然而过深的神经网络会造成各种问题,梯度消失之类的,导致很难训练 作者利用了类似LSTM的方法,通过增加gate来控制transform前和transform后的数据的比 ...

  2. Deep Learning 8_深度学习UFLDL教程:Stacked Autocoders and Implement deep networks for digit classification_Exercise(斯坦福大学深度学习教程)

    前言 1.理论知识:UFLDL教程.Deep learning:十六(deep networks) 2.实验环境:win7, matlab2015b,16G内存,2T硬盘 3.实验内容:Exercis ...

  3. Initialization of deep networks

    Initialization of deep networks 24 Feb 2015Gustav Larsson As we all know, the solution to a non-conv ...

  4. 论文笔记:Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

    Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks ICML 2017 Paper:https://arxiv.org/ ...

  5. 【DeepLearning】Exercise: Implement deep networks for digit classification

    Exercise: Implement deep networks for digit classification 习题链接:Exercise: Implement deep networks fo ...

  6. 深度学习材料:从感知机到深度网络A Deep Learning Tutorial: From Perceptrons to Deep Networks

    In recent years, there’s been a resurgence in the field of Artificial Intelligence. It’s spread beyo ...

  7. Deep Networks : Overview

    Overview In the previous sections, you constructed a 3-layer neural network comprising an input, hid ...

  8. Quantization aware training 量化背后的技术——Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

    1,概述 模型量化属于模型压缩的范畴,模型压缩的目的旨在降低模型的内存大小,加速模型的推断速度(除了压缩之外,一些模型推断框架也可以通过内存,io,计算等优化来加速推断). 常见的模型压缩算法有:量化 ...

  9. Communication-Efficient Learning of Deep Networks from Decentralized Data

    郑重声明:原文参见标题,如有侵权,请联系作者,将会撤销发布! Proceedings of the 20th International Conference on Artificial Intell ...

随机推荐

  1. COGS 2259 异化多肽——生成函数+多项式求逆

    题目:http://cogs.pro:8080/cogs/problem/problem.php?pid=2259 详见:https://www.cnblogs.com/Zinn/p/10054569 ...

  2. bzoj 3456 城市规划——分治FFT / 多项式求逆 / 多项式求ln

    题目:https://www.lydsy.com/JudgeOnline/problem.php?id=3456 分治FFT: 设 dp[ i ] 表示 i 个点时连通的方案数. 考虑算补集:连通的方 ...

  3. Linux 简单按键中断处理流程

    中断处理程序中不能延时.休眠之类的,一定要最快速.高效的执行完. // 功能:申请中断 // 参数1:中断号码,通过宏 IRA_EINT(x) 获取 // 参数2:中断的处理函数,填函数名 // 参数 ...

  4. composer.phar的作用和安装laravel5.5.4 和 vendor目录

    composer.phar有什么作用 是 PHP 用来管理依赖(dependency)关系的工具.你可以在自己的项目中声明所依赖的外部工具库(libraries),Composer 会帮你安装这些依赖 ...

  5. 【转】Jmeter项目测试

    Jmeter的录制回放功能是现将你对要测试的项目进行访问的历史记录进行录制,然后虚拟出多个用户对历史记录进行回放,从而达到压力测试的目的. 录制是通过代理服务器进行录制. 一.下载地址 http:// ...

  6. jdk8的扩展注解

    对于注解(也被称做元数据),Java 8 主要有两点改进:类型注解和重复注解. 1.类型注解 1)Java 8 的类型注解扩展了注解使用的范围. 在java 8之前,注解只能是在声明的地方所使用,ja ...

  7. Linux - rpm 软件包管理

    rpm 是 Red-Hat Package Manager(rpm 软件包管理器)的缩写 rpm 的命名规则: 第一部分为 rpm 软件包的名称,第二部分是版本号,第三部分是版本发布次数,第四部分是软 ...

  8. left join的多重串联与groupby

    有三张表或组合查询,f1,f2,f3,其中,f1分别与f2,f3是一对多关系,f1一条记录可能对应f2或f3中0条或多条记录 要创建一个查询,以f1为基准,即f1中有多少条记录,结果也就返回对应数量的 ...

  9. PCA主成分分析 ICA独立成分分析 LDA线性判别分析 SVD性质

    机器学习(8) -- 降维 核心思想:将数据沿方差最大方向投影,数据更易于区分 简而言之:PCA算法其表现形式是降维,同时也是一种特征融合算法. 对于正交属性空间(对2维空间即为直角坐标系)中的样本点 ...

  10. Flask之自定义模型类

    4.3自定义模型类 定义模型 模型表示程序使用的数据实体,在Flask-SQLAlchemy中,模型一般是Python类,继承自db.Model,db是SQLAlchemy类的实例,代表程序使用的数据 ...