Rupesh Kumar Srivastava
Klaus Greff
 ̈
J urgen
Schmidhuber
The Swiss AI Lab IDSIA / USI / SUPSI
{rupesh, klaus, juergen}@idsia.ch

Abstract
Theoretical and empirical evidence indicates that the depth of neural networks
is crucial for their success. However, training becomes more difficult as depth
increases, and training of very deep networks remains an open problem. Here we
introduce a new architecture designed to overcome this. Our so-called highway
networks allow unimpeded information flow across many layers on information
highways. They are inspired by Long Short-Term Memory recurrent networks and
use adaptive gating units to regulate the information flow. Even with hundreds of
layers, highway networks can be trained directly through simple gradient descent.
This enables the study of extremely deep and efficient architectures.

1
Introduction & Previous Work
Many recent empirical breakthroughs in supervised machine learning have been achieved through
large and deep neural networks. Network depth (the number of successive computational layers) has
played perhaps the most important role in these successes. For instance, within just a few years, the
top-5 image classification accuracy on the 1000-class ImageNet dataset has increased from ∼84%
[1] to ∼95% [2, 3] using deeper networks with rather small receptive fields [4, 5]. Other results on
practical machine learning problems have also underscored the superiority of deeper networks [6]
in terms of accuracy and/or performance.
In fact, deep networks can represent certain function classes far more efficiently than shallow ones.
This is perhaps most obvious for recurrent nets, the deepest of them all. For example, the n bit
parity problem can in principle be learned by a large feedforward net with n binary input units, 1
output unit, and a single but large hidden layer. But the natural solution for arbitrary n is a recurrent
net with only 3 units and 5 weights, reading the input bit string one bit at a time, making a single
recurrent hidden unit flip its state whenever a new 1 is observed [7]. Related observations hold for
Boolean circuits [8, 9] and modern neural networks [10, 11, 12].

Training Very Deep Networks的更多相关文章

  1. 【论文笔记】Training Very Deep Networks - Highway Networks

    目标: 怎么训练很深的神经网络 然而过深的神经网络会造成各种问题,梯度消失之类的,导致很难训练 作者利用了类似LSTM的方法,通过增加gate来控制transform前和transform后的数据的比 ...

  2. Deep Learning 8_深度学习UFLDL教程:Stacked Autocoders and Implement deep networks for digit classification_Exercise(斯坦福大学深度学习教程)

    前言 1.理论知识:UFLDL教程.Deep learning:十六(deep networks) 2.实验环境:win7, matlab2015b,16G内存,2T硬盘 3.实验内容:Exercis ...

  3. Initialization of deep networks

    Initialization of deep networks 24 Feb 2015Gustav Larsson As we all know, the solution to a non-conv ...

  4. 论文笔记:Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

    Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks ICML 2017 Paper:https://arxiv.org/ ...

  5. 【DeepLearning】Exercise: Implement deep networks for digit classification

    Exercise: Implement deep networks for digit classification 习题链接:Exercise: Implement deep networks fo ...

  6. 深度学习材料:从感知机到深度网络A Deep Learning Tutorial: From Perceptrons to Deep Networks

    In recent years, there’s been a resurgence in the field of Artificial Intelligence. It’s spread beyo ...

  7. Deep Networks : Overview

    Overview In the previous sections, you constructed a 3-layer neural network comprising an input, hid ...

  8. Quantization aware training 量化背后的技术——Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

    1,概述 模型量化属于模型压缩的范畴,模型压缩的目的旨在降低模型的内存大小,加速模型的推断速度(除了压缩之外,一些模型推断框架也可以通过内存,io,计算等优化来加速推断). 常见的模型压缩算法有:量化 ...

  9. Communication-Efficient Learning of Deep Networks from Decentralized Data

    郑重声明:原文参见标题,如有侵权,请联系作者,将会撤销发布! Proceedings of the 20th International Conference on Artificial Intell ...

随机推荐

  1. LINUX 11G RAC ASM磁盘组在线增加磁盘扩容

    LINUX 11G RAC ASM磁盘组在线增加磁盘扩容 1.操作系统版本 OEL 6.1 [root@cqltjcpt1 ~]# more /etc/redhat-release Red Hat E ...

  2. bzoj 3202 [Sdoi2013]项链——容斥+置换+推式子

    题目:https://www.lydsy.com/JudgeOnline/problem.php?id=3202 可见Zinn博客:https://www.cnblogs.com/Zinn/p/100 ...

  3. POJ2226(最小顶点覆盖)

    Muddy Fields Time Limit: 1000MS   Memory Limit: 65536K Total Submissions: 10044   Accepted: 3743 Des ...

  4. 校赛热身 Problem B. Matrix Fast Power

    找循环节,肯定在40项以内,不会证明. #include <iostream> #include <cstring> #include <string> #incl ...

  5. win10 Edge 无法上网代理服务器错误

    当连接好网络时 Edge无法上网,提示代理服务器错误,系统其他非第三方软件同样网络异常 解决:为当前所连接的网络更新自动检测 控制面板->网络和Internet->Internet选项-& ...

  6. PHP字符串的处理(三)-字符串的输出

    1.echo() echo()实际不是一个函数,是一个语言结构,不需要使用括号 <?php $str = "test"; echo $str."<br> ...

  7. C语言的第二天-比较大小的小程序

    #include <stdio.h> int main() { int a,b,c,max; printf("请输入三个数:"); scanf("%d,%d, ...

  8. springboot成神之——springboot+mybatis+mysql搭建项目简明demo

    springboot+mybatis+mysql搭建项目简明demo 项目所需目录结构 pom.xml文件配置 application.properties文件配置 MyApplication.jav ...

  9. ORACLE和MYSQL函数

    函数 编号 类别 ORACLE MYSQL 注释 1 数字函数 round(1.23456,4) round(1.23456,4) 一样: ORACLE:select round(1.23456,4) ...

  10. Linux进阶路线

    初级:熟练使用命令.熟悉Shell编程.能配置简单的服务,清楚各类服务相关的配置文件的位置, 能看懂并可修改系统提供的配置脚本(/etc/*.*)把/etc目录下面常用的配置你都搞懂,把 /bin / ...