Rupesh Kumar Srivastava
Klaus Greff
 ̈
J urgen
Schmidhuber
The Swiss AI Lab IDSIA / USI / SUPSI
{rupesh, klaus, juergen}@idsia.ch

Abstract
Theoretical and empirical evidence indicates that the depth of neural networks
is crucial for their success. However, training becomes more difficult as depth
increases, and training of very deep networks remains an open problem. Here we
introduce a new architecture designed to overcome this. Our so-called highway
networks allow unimpeded information flow across many layers on information
highways. They are inspired by Long Short-Term Memory recurrent networks and
use adaptive gating units to regulate the information flow. Even with hundreds of
layers, highway networks can be trained directly through simple gradient descent.
This enables the study of extremely deep and efficient architectures.

1
Introduction & Previous Work
Many recent empirical breakthroughs in supervised machine learning have been achieved through
large and deep neural networks. Network depth (the number of successive computational layers) has
played perhaps the most important role in these successes. For instance, within just a few years, the
top-5 image classification accuracy on the 1000-class ImageNet dataset has increased from ∼84%
[1] to ∼95% [2, 3] using deeper networks with rather small receptive fields [4, 5]. Other results on
practical machine learning problems have also underscored the superiority of deeper networks [6]
in terms of accuracy and/or performance.
In fact, deep networks can represent certain function classes far more efficiently than shallow ones.
This is perhaps most obvious for recurrent nets, the deepest of them all. For example, the n bit
parity problem can in principle be learned by a large feedforward net with n binary input units, 1
output unit, and a single but large hidden layer. But the natural solution for arbitrary n is a recurrent
net with only 3 units and 5 weights, reading the input bit string one bit at a time, making a single
recurrent hidden unit flip its state whenever a new 1 is observed [7]. Related observations hold for
Boolean circuits [8, 9] and modern neural networks [10, 11, 12].

Training Very Deep Networks的更多相关文章

  1. 【论文笔记】Training Very Deep Networks - Highway Networks

    目标: 怎么训练很深的神经网络 然而过深的神经网络会造成各种问题,梯度消失之类的,导致很难训练 作者利用了类似LSTM的方法,通过增加gate来控制transform前和transform后的数据的比 ...

  2. Deep Learning 8_深度学习UFLDL教程:Stacked Autocoders and Implement deep networks for digit classification_Exercise(斯坦福大学深度学习教程)

    前言 1.理论知识:UFLDL教程.Deep learning:十六(deep networks) 2.实验环境:win7, matlab2015b,16G内存,2T硬盘 3.实验内容:Exercis ...

  3. Initialization of deep networks

    Initialization of deep networks 24 Feb 2015Gustav Larsson As we all know, the solution to a non-conv ...

  4. 论文笔记:Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

    Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks ICML 2017 Paper:https://arxiv.org/ ...

  5. 【DeepLearning】Exercise: Implement deep networks for digit classification

    Exercise: Implement deep networks for digit classification 习题链接:Exercise: Implement deep networks fo ...

  6. 深度学习材料:从感知机到深度网络A Deep Learning Tutorial: From Perceptrons to Deep Networks

    In recent years, there’s been a resurgence in the field of Artificial Intelligence. It’s spread beyo ...

  7. Deep Networks : Overview

    Overview In the previous sections, you constructed a 3-layer neural network comprising an input, hid ...

  8. Quantization aware training 量化背后的技术——Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

    1,概述 模型量化属于模型压缩的范畴,模型压缩的目的旨在降低模型的内存大小,加速模型的推断速度(除了压缩之外,一些模型推断框架也可以通过内存,io,计算等优化来加速推断). 常见的模型压缩算法有:量化 ...

  9. Communication-Efficient Learning of Deep Networks from Decentralized Data

    郑重声明:原文参见标题,如有侵权,请联系作者,将会撤销发布! Proceedings of the 20th International Conference on Artificial Intell ...

随机推荐

  1. wordpress改不了固定连接的解决办法

    经常遇到更改了固定连接刷新没效果>>>>>>>废话不多说直接上步骤,有图有真相. 1,ftp删除根目录的这个文件夹 2,进入wp后台设置-固定连接—自定义结构 ...

  2. poj 3590 The shuffle Problem——DP+置换

    题目:http://poj.org/problem?id=3590 bzoj 1025 的弱化版.大概一样的 dp . 输出方案的时候小的环靠前.不用担心 dp 时用 > 还是 >= 来转 ...

  3. Oracle记录(二) SQLPlus命令

    对于Oracle数据库操作主要使用的是命令行方式,而所有的命令都使用sqlplus完成,对于sqlplus有两种形式.就我个人而言,还是比较喜欢UNIX与Linux下的Oracle.呵呵 一种是dos ...

  4. 用xshell-ssh连接服务器被经常意外中断

    xshell突然中断报错 Socket error Event: 32 Error: 10053.Connection closing...Socket close. Connection close ...

  5. 在centos6.6中mysql5.5的编译、安装、配置

    今天根据需求要在centos6.6上编译安装mysql5.5,因为以前编译安装过感觉很简单,但是今天还是出现了点小问题,所以把过安装过程总结了一下: 好像从mysql5.5开始编译安装mysql需要用 ...

  6. 微软系统工具包Sysinternals Suite官方下载地址

    Sysinternals Suite是微软官方发布的系统工具包,其中包含数十款实用的绿色系统工具软件,个个身怀绝技,是你维护Windows系统不可或缺的好帮手. 最新版Sysinternals Sui ...

  7. 蓝桥杯 算法训练 ALGO-57 删除多余括号

    算法训练 删除多余括号   时间限制:1.0s   内存限制:512.0MB 问题描述 从键盘输入一个含有括号的四则运算表达式,要求去掉可能含有的多余的括号,结果要保持原表达式中变量和运算符的相对位置 ...

  8. jave获取音频时长

    本文转载自:http://blog.csdn.net/ntotl/article/details/50419983 下载 jave-1.0.2.jar File source =new File('d ...

  9. 1039 Course List for Student

    题意:给出K门课程(编号1~K)以及报名该课程的学生,然后有N个学生查询,对于每一个查询,输出该学生所报的相关课程编号,且要求编号按增序输出. 思路:题目不难,解析略.(本来用map直接映射,用STL ...

  10. jQueryUI Sortable 应用Demo

    最近工作用需要设计一个自由布局的页面设计.我选了jQuery UI 的 sortable ,可以拖拽,自由排序 使用很方便,写一个demo,做个记录. 第一.单项目自由排序 下图效果 代码段: < ...