Rupesh Kumar Srivastava
Klaus Greff
Jürgen Schmidhuber
The Swiss AI Lab IDSIA / USI / SUPSI
{rupesh, klaus, juergen}@idsia.ch

Abstract
Theoretical and empirical evidence indicates that the depth of neural networks
is crucial for their success. However, training becomes more difficult as depth
increases, and training of very deep networks remains an open problem. Here we
introduce a new architecture designed to overcome this. Our so-called highway
networks allow unimpeded information flow across many layers on information
highways. They are inspired by Long Short-Term Memory recurrent networks and
use adaptive gating units to regulate the information flow. Even with hundreds of
layers, highway networks can be trained directly through simple gradient descent.
This enables the study of extremely deep and efficient architectures.

1 Introduction & Previous Work
Many recent empirical breakthroughs in supervised machine learning have been achieved through
large and deep neural networks. Network depth (the number of successive computational layers) has
played perhaps the most important role in these successes. For instance, within just a few years, the
top-5 image classification accuracy on the 1000-class ImageNet dataset has increased from ∼84%
[1] to ∼95% [2, 3] using deeper networks with rather small receptive fields [4, 5]. Other results on
practical machine learning problems have also underscored the superiority of deeper networks [6]
in terms of accuracy and/or performance.
In fact, deep networks can represent certain function classes far more efficiently than shallow ones.
This is perhaps most obvious for recurrent nets, the deepest of them all. For example, the n-bit
parity problem can in principle be learned by a large feedforward net with n binary input units, 1
output unit, and a single but large hidden layer. But the natural solution for arbitrary n is a recurrent
net with only 3 units and 5 weights, reading the input bit string one bit at a time, making a single
recurrent hidden unit flip its state whenever a new 1 is observed [7]. Related observations hold for
Boolean circuits [8, 9] and modern neural networks [10, 11, 12].
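
The recurrent solution to parity can be illustrated with a minimal sketch. It is not the 3-unit, 5-weight network of [7]; it only mimics the behaviour described above, in which a single recurrent state flips whenever a 1 is read. The function names `parity_recurrent` and `parity_reference` are hypothetical and chosen for illustration.

```python
from itertools import product


def parity_recurrent(bits):
    """Read the bit string one bit at a time and return its parity.

    Illustrative sketch only: `state` stands in for the single recurrent
    hidden unit, which flips whenever a new 1 is observed.
    """
    state = 0
    for bit in bits:
        if bit == 1:
            state = 1 - state  # flip the state on every observed 1
    return state


def parity_reference(bits):
    """Direct definition of parity, used only for comparison."""
    return sum(bits) % 2


if __name__ == "__main__":
    # The same fixed-size update handles every input length n.
    for n in range(6):
        for bits in product([0, 1], repeat=n):
            assert parity_recurrent(bits) == parity_reference(bits)
    print(parity_recurrent([1, 0, 1, 1]))
```

The update rule has constant size regardless of n, whereas a feedforward net with a single hidden layer must grow with the input length. This is the efficiency argument made in the paragraph above.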
