Rupesh Kumar Srivastava
Klaus Greff
 ̈
J urgen
Schmidhuber
The Swiss AI Lab IDSIA / USI / SUPSI
{rupesh, klaus, juergen}@idsia.ch

Abstract
Theoretical and empirical evidence indicates that the depth of neural networks
is crucial for their success. However, training becomes more difficult as depth
increases, and training of very deep networks remains an open problem. Here we
introduce a new architecture designed to overcome this. Our so-called highway
networks allow unimpeded information flow across many layers on information
highways. They are inspired by Long Short-Term Memory recurrent networks and
use adaptive gating units to regulate the information flow. Even with hundreds of
layers, highway networks can be trained directly through simple gradient descent.
This enables the study of extremely deep and efficient architectures.

1
Introduction & Previous Work
Many recent empirical breakthroughs in supervised machine learning have been achieved through
large and deep neural networks. Network depth (the number of successive computational layers) has
played perhaps the most important role in these successes. For instance, within just a few years, the
top-5 image classification accuracy on the 1000-class ImageNet dataset has increased from ∼84%
[1] to ∼95% [2, 3] using deeper networks with rather small receptive fields [4, 5]. Other results on
practical machine learning problems have also underscored the superiority of deeper networks [6]
in terms of accuracy and/or performance.
In fact, deep networks can represent certain function classes far more efficiently than shallow ones.
This is perhaps most obvious for recurrent nets, the deepest of them all. For example, the n bit
parity problem can in principle be learned by a large feedforward net with n binary input units, 1
output unit, and a single but large hidden layer. But the natural solution for arbitrary n is a recurrent
net with only 3 units and 5 weights, reading the input bit string one bit at a time, making a single
recurrent hidden unit flip its state whenever a new 1 is observed [7]. Related observations hold for
Boolean circuits [8, 9] and modern neural networks [10, 11, 12].

Training Very Deep Networks的更多相关文章

  1. 【论文笔记】Training Very Deep Networks - Highway Networks

    目标: 怎么训练很深的神经网络 然而过深的神经网络会造成各种问题,梯度消失之类的,导致很难训练 作者利用了类似LSTM的方法,通过增加gate来控制transform前和transform后的数据的比 ...

  2. Deep Learning 8_深度学习UFLDL教程:Stacked Autocoders and Implement deep networks for digit classification_Exercise(斯坦福大学深度学习教程)

    前言 1.理论知识:UFLDL教程.Deep learning:十六(deep networks) 2.实验环境:win7, matlab2015b,16G内存,2T硬盘 3.实验内容:Exercis ...

  3. Initialization of deep networks

    Initialization of deep networks 24 Feb 2015Gustav Larsson As we all know, the solution to a non-conv ...

  4. 论文笔记:Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

    Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks ICML 2017 Paper:https://arxiv.org/ ...

  5. 【DeepLearning】Exercise: Implement deep networks for digit classification

    Exercise: Implement deep networks for digit classification 习题链接:Exercise: Implement deep networks fo ...

  6. 深度学习材料:从感知机到深度网络A Deep Learning Tutorial: From Perceptrons to Deep Networks

    In recent years, there’s been a resurgence in the field of Artificial Intelligence. It’s spread beyo ...

  7. Deep Networks : Overview

    Overview In the previous sections, you constructed a 3-layer neural network comprising an input, hid ...

  8. Quantization aware training 量化背后的技术——Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

    1,概述 模型量化属于模型压缩的范畴,模型压缩的目的旨在降低模型的内存大小,加速模型的推断速度(除了压缩之外,一些模型推断框架也可以通过内存,io,计算等优化来加速推断). 常见的模型压缩算法有:量化 ...

  9. Communication-Efficient Learning of Deep Networks from Decentralized Data

    郑重声明:原文参见标题,如有侵权,请联系作者,将会撤销发布! Proceedings of the 20th International Conference on Artificial Intell ...

随机推荐

  1. flask之redis

    redis 连接需要host port passwod Hash:key-fields-value(做缓存)相当于一个key对于一个map,map中还有key-valueList:有顺序可重复(处理不 ...

  2. linux 查看系统信息和安装哪些软件的命令

    https://www.cnblogs.com/wangkongming/p/4531341.html 查看系统磁盘硬盘占用率 https://blog.csdn.net/aaashen/articl ...

  3. grep 命令使用指南

    grep 命令 grep参数: -E:等同于egrep -o:只匹配你想要的内容,下面是示例: [root@localhost ~]# cat /data/game/config/server_con ...

  4. fragment用法

    简单用法: 1.新建布局.新建fragment类 2.在activity_main.xml中添加fragment <LinearLayout...... <fragment android ...

  5. SQLSERVER 2008 查询数据字段名类型

    SELECT * FROM Master..SysDatabases ORDER BY Name SELECT Name,* FROM Master..SysDatabases where Name= ...

  6. oracle里的查询转换

    oracle里的查询转换的作用 Oracle里的查询转换,有称为查询改写,指oracle在执行目标sql时可能会做等价改写,目的是为了更高效的执行目标sql 在10g及其以后的版本中,oracle会对 ...

  7. nat123安装启动教程帮助

    转自:http://www.nat123.com/Pages_17_291.jsp 本文就nat123安装启动可能遇到的问题及与安全狗影响处理. 下载安装nat123客户端安装包.第一次安装使用,可选 ...

  8. IE回车的一个怪异行为

    IE中在input中回车相当于提交form,会从dom中找最近的button标签触发click事件 <!DOCTYPE html> <html> <head> &l ...

  9. Redis搭建(五):Cluster集群搭建

    一.方案 1. 介绍 redis3.0及以上版本实现,集群中至少应该有奇数个节点,所以至少有三个节点,官方推荐三主三从的配置方式 使用哈希槽的概念,Redis 集群有16384个哈希槽,每个key通过 ...

  10. 值得一做》关于并查集的进化题目 BZOJ1015(BZOJ第一页计划)(normal-)

    这道题和以前做过的一道经典的洪水冲桥问题很像,主要做法是逆向思维.(BZOJ第10道非SB题纪念) 先给出题目 Description 很久以前,在一个遥远的星系,一个黑暗的帝国靠着它的超级武器统治者 ...