awesome-very-deep-learning

awesome-very-deep-learning is a curated list of papers and code for implementing and training very deep neural networks.

Deep Residual Learning

Deep Residual Networks are a family of extremely deep architectures (up to 1000 layers) that show compelling accuracy and good convergence behavior. Instead of learning a new representation at each layer, deep residual networks add identity shortcut connections, so each block only has to learn a residual relative to its input.
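As a rough illustration of the idea, a residual block computes x + F(x), where F is whatever the block's layers learn. The sketch below is a simplified assumption, not code from any of the listed repositories: it uses plain Python lists, a two-layer F without batch normalization, and hypothetical helper names (`relu`, `linear`, `residual_block`).

```python
def relu(v):
    """Element-wise rectified linear unit on a list of floats."""
    return [max(0.0, a) for a in v]


def linear(weights, v):
    """Multiply a weight matrix (given as a list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """Return x + F(x), where F is a small two-layer transform.

    The layers only have to model the residual F(x); the identity
    shortcut carries x through unchanged.
    """
    fx = linear(w2, relu(linear(w1, x)))
    return [a + b for a, b in zip(x, fx)]


# With all-zero weights F(x) is zero, so the block reduces to the identity.
zeros = [[0.0, 0.0], [0.0, 0.0]]
x = [1.5, -2.0]
print(residual_block(x, zeros, zeros))  # [1.5, -2.0]
```

Because a block with zero weights is exactly the identity, stacking more blocks does not have to make the mapping harder to represent, which is one intuition for why such networks remain trainable at great depth.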

Papers

Wide Residual Networks (2016) [original code], studies wide residual neural networks and shows that making residual blocks wider outperforms deeper and thinner network architectures
Swapout: Learning an ensemble of deep architectures (2016), improving accuracy by randomly applying dropout, skipforward and residual units per layer
Deep Networks with Stochastic Depth (2016) [original code], randomly dropping entire residual layers during training as a regularizer
Identity Mappings in Deep Residual Networks (2016) [original code], improving the originally proposed residual units by reordering batchnorm and activation layers
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning (2016), inception network with residual connections
Deep Residual Learning for Image Recognition (2015) [original code], original paper introducing residual neural networks
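
The stochastic depth entry in the list above can be sketched as follows. This is a minimal illustration under stated assumptions: the function and parameter names (`stochastic_depth_block`, `survival_prob`, `double`) are hypothetical, and activations and batch normalization are left out. During training the residual branch is kept with some survival probability and otherwise skipped; at inference the branch is always used but scaled by that probability.

```python
import random


def stochastic_depth_block(x, residual_fn, survival_prob, training, rng=random):
    """Apply one residual block with stochastic depth.

    Training: keep the residual branch with probability survival_prob,
    otherwise pass x through the identity shortcut alone.
    Inference: always keep the branch, scaled by survival_prob.
    """
    if training:
        if rng.random() < survival_prob:
            return [a + b for a, b in zip(x, residual_fn(x))]
        return list(x)
    return [a + survival_prob * b for a, b in zip(x, residual_fn(x))]


def double(v):
    """Hypothetical stand-in for a learned residual branch."""
    return [2.0 * a for a in v]


x = [1.0, -1.0]
print(stochastic_depth_block(x, double, 0.5, training=False))  # [2.0, -2.0]
```

Passing a seeded `random.Random` instance as `rng` makes the training-time behavior reproducible.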

Implementations

Torch by Facebook AI Research (FAIR), with training code in Torch and pre-trained ResNet-18/34/50/101 models for ImageNet: blog, code
Torch, CIFAR-10, with ResNet-20 to ResNet-110, training code, and curves: code
Lasagne, CIFAR-10, with ResNet-32 and ResNet-56 and training code: code
Neon, CIFAR-10, with pre-trained ResNet-32 to ResNet-110 models, training code, and curves: code
Neon, Preactivation layer implementation: code
Torch, MNIST, 100 layers: blog, code
A winning entry in Kaggle's right whale recognition challenge: blog, code
Neon, Places2 (mini), 40 layers: blog, code
TensorFlow with tflearn, with CIFAR-10 and MNIST: code
TensorFlow with skflow, with MNIST: code
Stochastic dropout in Keras: code
ResNet in Chainer: code
Stochastic dropout in Chainer: code
Wide Residual Networks in Keras: code
ResNet in TensorFlow 0.9+ with pre-trained Caffe weights: code

In addition, this code by Ryan Dahl helps to convert the pre-trained models to TensorFlow.

Highway Networks

Highway Networks take inspiration from Long Short-Term Memory (LSTM) and allow training of deep, efficient networks (with hundreds of layers) with conventional gradient-based methods.
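
A highway layer mixes a transformed signal H(x) with the untouched input x through a learned transform gate T(x), giving y = H(x) * T(x) + x * (1 - T(x)). The sketch below is a simplified assumption in plain Python, with hypothetical helper names (`sigmoid`, `affine`, `highway_layer`) and tanh chosen as the transform nonlinearity for illustration.

```python
import math


def sigmoid(z):
    """Logistic function, used for the transform gate."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, v):
    """Compute weights @ v + bias for a list-of-rows weight matrix."""
    return [sum(w * a for w, a in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def highway_layer(x, w_h, b_h, w_t, b_t):
    """Return H(x) * T(x) + x * (1 - T(x)) element-wise.

    T close to 0 carries the input through unchanged;
    T close to 1 replaces it with the transformed signal H(x).
    """
    h = [math.tanh(z) for z in affine(w_h, b_h, x)]
    t = [sigmoid(z) for z in affine(w_t, b_t, x)]
    return [hi * ti + xi * (1.0 - ti) for hi, ti, xi in zip(h, t, x)]


identity = [[1.0, 0.0], [0.0, 1.0]]
no_weights = [[0.0, 0.0], [0.0, 0.0]]
x = [0.5, -0.25]

# A strongly negative gate bias closes the gate, so the output stays close to x.
print(highway_layer(x, identity, [0.0, 0.0], no_weights, [-50.0, -50.0]))
```

Initializing the gate bias to a negative value, as in the last line, makes a freshly created layer behave almost like the identity, which is the gating analogue of the LSTM carry behavior mentioned above.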

Papers

Training Very Deep Networks (2015), introducing highway neural networks

Implementations

Lasagne: code
Caffe: code
Torch: code
TensorFlow: blog, code

Very Deep Learning Theory

Theoretical work on very deep learning centers on two ideas: that very deep networks with skip connections can efficiently approximate recurrent computations (similar to the recurrent connections in the visual cortex), and that they are in effect exponential ensembles of shallow networks.
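
The link to recurrent computation can be made concrete with a small sketch. If every residual block shares the same weights, a stack of `depth` blocks is the same computation as applying one update h = h + F(h) for `depth` time steps. The names below (`shared_residual_stack`, `halve_toward_zero`) are hypothetical and the residual function is a toy stand-in, not a learned layer.

```python
def shared_residual_stack(x, residual_fn, depth):
    """Apply the same residual update `depth` times.

    Because residual_fn is reused at every step, the parameter count
    does not grow with depth, and the loop is an unrolled recurrence.
    """
    h = list(x)
    for _ in range(depth):
        h = [a + b for a, b in zip(h, residual_fn(h))]
    return h


def halve_toward_zero(h):
    """Toy residual: each step moves the state halfway toward zero."""
    return [-0.5 * a for a in h]


print(shared_residual_stack([8.0, -4.0], halve_toward_zero, 3))  # [1.0, -0.5]
```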

Papers

Bridging the Gaps Between Residual Learning, Recurrent Neural Networks and Visual Cortex, shows that ResNets with shared weights also work well despite having fewer parameters
Residual Networks are Exponential Ensembles of Relatively Shallow Networks, shows that ResNets behave like ensembles of relatively shallow networks at test time. This suggests that, in addition to width and depth, neural networks can be described along a third dimension: multiplicity, the size of the implicit ensemble
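
The ensemble view in the last entry can be illustrated by counting paths. Each residual block offers two routes, through the residual branch or around it via the shortcut, so a stack of n blocks contains 2^n distinct paths, and the number of paths that pass through exactly k branches is the binomial coefficient C(n, k). The function names below are hypothetical, and the sketch only counts paths; it does not measure how much each path contributes.

```python
from itertools import product
from math import comb


def enumerate_paths(num_blocks):
    """List every path as a tuple of 0/1 choices (1 = take the residual branch)."""
    return list(product((0, 1), repeat=num_blocks))


def path_length_counts(num_blocks):
    """Number of paths of each length 0..num_blocks (a binomial distribution)."""
    return [comb(num_blocks, k) for k in range(num_blocks + 1)]


counts = path_length_counts(3)
print(len(enumerate_paths(3)))  # 8
print(counts)                   # [1, 3, 3, 1]
```

Most paths have a length near n / 2, so even a very deep network is dominated by paths far shorter than its nominal depth.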