Training Deep Neural Networks
Reposted from: http://handong1587.github.io/deep_learning/2015/10/09/training-dnn.html
Tutorials
Popular Training Approaches of DNNs — A Quick Overview
Activation functions
Rectified Linear Units Improve Restricted Boltzmann Machines (ReLU)
Rectifier Nonlinearities Improve Neural Network Acoustic Models (leaky-ReLU, aka LReLU)
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (PReLU)
- keywords: PReLU, Caffe “msra” weight initialization
- arXiv: http://arxiv.org/abs/1502.01852
Empirical Evaluation of Rectified Activations in Convolutional Network (ReLU/LReLU/PReLU/RReLU)
Deep Learning with S-shaped Rectified Linear Activation Units (SReLU)
Parametric Activation Pools greatly increase performance and consistency in ConvNets
Noisy Activation Functions
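The rectifier variants above differ mainly in how they treat negative inputs. Below is a minimal pure-Python sketch of the scalar forms. It assumes a fixed leaky slope of 0.01 and a PReLU slope `a` that would be learned by backpropagation (`prelu_grad_a` gives its local derivative). For RReLU it assumes a slope range of [1/8, 1/3], with the midpoint used at test time. These defaults and the function names are illustrative choices, not values taken from the papers listed here.

```python
import random


def relu(x: float) -> float:
    # ReLU: pass positives through unchanged, zero out negatives
    return x if x > 0 else 0.0


def leaky_relu(x: float, slope: float = 0.01) -> float:
    # Leaky ReLU: small fixed slope for negative inputs (0.01 is an assumed default)
    return x if x > 0 else slope * x


def prelu(x: float, a: float) -> float:
    # PReLU: same shape as leaky ReLU, but the negative slope `a` is a learned parameter
    return x if x > 0 else a * x


def prelu_grad_a(x: float) -> float:
    # Local derivative d prelu(x, a) / d a: 0 for positive inputs, x for negative ones
    return 0.0 if x > 0 else x


def rrelu(x: float, lower: float = 1 / 8, upper: float = 1 / 3, training: bool = True) -> float:
    # RReLU: random negative slope drawn from U(lower, upper) while training,
    # fixed to the midpoint of the range at test time (range is an assumption)
    if x > 0:
        return x
    slope = random.uniform(lower, upper) if training else (lower + upper) / 2
    return slope * x
```

A PReLU layer would update `a` like any other weight, e.g. `a -= lr * upstream_grad * prelu_grad_a(x)`. The PReLU paper initializes it to 0.25.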
Weights Initialization
An Explanation of Xavier Initialization
Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?
All you need is a good init
Data-dependent Initializations of Convolutional Neural Networks
What are good initial weights in a neural network?
- stackexchange: http://stats.stackexchange.com/questions/47590/what-are-good-initial-weights-in-a-neural-network
RandomOut: Using a convolutional gradient norm to win The Filter Lottery
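Xavier initialization and the Caffe “msra” (He) scheme both choose the weight variance from the layer's fan-in/fan-out, so that signal variance stays roughly constant from layer to layer. Below is a sketch using only the standard library. The function names are hypothetical, and the weight matrix is assumed to be a plain nested list of shape (fan_out, fan_in).

```python
import math
import random


def xavier_uniform(fan_in: int, fan_out: int, rng: random.Random) -> list[list[float]]:
    # Glorot/Xavier uniform: U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)),
    # which gives Var(w) = limit^2 / 3 = 2 / (fan_in + fan_out)
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)] for _ in range(fan_out)]


def he_normal(fan_in: int, fan_out: int, rng: random.Random) -> list[list[float]]:
    # He / "msra" normal: N(0, 2 / fan_in), derived for ReLU-family activations
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def variance(weights: list[list[float]]) -> float:
    # Population variance of all entries, to compare against the target variance
    flat = [w for row in weights for w in row]
    mean = sum(flat) / len(flat)
    return sum((w - mean) ** 2 for w in flat) / len(flat)
```

For a 256→128 fully connected layer, Xavier targets Var(w) = 2/384 ≈ 0.0052 and He targets 2/256 ≈ 0.0078. `variance` lets you check an initialized matrix against those targets.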
Batch Normalization
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (ImageNet top-5 error: 4.82%)
- arXiv: http://arxiv.org/abs/1502.03167
- blog: https://standardfrancis.wordpress.com/2015/04/16/batch-normalization/
- notes: http://blog.csdn.net/happynear/article/details/44238541
Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
- arXiv: http://arxiv.org/abs/1602.07868
- github(Lasagne): https://github.com/TimSalimans/weight_norm
- notes: http://www.erogol.com/my-notes-weight-normalization/
Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks
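Batch normalization normalizes each feature with mini-batch statistics and then applies a learned scale γ and shift β. Weight normalization instead reparameterizes each weight vector as w = g·v/‖v‖. Below is a training-mode sketch of both on plain lists. Inference-time running averages for BN are omitted, `eps=1e-5` is an assumed default, and the function names are hypothetical.

```python
import math


def batch_norm_train(batch: list[list[float]], gamma: list[float], beta: list[float],
                     eps: float = 1e-5) -> list[list[float]]:
    # batch: N samples, each a list of D features.
    # Normalize every feature over the mini-batch, then scale by gamma and shift by beta.
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in batch]
        mu = sum(col) / n
        var = sum((x - mu) ** 2 for x in col) / n  # biased mini-batch variance
        for i in range(n):
            x_hat = (batch[i][j] - mu) / math.sqrt(var + eps)
            out[i][j] = gamma[j] * x_hat + beta[j]
    return out


def weight_norm(v: list[float], g: float) -> list[float]:
    # Weight normalization: w = g * v / ||v||, so ||w|| == |g| whatever the scale of v
    norm = math.sqrt(sum(x * x for x in v))
    return [g * x / norm for x in v]
```

With γ = 1 and β = 0, each output column has mean 0 and variance just under 1 (because of `eps`). `weight_norm([3.0, 4.0], 2.0)` rescales the direction (0.6, 0.8) to length 2.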
Loss Function
The Loss Surfaces of Multilayer Networks
Optimization Methods
On Optimization Methods for Deep Learning
On the importance of initialization and momentum in deep learning
Invariant backpropagation: how to train a transformation-invariant neural network
A practical theory for designing very deep convolutional neural networks
- kaggle: https://www.kaggle.com/c/datasciencebowl/forums/t/13166/happy-lantern-festival-report-and-code/69284
- paper: https://kaggle2.blob.core.windows.net/forum-message-attachments/69182/2287/A%20practical%20theory%20for%20designing%20very%20deep%20convolutional%20neural%20networks.pdf?sv=2012-02-12&se=2015-12-05T15%3A40%3A02Z&sr=b&sp=r&sig=kfBQKduA1pDtu837Y9Iqyrp2VYItTV0HCgOeOok9E3E%3D
- slides: http://vdisk.weibo.com/s/3nFsznjLKn
Stochastic Optimization Techniques
- intro: SGD/Momentum/NAG/Adagrad/RMSProp/Adadelta/Adam/ESGD/Adasecant/vSGD/Rprop
- blog: http://colinraffel.com/wiki/stochastic_optimization_techniques
Alec Radford’s animations for optimization algorithms
- blog: http://www.denizyuret.com/2015/03/alec-radfords-animations-for.html
Faster Asynchronous SGD (FASGD)
An overview of gradient descent optimization algorithms (★★★★★)

Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters
Writing fast asynchronous SGD/AdaGrad with RcppParallel
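Most of the optimizers listed above (SGD/Momentum/NAG/Adagrad/RMSProp/Adadelta/Adam/...) are small variations on one update loop. Below is a sketch of heavy-ball momentum and Adam on the toy objective f(x) = (x - 3)^2. That objective is an assumption, chosen so convergence is easy to check. Adam's `lr=0.1` is larger than its commonly used default of 0.001 and is picked only so the toy converges quickly.

```python
import math


def grad(x: float) -> float:
    # Gradient of the illustrative objective f(x) = (x - 3)^2
    return 2.0 * (x - 3.0)


def sgd_momentum(x: float, steps: int, lr: float = 0.1, beta: float = 0.9) -> float:
    velocity = 0.0
    for _ in range(steps):
        # Heavy-ball momentum: keep an exponentially decaying sum of past gradients
        velocity = beta * velocity + grad(x)
        x -= lr * velocity
    return x


def adam(x: float, steps: int, lr: float = 0.1, b1: float = 0.9,
         b2: float = 0.999, eps: float = 1e-8) -> float:
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g        # first-moment (mean) estimate
        v = b2 * v + (1 - b2) * g * g    # second-moment (uncentered variance) estimate
        m_hat = m / (1 - b1 ** t)        # bias correction for the zero initialization
        v_hat = v / (1 - b2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x
```

Starting from x = 0, both `sgd_momentum(0.0, 300)` and `adam(0.0, 500)` should end near the minimizer x = 3. Adam's per-step move is roughly bounded by `lr` because it divides by the root of the second-moment estimate.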
Regularization
DisturbLabel: Regularizing CNN on the Loss Layer [University of California & MSR] (2016)
- intro: “an extremely simple algorithm which randomly replaces a part of labels as incorrect values in each iteration”
- paper: http://research.microsoft.com/en-us/um/people/jingdw/pubs/cvpr16-disturblabel.pdf
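Read literally, the quoted intro means DisturbLabel only touches the labels of each mini-batch before the loss is computed. Below is a sketch that assumes a per-sample noise rate `alpha` and a replacement label drawn uniformly from all `num_classes` classes, so the new label may coincide with the true one. The function name is hypothetical.

```python
import random


def disturb_labels(labels: list[int], num_classes: int, alpha: float,
                   rng: random.Random) -> list[int]:
    # With probability alpha, replace a sample's label by one drawn uniformly from all
    # classes. The draw can hit the true class, so the fraction of labels that actually
    # change is alpha * (num_classes - 1) / num_classes.
    out = []
    for y in labels:
        if rng.random() < alpha:
            out.append(rng.randrange(num_classes))
        else:
            out.append(y)
    return out
```

Call it on every mini-batch, so that different samples are disturbed in different iterations. The network and the loss stay unchanged.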
Dropout
Improving neural networks by preventing co-adaptation of feature detectors (Dropout)
Regularization of Neural Networks using DropConnect
- homepage: http://cs.nyu.edu/~wanli/dropc/
- gitxiv: http://gitxiv.com/posts/rJucpiQiDhQ7HkZoX/regularization-of-neural-networks-using-dropconnect
- github: https://github.com/iassael/torch-dropconnect
Regularizing neural networks with dropout and with DropConnect
Fast dropout training
- paper: http://jmlr.org/proceedings/papers/v28/wang13a.pdf
- github: https://github.com/sidaw/fastdropout
Dropout as data augmentation
- paper: http://arxiv.org/abs/1506.08700
- notes: https://www.evernote.com/shard/s189/sh/ef0c3302-21a4-40d7-b8b4-1c65b8ebb1c9/24ff553fcfb70a27d61ff003df75b5a9
A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
Improved Dropout for Shallow and Deep Learning
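Dropout removes whole units at random, while DropConnect removes individual weights. Below is a sketch of both training-time forward passes. It assumes the "inverted" dropout convention: kept units are scaled by 1/(1 - p) during training, instead of halving outgoing weights at test time as the original dropout paper does. The two agree in expectation. The DropConnect function uses the same 1/(1 - p) rescaling, which is a simplification in this sketch; the DropConnect paper uses a Gaussian moment-matching approximation at inference. Function names are hypothetical.

```python
import random


def dropout(activations: list[float], p: float, rng: random.Random,
            training: bool = True) -> list[float]:
    # Inverted dropout (assumes 0 <= p < 1): zero each unit with probability p and
    # scale survivors by 1 / (1 - p), so the expected activation is unchanged.
    # Identity at test time.
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def dropconnect_linear(x: list[float], weights: list[list[float]], p: float,
                       rng: random.Random) -> list[float]:
    # DropConnect training-time forward pass for a linear layer: each weight gets its
    # own Bernoulli keep-mask; kept terms are rescaled by 1 / (1 - p) (a simplification).
    keep = 1.0 - p
    return [
        sum(w * xi / keep for w, xi in zip(row, x) if rng.random() < keep)
        for row in weights
    ]
```

Masks are resampled on every call, i.e. for every mini-batch. Passing `training=False` turns dropout into a no-op for evaluation.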
Gradient Descent
Fitting a model via closed-form equations vs. Gradient Descent vs Stochastic Gradient Descent vs Mini-Batch Learning. What is the difference? (Normal Equations vs. GD vs. SGD vs. MB-GD)
- faq: http://sebastianraschka.com/faq/docs/closed-form-vs-gd.html
An Introduction to Gradient Descent in Python
Train faster, generalize better: Stability of stochastic gradient descent
A Variational Analysis of Stochastic Gradient Algorithms
The vanishing gradient problem: Oh no — an obstacle to deep learning!
Gradient Descent For Machine Learning
- blog: http://machinelearningmastery.com/gradient-descent-for-machine-learning/
Revisiting Distributed Synchronous SGD
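The closed-form vs. GD vs. SGD vs. mini-batch comparison in the first entry above can be made concrete on one-variable linear regression y = w·x + b. Below is a sketch that assumes noise-free toy data. `closed_form` is the 1-D special case of the normal equations θ = (XᵀX)⁻¹Xᵀy. `fit_gd` covers all three gradient variants through `batch_size`. Function names and hyperparameters are illustrative.

```python
import random


def closed_form(xs: list[float], ys: list[float]) -> tuple[float, float]:
    # Normal equations for y = w*x + b, solved analytically in 1-D:
    # w = cov(x, y) / var(x), b = mean(y) - w * mean(x)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    w = cov / var
    return w, my - w * mx


def fit_gd(xs: list[float], ys: list[float], lr: float, epochs: int,
           batch_size: int, rng: random.Random) -> tuple[float, float]:
    # batch_size == len(xs): batch GD; batch_size == 1: SGD; otherwise mini-batch GD.
    # Each step descends the mean squared error over the current batch.
    w = b = 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)  # new sample order every epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            m = len(batch)
            errs = [(w * xs[i] + b - ys[i], xs[i]) for i in batch]
            gw = 2.0 / m * sum(e * x for e, x in errs)
            gb = 2.0 / m * sum(e for e, _ in errs)
            w -= lr * gw
            b -= lr * gb
    return w, b
```

With `xs = [i / 10 for i in range(50)]` and `ys = [2 * x + 1 for x in xs]`, the closed form returns (2, 1) in one pass. `fit_gd` at `lr=0.01` approaches the same values with `batch_size=50` (batch GD), `10` (mini-batch) or `1` (SGD), needing fewer epochs the smaller the batch. Because the toy data is noise-free, every per-sample gradient vanishes at the optimum, so even SGD settles exactly. With noisy data, constant-step SGD would keep fluctuating around the solution.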
Accelerate Training
Acceleration of Deep Neural Network Training with Resistive Cross-Point Devices
Image Data Augmentation
DataAugmentation ver1.0: Image data augmentation tool for training of image recognition algorithm
Caffe-Data-Augmentation: a branch of Caffe that adds data augmentation using a configurable stochastic combination of 7 data augmentation techniques
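The exact technique sets of the two tools above are not given here. As an illustration only, below is a sketch of two common stochastic augmentations, random horizontal flip and random crop. It assumes an image stored as a nested list of pixel rows; the function names and the flip probability of 0.5 are also assumptions.

```python
import random


def random_flip(img: list[list[float]], rng: random.Random, p: float = 0.5) -> list[list[float]]:
    # Horizontal flip with probability p: reverse every pixel row
    if rng.random() < p:
        return [row[::-1] for row in img]
    return [row[:] for row in img]


def random_crop(img: list[list[float]], size: int, rng: random.Random) -> list[list[float]]:
    # Cut a size x size window at a uniformly random position (size must fit the image)
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - size)
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]


def augment(img: list[list[float]], size: int, rng: random.Random) -> list[list[float]]:
    # Stochastic combination: a fresh random flip and crop every time a sample is drawn
    return random_crop(random_flip(img, rng), size, rng)
```

Because `augment` draws new random numbers on every call, the same training image yields a different view each epoch.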
Papers
Scalable and Sustainable Deep Learning via Randomized Hashing
Tools
pastalog: Simple, realtime visualization of neural network training performance

torch-pastalog: A Torch interface for pastalog - simple, realtime visualization of neural network training performance