Dropout & Maxout

This is the 8th post in a series I planned as a journal of my study of deep learning in Professor Bhiksha Raj's deep learning lab course. I write these posts as notes on my learning process, and I hope they can help others with a similar background. 
Back to Content Page
--------------------------------------------------------------------
PDF Version Available Here
--------------------------------------------------------------------
In the last post, when we looked at techniques for convolutional neural networks, we mentioned dropout as a technique to control sparsity. Here let's look at it in detail, along with a related technique called maxout. Again, these techniques are not limited to convolutional neural networks; they can be applied to almost any deep network, or at least to feedforward deep networks.

Dropout

Dropout is famous, powerful, and simple. Although it is widely used and very effective, the idea itself is straightforward: randomly drop some of the units while training. One case is shown in the following figure.

Figure 1. An illustration of the idea of dropout

To state this a little more formally: on each training case, each hidden unit is randomly omitted from the network with probability p. Note that the dropped units are resampled for every training instance, which is why dropout is better thought of as a training technique than as an architectural choice.
As stated in the original paper by Hinton et al., there is another view of dropout that makes it especially interesting: dropout can be seen as an efficient way to perform model averaging across a large number of different neural networks, avoiding overfitting at a much lower computational cost.
In the paper, dropout is mainly discussed with p=0.5, but it can in principle be set to any probability.
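
To make the mechanics concrete, here is a minimal NumPy sketch of a dropout layer (my own illustration, not code from the paper; the function name dropout_forward is hypothetical). During training, a fresh binary mask zeroes each unit with probability p; at test time nothing is dropped and the activations are scaled by (1 - p), which for p=0.5 is exactly the weight-halving trick described by Hinton et al.

    import numpy as np

    def dropout_forward(h, p=0.5, train=True):
        # h: hidden activations, shape (batch, units)
        if train:
            # Draw a new mask per call: each unit survives with prob 1 - p.
            mask = np.random.rand(*h.shape) >= p
            return h * mask
        # Test time: keep every unit but scale down so the expected
        # activation matches training (halving when p = 0.5).
        return h * (1.0 - p)

    # A different mask is sampled for every training case, so each
    # forward pass trains a different "thinned" sub-network.
    h = np.random.randn(4, 8)
    h_train = dropout_forward(h, p=0.5, train=True)
    h_test = dropout_forward(h, p=0.5, train=False)

Averaging the predictions of the exponentially many thinned sub-networks would be intractable; the single scaled forward pass at test time is the cheap approximation to that average.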

Maxout

Maxout is an idea derived from dropout. It is simply an activation function that takes the maximum of its inputs, but when combined with dropout it reinforces dropout's strengths: it improves the accuracy of dropout's fast approximate model averaging and it facilitates optimization.
Unlike max-pooling, maxout operates on a whole hidden layer built on top of the layer we are interested in, so it is more like a layer-wise activation function. As the original paper by Goodfellow et al. shows, even though these hidden layers pass on only the maximum of their inputs, the network retains its universal approximation power. The reasoning is not very different from what we did in the 3rd post of this series on universal approximation.
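
As a concrete illustration, here is a minimal NumPy sketch of a single maxout layer (my own example, not code from the paper; the shapes and the name maxout_forward are assumptions). Each output unit computes k affine pieces of the input and passes on only the largest, so the layer as a whole acts as the activation function:

    import numpy as np

    def maxout_forward(x, W, b):
        # x: (batch, d_in); W: (d_in, d_out, k); b: (d_out, k)
        # Compute all k affine pieces for every output unit at once...
        z = np.einsum('nd,dok->nok', x, W) + b
        # ...then keep only the maximum piece per unit.
        return z.max(axis=-1)

    # Hypothetical shapes: 5 inputs, 3 maxout units with k = 4 pieces each.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(2, 5))
    W = rng.normal(size=(5, 3, 4))
    b = rng.normal(size=(3, 4))
    print(maxout_forward(x, W, b).shape)  # (2, 3)

With enough pieces, a maximum over affine functions can approximate any convex function arbitrarily well, and Goodfellow et al. build the universal approximation argument on exactly this property.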
 
Although maxout is derived from dropout and works well with it, maxout can only be applied to feedforward neural networks such as multi-layer perceptrons or convolutional neural networks. In contrast, dropout is a fundamental idea, though a simple one, that can work with basically any network. Dropout is more like bagging, both in bagging's ability to increase accuracy through model averaging and in bagging's wide adoption, since it can be integrated with almost any machine learning algorithm.
 
In this post we have talked about two simple and powerful ideas that can increase accuracy through model averaging. In the next post, let's move back to the track of network architectures and start to talk about the architectures of generative models.
----------------------------------------------
If you find this helpful, please cite:
Wang, Haohan, and Bhiksha Raj. "A Survey: Time Travel in Deep Learning Space: An Introduction to Deep Learning Models and How Deep Learning Models Evolved from the Initial Ideas." arXiv preprint arXiv:1510.04781 (2015).
----------------------------------------------

By Haohan Wang
Note: I am still a student and still learning; there may be mistakes due to my limited knowledge. Please feel free to point out anything you find incorrect or unclear. Thank you.

Main Reference:

  1. Hinton, Geoffrey E., et al. "Improving neural networks by preventing co-adaptation of feature detectors." arXiv preprint arXiv:1207.0580 (2012).
  2. Goodfellow, Ian J., et al. "Maxout networks." arXiv preprint arXiv:1302.4389 (2013).
