Dropout & Maxout

This is the 8th post of a series of posts I planned about a journal of myself studying deep learning in Professor Bhiksha Raj's course, deep learning lab. I decided to write these posts as notes of my learning process and I hope these posts can help the others with similar background. 
Back to Content Page
--------------------------------------------------------------------
PDF Version Available Here
--------------------------------------------------------------------
In the last post when we looked at the techniques for convolutional neural networks, we have mentioned dropout as a technique to control sparsity. Here let's look at the details of it and let's look at another similar technique called maxout. Again, these techniques are not constrained only to convolutional neural networks, but can be applied to almost any deep networks, or at least feedforward deep networks.

Dropout

Dropout is famous, and powerful, and simple. Despite the fact that dropout is widely used and very powerful, the idea is actually simple: randomly dropping out some of the units while training. One case can be showed as in the following figure.

Figure 1. An illustration of the idea of dropout

To state this a little more formally: one each training case, each hidden unit is randomly omitted from the network with a probability of p. One thing to notice though, the selected dropout units are different for each training instance, that's why this is more of a training problem, rather than an architecture problem.
As stated in the origin paper by Hilton et al, another view to look at dropout makes this solution interesting. Dropout can be seen as an efficient way to perform model averaging across a large number of different neural networks, where overfitting can be avoided with much less cost of computation.
Initially in the paper, dropout is discussed under p=0.5, but of course it could basically set up to any probability.  

Maxout

Maxout is an idea derived for dropout. It is simply an activation function that takes the max of input, but when it works with dropout, it can reinforce the properties dropout has: improve the accuracy of fast model averaging technique and facilitate optimization. 
Different from max-pooling, maxout is based on a whole hidden layer that is built upon the layer we are interested in, so it's more like a layerwise activation function. As stated by the original paper from Ian, with these hidden layers that only consider the max of input, the network remains the same power of universal approximation. The reasoning is not very different from what we did in the 3rd post of this series on universal approximation power.  
 
Despite of the fact that maxout is an idea that works derived on dropout and works better, maxout can only be implemented on feedforward neural networks like multi-layer perceptron or convolutional neural networks. In contrast, dropout is a fundamental idea, though simple, that can work for basically any networks. Dropout is more like the idea of bagging, both in the sense of bagging's ability to increase accuracy by model averaging, and in the sense of bagging's widely adaption that can be integrated with almost any machine learning algorithm. 
 
In this post we have talked about two simple and powerful ideas that can help to increase the accuracy with model averaging technique. In the next post, let's move back to the track of network architectures and start to talk generative models' network architecture. 
----------------------------------------------
If you find this helpful, please cite:
Wang, Haohan, and Bhiksha Raj. "A Survey: Time Travel in Deep Learning Space: An Introduction to Deep Learning Models and How Deep Learning Models Evolved from the Initial Ideas." arXiv preprint arXiv:1510.04781 (2015).
----------------------------------------------

By Haohan Wang
Note: I am still a student learning everything, there may be mistakes due to my limited knowledge. Please feel free to tell me wherever you find incorrect or uncomfortable with. Thank you.

Main Reference:

  1. Hinton, Geoffrey E., et al. "Improving neural networks by preventing co-adaptation of feature detectors." arXiv preprint arXiv:1207.0580 (2012).
  2. Goodfellow, Ian J., et al. "Maxout networks." arXiv preprint arXiv:1302.4389 (2013).

Dropout & Maxout的更多相关文章

  1. Deep learning:四十五(maxout简单理解)

    maxout出现在ICML2013上,作者Goodfellow将maxout和dropout结合后,号称在MNIST, CIFAR-10, CIFAR-100, SVHN这4个数据上都取得了start ...

  2. [转]理解dropout

    理解dropout 原文地址:http://blog.csdn.net/stdcoutzyx/article/details/49022443     理解dropout 注意:图片都在github上 ...

  3. 激活函数(ReLU, Swish, Maxout)

    神经网络中使用激活函数来加入非线性因素,提高模型的表达能力. ReLU(Rectified Linear Unit,修正线性单元) 形式如下: \[ \begin{equation} f(x)= \b ...

  4. 【机器学习】激活函数(ReLU, Swish, Maxout)

    https://blog.csdn.net/ChenVast/article/details/81382939 神经网络中使用激活函数来加入非线性因素,提高模型的表达能力. ReLU(Rectifie ...

  5. 激活函数--(Sigmoid,tanh,Relu,maxout)

    Question? 激活函数是什么? 激活函数有什么用? 激活函数怎么用? 激活函数有哪几种?各自特点及其使用场景? 1.激活函数 1.1激活函数是什么? 激活函数的主要作用是提供网络的非线性建模能力 ...

  6. 深度学习方法(十):卷积神经网络结构变化——Maxout Networks,Network In Network,Global Average Pooling

    欢迎转载,转载请注明:本文出自Bin的专栏blog.csdn.net/xbinworld. 技术交流QQ群:433250724,欢迎对算法.技术感兴趣的同学加入. 最近接下来几篇博文会回到神经网络结构 ...

  7. 理解dropout

    理解dropout 注意:图片都在github上放着,如果刷不开的话,可以考虑FQ. 转载请注明:http://blog.csdn.net/stdcoutzyx/article/details/490 ...

  8. 激活函数,Batch Normalization和Dropout

    神经网络中还有一些激活函数,池化函数,正则化和归一化函数等.需要详细看看,啃一啃吧.. 1. 激活函数 1.1 激活函数作用 在生物的神经传导中,神经元接受多个神经的输入电位,当电位超过一定值时,该神 ...

  9. 在RNN中使用Dropout

    dropout在前向神经网络中效果很好,但是不能直接用于RNN,因为RNN中的循环会放大噪声,扰乱它自己的学习.那么如何让它适用于RNN,就是只将它应用于一些特定的RNN连接上.   LSTM的长期记 ...

随机推荐

  1. Spring【基础】-注解-转载

    站在巨人的肩膀上,感谢! https://blog.csdn.net/chjttony/ 1.在java开发领域,Spring相对于EJB来说是一种轻量级的,非侵入性的Java开发框架, 曾经有两本很 ...

  2. CF987C Three displays 暴力

    题意翻译 题目大意: nnn个位置,每个位置有两个属性s,cs,cs,c,要求选择3个位置i,j,ki,j,ki,j,k,使得si<sj<sks_i<s_j<s_ksi​< ...

  3. [Cqoi2014]危桥 (两遍网络流)

    题目链接 #include <bits/stdc++.h> using namespace std; typedef long long ll; inline int read() { , ...

  4. giihub上的关于js的43道题目

    参考 https://github.com/lydiahallie/javascript-questions

  5. CentOS7.3下Zabbix3.5之邮件报警配置

    一.邮件客户端以及脚本相关配置 1.安装sendmail,一般操作系统默认安装了安装 yum install sendmail 启动 service sendmail start 设置开机启动 chk ...

  6. 3、kvm配置vnc

    配置kvm通过vnc访问 virsh edit privi-server 添加如下配置: <graphics type='vnc' port='5901' autoport='no' liste ...

  7. CompareToBuilder构建Comparator

    import org.apache.commons.lang.builder.CompareToBuilder; Collections.sort(outboundNotices, new Compa ...

  8. 【ElasticSearch+NetCore 第一篇】在Windows上安装部署ElasticSearch和ElasticSearch-head

    ElasticSearch是一个基于Lucene的搜索服务器.它提供了一个分布式多用户能力的全文搜索引擎,基于RESTful web接口.Elasticsearch是用Java开发的,并作为Apach ...

  9. 034 Search for a Range 搜索范围

    给定一个已经升序排序的整形数组,找出给定目标值的开始位置和结束位置.你的算法时间复杂度必须是 O(log n) 级别.如果在数组中找不到目标,返回 [-1, -1].例如:给出 [5, 7, 7, 8 ...

  10. Unity Gizmos可视化辅助工具

    所有gizmo绘制需要在脚本的OnDrawGizmos或OnDrawGizmosSelected里函数完成. OnDrawGizmos在每帧调用.所有在OnDrawGizmos中渲染的gizmos都是 ...