NEGOUT: SUBSTITUTE FOR MAXOUT UNITS

Maxout [1] units are a well-known and frequently used tool in deep neural networks. As a quick refresher, a Maxout unit is a set of internal activation units that compete with each other for each instance: the winner's activation is propagated as the output and the losers are kept silent. In the backpropagation phase this means we update only the winning unit; implicitly, we always prefer to back-propagate the gradient signal through the strongest path. This is an important property of Maxout units, especially for very deep models, which are prone to gradient instability.
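To fix notation, here is a minimal sketch of a single Maxout unit with k competing linear filters. This is my own illustration, not code from the paper; the shapes and names are assumptions made for the example.

```python
import numpy as np

def maxout_unit(x, W, b):
    """Minimal sketch of one Maxout unit: k competing linear filters,
    the largest pre-activation wins and becomes the output.
    Assumed shapes (illustration only): x is (d,), W is (k, d), b is (k,)."""
    z = W @ x + b                # pre-activations of the k competing filters
    winner = int(np.argmax(z))   # winner-take-all
    # Only z[winner] is propagated forward; during backpropagation only the
    # winning filter's parameters receive gradient, i.e. the strongest path.
    return z[winner], winner
```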

Although Maxout units have very good properties such as these (please refer to the paper for more details), I am sceptical of their ability to encode the underlying information and pass it on to the next layer. Here is a very simple example. Suppose we have two competing functions (filters) in a Maxout unit. One of these functions responds to edge structures whereas the other responds to corners. For one instance, the first filter might win with a value of, say, ~3, which means the Maxout output is also ~3. For another instance, the other function wins with approximately the same value, ~3. If we assume that each NN layer is a classifier that takes the previous layer's output as a feature vector (not a very wrong assumption, I guess), then we are feeding the same value for two different detections along the particular feature dimension that corresponds to our Maxout unit. Consequently, we cannot expect the next layer to be able to discern these two signals.

Edge detector is the winner

Corner detector is the winner, but the result is the same

One can argue that we should evaluate a Maxout unit as a whole, reminiscent of an OR function over multiple filters. This is a valid argument that I cannot refute outright, but the problem I indicated above still stands. Besides, why would we waste our expensive NN parameters if we could come up with a better encoding scheme for Maxout units?

Here is one alternative approach for a better encoding of competing functions, which I call NegOut. Let's assume we fix an ordering of the two competing functions in advance, as 1st and 2nd. If the winner is the 1st function, NegOut outputs the 1st function's value; otherwise it outputs the 2nd function's value, but negated. NegOut carries two assumptions. First, the competing functions are always positive (like ReLU functions). Second, there are exactly 2 competing functions.

NegOut activation with different winners.
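To make the rule concrete, here is a minimal NumPy sketch of NegOut next to plain Maxout, under the two assumptions above (non-negative competing functions, exactly two of them). The function names are mine, for illustration only.

```python
import numpy as np

def maxout2(a1, a2):
    """Maxout over two competing activations."""
    return np.maximum(a1, a2)

def negout(a1, a2):
    """NegOut sketch, following the description above: a1 and a2 are the two
    competing functions, assumed non-negative (ReLU-like). If the 1st wins,
    its value is passed through; if the 2nd wins, its value is negated,
    so the sign of the output encodes which detector fired."""
    return np.where(a1 >= a2, a1, -a2)

# The ambiguity from the edge/corner example above: two different winners
# at ~3 collapse to the same Maxout output, while NegOut separates them by sign.
print(maxout2(3.0, 0.5), maxout2(0.2, 3.0))   # -> 3.0 3.0
print(negout(3.0, 0.5), negout(0.2, 3.0))     # -> 3.0 -3.0
```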

If we consider the backpropagation signal, the only difference from a Maxout unit is that we negate the gradient signal for the 2nd competing unit when it is the winner.
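A corresponding backward sketch, under the same assumptions as the forward sketch above, just to spell out where the sign flip enters:

```python
import numpy as np

def negout_backward(a1, a2, grad_out):
    """Backward sketch for the NegOut unit above (a1, a2 assumed non-negative,
    e.g. ReLU outputs). As in Maxout, only the winner receives gradient; the
    single difference is the sign flip when the 2nd unit is the winner,
    because its contribution to the output was negated in the forward pass."""
    first_wins = a1 >= a2
    grad_a1 = np.where(first_wins, grad_out, 0.0)    # winner path, sign unchanged
    grad_a2 = np.where(first_wins, 0.0, -grad_out)   # winner path, sign flipped
    return grad_a1, grad_a2
```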

As you can see from the figure, the inherent property here is that different winning detectors produce different output values, and the value captures both the structural difference and the strength of the winner's activation.

I performed some experiments on CIFAR-10 and MNIST comparing a Maxout network with a NegOut network, using exactly the same architectures described in the Maxout paper [1]. The table below summarizes the results I observed in initial runs, without any fine-tuning or hyper-parameter optimization yet. More comparisons on larger datasets are still in progress.

Results on CIFAR-10 and MNIST, averaged over 5 different runs.

NegOut gives better results on CIFAR-10, although it is slightly worse on MNIST. Again, note that no tuning has taken place for the NegOut network, whereas the Maxout network is optimized as described in the paper [1]. In addition, the NegOut network uses 2 competing units (as constrained by its nature) in the last FC layer, compared to the 5 competing units of the Maxout network. My expectation is that the difference will grow with larger models and datasets, since representational power matters more as we scale up.

Here I have tried to give a basic sketch of my recent work; it is by no means complete. Further observations and experiments are still running. I also need to include LWTA [2] to make the comparison fairer and to cover a wider range of competing-unit schemes. Please feel free to share your thoughts as well; any contribution is appreciated.

PS: Lately, I have devoted myself to analyzing the internal dynamics of neural networks with different architectures, layers and activation functions. The aim is to look under the hood and analyze intuitively well-functioning ideas applied to deep neural networks. I expect to share more of my findings on my blog.

[1] I. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, Y. Bengio. Maxout Networks. arXiv:1302.4389.

[2] R. K. Srivastava, J. Masci, F. Gomez, J. Schmidhuber. Understanding Locally Competitive Networks. arXiv:1410.1165. http://arxiv.org/abs/1410.1165
