WHAT I READ FOR DEEP-LEARNING
Today, I spent some time on two new papers: one proposing a new way of training very deep neural networks (Highway Networks), and one proposing a new activation function for auto-encoders (ZERO-BIAS AUTOENCODERS AND THE BENEFITS OF CO-ADAPTING FEATURES) that avoids the need for regularization methods such as contraction or denoising.
Let's start with the first one. Highway Networks proposes a new activation scheme, similar to LSTM gating, and the authors claim that it is robust to the choice of initialization and to the learning problems that occur in very deep NNs. It is also exciting to see that they trained models with more than 100 layers. The basic intuition is to learn a gating function, attached to the real activation function, that decides whether to pass the activation or the input itself. Here is the formulation:
y = H(x, W_H) * T(x, W_T) + x * (1 - T(x, W_T))
T(x, W_T) is the gating function and H(x, W_H) is the real activation. In the paper they use a sigmoid activation for gating and a rectifier for the normal activation. I also implemented it with Lasagne and tried to replicate the results (I aim to release the code later). It is really impressive to see its ability to learn with 50 layers (the most my PC can handle).
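Since my code isn't released yet, here is a minimal sketch of what such a layer could look like in Lasagne. The HighwayLayer class and its parameter names are my own, assuming a plain fully-connected highway layer; the negative gate bias follows the paper's suggestion to initially bias the network toward carrying the input:

```python
import theano.tensor as T
import lasagne
from lasagne import init, nonlinearities

class HighwayLayer(lasagne.layers.Layer):
    """Fully-connected highway layer:
    y = H(x, W_H) * T(x, W_T) + x * (1 - T(x, W_T))."""
    def __init__(self, incoming, **kwargs):
        super(HighwayLayer, self).__init__(incoming, **kwargs)
        n = self.input_shape[1]  # input and output sizes must match
        self.W_H = self.add_param(init.GlorotUniform(), (n, n), name='W_H')
        self.b_H = self.add_param(init.Constant(0.0), (n,), name='b_H')
        self.W_T = self.add_param(init.GlorotUniform(), (n, n), name='W_T')
        # negative bias pushes the gate toward carrying the input at first
        self.b_T = self.add_param(init.Constant(-2.0), (n,), name='b_T')

    def get_output_for(self, input, **kwargs):
        H = nonlinearities.rectify(T.dot(input, self.W_H) + self.b_H)
        gate = nonlinearities.sigmoid(T.dot(input, self.W_T) + self.b_T)
        return H * gate + input * (1.0 - gate)

# usage: stack many of them on top of an input layer
l = lasagne.layers.InputLayer((None, 128))
for _ in range(50):
    l = HighwayLayer(l)
```

Because the input is carried through when the gate is closed, gradients can flow through many layers, which is what makes very deep stacks trainable here.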
The other paper, ZERO-BIAS AUTOENCODERS AND THE BENEFITS OF CO-ADAPTING FEATURES, suggests the use of unbiased rectifier units for the inference stage of AEs. You can train your model with a biased rectifier unit, but at inference (test) time you should extract features by ignoring the bias term. They show that doing so gives better recognition results on the CIFAR dataset. They also devise a new activation function with an intuition similar to Highway Networks: again, there is a gating unit which thresholds the normal activation function.
h = 1[(W x)^2 > θ] * (W x)
x̂ = W^T h
The first equation is the thresholded activation, with a predefined threshold θ (they use 1 in their experiments). The second equation shows the reconstruction of the proposed model. Note that this version thresholds on the square of the linear activation; they call this model TLin, and the variant that thresholds on the plain linear activation is called TRec. What this activation does is suppress small activations, so the model is implicitly regularized without any additional regularizer. This is actually good for learning over-complete representations of the given data.
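To make the two variants concrete, here is a toy numpy sketch (function names are mine, not the paper's code) of the TRec/TLin activations, the tied-weight reconstruction, and the train-with-bias / test-without-bias trick mentioned above:

```python
import numpy as np

def trec(x, W, theta=1.0):
    # TRec: keep the linear activation only where it exceeds the threshold
    a = x.dot(W)
    return a * (a > theta)

def tlin(x, W, theta=1.0):
    # TLin: threshold on the *squared* linear activation instead
    a = x.dot(W)
    return a * (a ** 2 > theta)

def reconstruct(h, W):
    # tied-weight linear reconstruction: x_hat = W^T h
    return h.dot(W.T)

def relu_features(x, W, b, use_bias=True):
    # the zero-bias trick: train a ReLU AE with biases,
    # but drop the bias term when extracting features at test time
    pre = x.dot(W) + (b if use_bias else 0.0)
    return np.maximum(pre, 0.0)
```

Thresholding zeroes out units whose response is weak, so each data point is explained by a small set of strongly responding features, which is the implicit regularization at work.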
For more than this brief intro, please refer to the papers, and warn me about any mistakes.
These two papers show an emerging trend in the Deep Learning community: the use of more complex activation functions. We might call it controlling each unit's behavior in a smart way, instead of letting units fire naively. My own view agrees with this idea; I believe we need even more sophistication for smart units in our deep models, as in spike-and-slab networks.