Linear Decoders
Sparse Autoencoder Recap
In the sparse autoencoder, we had 3 layers of neurons: an input layer, a hidden layer and an output layer. In our previous description of autoencoders (and of neural networks), every neuron in the neural network used the same activation function. In these notes, we describe a modified version of the autoencoder in which some of the neurons use a different activation function. This will result in a model that is sometimes simpler to apply, and can also be more robust to variations in the parameters.
Recall that each neuron (in the output layer) computed the following:

a(3) = f(z(3)) = f(W(2)a(2) + b(2))

where a(3) is the output. In the autoencoder, a(3) is our approximate reconstruction of the input x = a(1).
Because we used a sigmoid activation function for f(z(3)), we needed to constrain or scale the inputs to be in the range [0,1], since the sigmoid function outputs numbers in the range [0,1].
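To make the constraint concrete, here is a minimal sketch in Python/NumPy (not part of the original notes; the values are hypothetical) showing that a sigmoid output unit can only produce values in (0,1), which is why every input had to be scaled into that range:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # A sigmoid output unit squashes any pre-activation z(3) into (0, 1).
    z3 = np.linspace(-10, 10, 5)       # sample pre-activations of an output unit
    a3 = sigmoid(z3)                   # roughly [0.00005, 0.0067, 0.5, 0.993, 0.99995]

    # A raw, unscaled input value outside [0, 1] can never be reconstructed exactly.
    x = 3.7
    print(np.min(np.abs(a3 - x)))      # reconstruction error stays above 2.7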
Motivation: when every layer uses the same nonlinear (sigmoid) activation function, the nonlinear mapping at the output means the reconstruction cannot match an arbitrary, unscaled input.
Linear Decoder
One easy fix for this problem is to set a(3) = z(3). Formally, this is achieved by having the output nodes use an activation function that's the identity function f(z) = z, so that a(3) = f(z(3)) = z(3). In other words, the output nodes no longer use the sigmoid function.
This particular activation function f(z) = z is called the linear activation function. Note however that in the hidden layer of the network, we still use a sigmoid (or tanh) activation function, so that the hidden unit activations are given by (say)

a(2) = σ(W(1)x + b(1)),

where σ(z) = 1/(1 + exp(-z)) is the sigmoid function, x is the input, and W(1) and b(1) are the weight and bias terms for the hidden units. It is only in the output layer that we use the linear activation function.
An autoencoder in this configuration--with a sigmoid (or tanh) hidden layer and a linear output layer--is called a linear decoder.
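As a concrete illustration (a minimal sketch in Python/NumPy with hypothetical layer sizes and randomly initialized parameters, not code from the original notes), the forward pass of such a linear decoder looks like this:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    n_input, n_hidden = 192, 400        # hypothetical sizes (e.g. 8x8x3 patches, 400 hidden units)

    # Hypothetical parameters W(1), b(1), W(2), b(2)
    W1 = rng.normal(scale=0.01, size=(n_hidden, n_input))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.01, size=(n_input, n_hidden))
    b2 = np.zeros(n_input)

    x = rng.normal(size=n_input)        # real-valued input, no scaling to [0,1] required

    a2 = sigmoid(W1 @ x + b1)           # hidden layer: sigmoid activation
    z3 = W2 @ a2 + b2
    a3 = z3                             # output layer: linear (identity) activation

Because a3 = z3 is an affine function of the hidden activations, its components are not confined to [0,1].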
In this model, we have

x̂ = a(3) = z(3) = W(2)a(2) + b(2).

Because the output x̂ is now a linear function of the hidden unit activations, by varying W(2), each output unit a(3) can be made to produce values greater than 1 or less than 0 as well. This allows us to train the sparse autoencoder on real-valued inputs without needing to pre-scale every example to a specific range.
Since we have changed the activation function of the output units, the gradients of the output units also change. Recall that for each output unit, we had set the error terms as follows:

δi(3) = -(yi - x̂i) · f'(zi(3))

where y = x is the desired output, x̂ is the output of our autoencoder, and f is our activation function. Because in the output layer we now have f(z) = z, that implies f'(z) = 1 and thus the above now simplifies to:

δi(3) = -(yi - x̂i)

(This is the error term for the output nodes.)
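As a quick numeric check (hypothetical values, not from the original notes), the f'(z) factor really does drop out when the output activation is linear:

    # Hypothetical values for one output unit, to illustrate the simplification.
    y_i, x_hat_i = 0.8, 0.5            # target value and reconstructed value

    f_prime = 1.0                                # f(z) = z implies f'(z) = 1
    delta_general = -(y_i - x_hat_i) * f_prime   # general form of the error term
    delta_linear = -(y_i - x_hat_i)              # simplified form for a linear output unit
    assert delta_general == delta_linear         # both equal -0.3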
Of course, when using backpropagation to compute the error terms for the hidden layer:

δ(2) = ((W(2))T δ(3)) • f'(z(2))

(This is the error term for the hidden nodes; • denotes the element-wise product.)
Because the hidden layer is using a sigmoid (or tanh) activation f, in the equation above f'(·) should still be the derivative of the sigmoid (or tanh) function.
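Putting both error terms together, here is a minimal, self-contained NumPy sketch of the backward pass for a linear decoder (hypothetical sizes and parameters, not code from the original notes); note the sigmoid derivative for the hidden layer and the absence of any derivative factor for the linear output layer:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(1)
    x = rng.normal(size=6)                         # real-valued input (hypothetical size)
    W1, b1 = 0.01 * rng.normal(size=(4, 6)), np.zeros(4)
    W2, b2 = 0.01 * rng.normal(size=(6, 4)), np.zeros(6)

    # Forward pass
    z2 = W1 @ x + b1
    a2 = sigmoid(z2)                               # sigmoid hidden layer
    x_hat = W2 @ a2 + b2                           # linear output layer: a(3) = z(3)

    # Backward pass
    delta3 = -(x - x_hat)                          # output layer error term, f'(z) = 1
    delta2 = (W2.T @ delta3) * a2 * (1.0 - a2)     # hidden layer error term, f'(z2) = a2*(1-a2)

    # Gradients of the reconstruction cost with respect to the parameters
    grad_W2, grad_b2 = np.outer(delta3, a2), delta3
    grad_W1, grad_b1 = np.outer(delta2, x), delta2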