Autoencoders and Sparsity (1)
An autoencoder neural network is an unsupervised learning algorithm that applies backpropagation, setting the target values to be equal to the inputs, i.e., it uses $y^{(i)} = x^{(i)}$.
Here is an autoencoder:
[Figure: an autoencoder network, with an input layer, one hidden layer, and an output layer containing the same number of units as the input layer.]
The autoencoder tries to learn a function $h_{W,b}(x) \approx x$. In other words, it is trying to learn an approximation to the identity function, so as to output $\hat{x}$ that is similar to $x$. The identity function seems a particularly trivial function to be trying to learn; but by placing constraints on the network, such as by limiting the number of hidden units, we can discover interesting structure about the data.
Example and Uses
As a concrete example, suppose the inputs $x$ are the pixel intensity values from a $10 \times 10$ image (100 pixels), so $n = 100$, and there are $s_2 = 50$ hidden units in layer $L_2$. Note that we also have $y \in \mathbb{R}^{100}$. Since there are only 50 hidden units, the network is forced to learn a compressed representation of the input. I.e., given only the vector of hidden unit activations $a^{(2)} \in \mathbb{R}^{50}$, it must try to reconstruct the 100-pixel input $x$. If the input were completely random (say, each $x_j$ comes from an IID Gaussian independent of the other features), then this compression task would be very difficult. But if there is structure in the data, for example, if some of the input features are correlated, then this algorithm will be able to discover some of those correlations. In fact, this simple autoencoder often ends up learning a low-dimensional representation very similar to PCA's.
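To make this concrete in code, here is a minimal NumPy sketch of the forward pass for such a 100-input, 50-hidden-unit autoencoder. It assumes sigmoid activations; the weight initialization, variable names, and the random test input are illustrative choices, not taken from the original tutorial.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative shapes matching the example above: n = 100 inputs, s2 = 50 hidden units.
n, s2 = 100, 50
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.01, size=(s2, n))   # weights: input -> hidden
b1 = np.zeros(s2)
W2 = rng.normal(scale=0.01, size=(n, s2))   # weights: hidden -> output
b2 = np.zeros(n)

def autoencoder_forward(x):
    """Forward pass: the output layer tries to reconstruct the input x."""
    a2 = sigmoid(W1 @ x + b1)    # hidden activations (the compressed code)
    a3 = sigmoid(W2 @ a2 + b2)   # reconstruction of x
    return a2, a3

x = rng.random(n)                # a fake 10x10 image flattened to 100 pixels
a2, x_hat = autoencoder_forward(x)
print(a2.shape, x_hat.shape)     # (50,) (100,)
```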
Constraints
Our argument above relied on the number of hidden units $s_2$ being small. But even when the number of hidden units is large (perhaps even greater than the number of input pixels), we can still discover interesting structure, by imposing other constraints on the network. In particular, if we impose a sparsity constraint on the hidden units, then the autoencoder will still discover interesting structure in the data, even if the number of hidden units is large.
Recall that $a^{(2)}_j$ denotes the activation of hidden unit $j$ in the autoencoder. However, this notation doesn't make explicit what was the input $x$ that led to that activation. Thus, we will write $a^{(2)}_j(x)$ to denote the activation of this hidden unit when the network is given a specific input $x$. Further, let

$$\hat\rho_j = \frac{1}{m} \sum_{i=1}^{m} \left[ a^{(2)}_j\left(x^{(i)}\right) \right]$$

be the average activation of hidden unit $j$ (averaged over the training set). We would like to (approximately) enforce the constraint

$$\hat\rho_j = \rho,$$

where $\rho$ is a sparsity parameter, typically a small value close to zero (say $\rho = 0.05$). In other words, we would like the average activation of each hidden neuron $j$ to be close to 0.05 (say). To satisfy this constraint, the hidden unit's activations must mostly be near 0.
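Continuing the sketch above, the average activations can be estimated with one vectorized forward pass over the training set. The function name and the `(m, n)` layout of `X` are assumptions made for illustration.

```python
def average_activations(X, W1, b1):
    """rho_hat[j]: average activation of hidden unit j over all m training examples."""
    A2 = sigmoid(X @ W1.T + b1)   # shape (m, s2): hidden activations for every example
    return A2.mean(axis=0)        # shape (s2,)
```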
To achieve this, we will add an extra penalty term to our optimization objective that penalizes $\hat\rho_j$ deviating significantly from $\rho$. Many choices of the penalty term will give reasonable results. We will choose the following:

$$\sum_{j=1}^{s_2} \rho \log \frac{\rho}{\hat\rho_j} + (1-\rho) \log \frac{1-\rho}{1-\hat\rho_j}.$$

Here, $s_2$ is the number of neurons in the hidden layer, and the index $j$ is summing over the hidden units in our network. If you are familiar with the concept of KL divergence, this penalty term is based on it, and can also be written

$$\sum_{j=1}^{s_2} \mathrm{KL}(\rho \,\|\, \hat\rho_j),$$

where $\mathrm{KL}(\rho \,\|\, \hat\rho_j) = \rho \log \frac{\rho}{\hat\rho_j} + (1-\rho) \log \frac{1-\rho}{1-\hat\rho_j}$ is the Kullback-Leibler (KL) divergence between a Bernoulli random variable with mean $\rho$ and a Bernoulli random variable with mean $\hat\rho_j$. KL-divergence is a standard function for measuring how different two distributions are.
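As a sketch, the penalty term can be computed directly from its definition, assuming `rho_hat` is the vector of average activations from the previous snippet:

```python
def kl_sparsity_penalty(rho, rho_hat):
    """Sum over hidden units of KL(rho || rho_hat_j) between Bernoulli distributions."""
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
```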
Loss Function
Without the sparsity constraint, the network's loss function is

$$J(W,b) = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \left\| h_{W,b}\left(x^{(i)}\right) - y^{(i)} \right\|^2 + \frac{\lambda}{2} \sum_{l=1}^{n_l-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left( W^{(l)}_{ji} \right)^2.$$

With the sparsity constraint, the loss function becomes

$$J_{\mathrm{sparse}}(W,b) = J(W,b) + \beta \sum_{j=1}^{s_2} \mathrm{KL}(\rho \,\|\, \hat\rho_j),$$
where $J(W,b)$ is as defined previously, and $\beta$ controls the weight of the sparsity penalty term. The term $\hat\rho_j$ (implicitly) depends on $W,b$ also, because it is the average activation of hidden unit $j$, and the activation of a hidden unit depends on the parameters $W,b$.
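Putting the pieces together, here is a sketch of the full sparse cost under the same assumptions as above (squared-error reconstruction, sigmoid units, weight decay over the weight matrices only). The hyperparameter defaults for `lam`, `rho`, and `beta` are illustrative, not values from the original post.

```python
def sparse_cost(X, W1, b1, W2, b2, lam=1e-4, rho=0.05, beta=3.0):
    """J_sparse(W,b): mean squared reconstruction error + weight decay + beta * KL penalty."""
    m = X.shape[0]
    A2 = sigmoid(X @ W1.T + b1)             # hidden activations, shape (m, s2)
    A3 = sigmoid(A2 @ W2.T + b2)            # reconstructions, shape (m, n)
    recon = 0.5 * np.sum((A3 - X) ** 2) / m
    decay = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    rho_hat = A2.mean(axis=0)
    return recon + decay + beta * kl_sparsity_penalty(rho, rho_hat)
```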
Computing the Partial Derivatives of the Loss Function
With the sparsity term added, the error term for the hidden-layer neurons changes from

$$\delta^{(2)}_i = \left( \sum_{j=1}^{s_3} W^{(2)}_{ji} \delta^{(3)}_j \right) f'\left(z^{(2)}_i\right)$$

to

$$\delta^{(2)}_i = \left( \sum_{j=1}^{s_3} W^{(2)}_{ji} \delta^{(3)}_j + \beta \left( -\frac{\rho}{\hat\rho_i} + \frac{1-\rho}{1-\hat\rho_i} \right) \right) f'\left(z^{(2)}_i\right).$$

Note that because $\hat\rho_i$ appears in this expression, a forward pass over the whole training set is needed to compute $\hat\rho$ before backpropagation can be run.
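In vectorized form, the modified hidden-layer error term looks roughly like this (sigmoid case, so $f'(z) = a(1-a)$; array shapes follow the earlier sketches and the helper name is an assumption):

```python
def hidden_delta_sparse(delta3, A2, W2, rho, rho_hat, beta):
    """Hidden-layer deltas with the sparsity term, for all m examples at once.
    delta3: (m, n) output-layer errors; A2: (m, s2) hidden activations; rho_hat: (s2,)."""
    sparsity_grad = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat))   # shape (s2,)
    return (delta3 @ W2 + sparsity_grad) * A2 * (1 - A2)
```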
Solving with Gradient Descent
Once we have the loss function and its partial derivatives, we can use gradient descent to find the parameters that optimize the network. One iteration of the procedure works as follows: set the accumulators $\Delta W^{(l)} = 0$ and $\Delta b^{(l)} = 0$; for each training example, run a forward pass and backpropagation to compute that example's gradient contribution and add it to the accumulators; then update $W^{(l)} := W^{(l)} - \alpha \left[ \frac{1}{m} \Delta W^{(l)} + \lambda W^{(l)} \right]$ and $b^{(l)} := b^{(l)} - \alpha \left[ \frac{1}{m} \Delta b^{(l)} \right]$.
As the formulas above show, the partial derivative of the loss function is an accumulation: each training example contributes one term to the sum. This is because the loss function itself is a sum of per-example losses, so by the linearity of differentiation its partial derivative is the sum of the per-example partial derivatives. It follows that the order in which training examples are fed into the network does not matter: every example is processed in the same way, and a later example's contribution does not depend on an earlier one's result (the contributions are simply accumulated, and the accumulation is order-independent). A rough end-to-end training loop is sketched below.
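The following sketch ties the earlier snippets together with batch gradient descent; the vectorized matrix products compute exactly the order-independent per-example sums described above. The learning rate and iteration count are placeholders.

```python
def train(X, W1, b1, W2, b2, alpha=0.1, lam=1e-4, rho=0.05, beta=3.0, iters=400):
    """Batch gradient descent for the sparse autoencoder (sketch)."""
    m = X.shape[0]
    for _ in range(iters):
        A2 = sigmoid(X @ W1.T + b1)          # forward pass over the whole training set
        A3 = sigmoid(A2 @ W2.T + b2)
        rho_hat = A2.mean(axis=0)            # needed before backprop for the sparsity term
        delta3 = (A3 - X) * A3 * (1 - A3)    # output-layer errors
        delta2 = hidden_delta_sparse(delta3, A2, W2, rho, rho_hat, beta)
        # Each matrix product below sums the per-example gradient contributions.
        W2 -= alpha * (delta3.T @ A2 / m + lam * W2)
        b2 -= alpha * delta3.mean(axis=0)
        W1 -= alpha * (delta2.T @ X / m + lam * W1)
        b1 -= alpha * delta2.mean(axis=0)
    return W1, b1, W2, b2
```

Calling `train(X, W1, b1, W2, b2)` with the arrays defined in the first sketch (and `X` a stack of training inputs) would run the whole procedure end to end.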
Reposted from: http://www.cnblogs.com/tornadomeet/archive/2013/03/19/2970101.html