Autoencoders and Sparsity (1)
An autoencoder neural network is an unsupervised learning algorithm that applies backpropagation, setting the target values to be equal to the inputs. I.e., it uses $y^{(i)} = x^{(i)}$.
Here is an autoencoder:
[Figure: an autoencoder network diagram, with an input layer, one hidden layer, and an output layer that reconstructs the input.]
The autoencoder tries to learn a function $h_{W,b}(x) \approx x$. In other words, it is trying to learn an approximation to the identity function, so as to output $\hat{x}$ that is similar to $x$. The identity function seems a particularly trivial function to be trying to learn; but by placing constraints on the network, such as by limiting the number of hidden units, we can discover interesting structure about the data.
Example & uses
As a concrete example, suppose the inputs $x$ are the pixel intensity values from a $10 \times 10$ image (100 pixels), so $n = 100$, and there are $s_2 = 50$ hidden units in layer $L_2$. Note that we also have $y \in \mathbb{R}^{100}$. Since there are only 50 hidden units, the network is forced to learn a compressed representation of the input. I.e., given only the vector of hidden unit activations $a^{(2)} \in \mathbb{R}^{50}$, it must try to reconstruct the 100-pixel input $x$. If the input were completely random---say, each $x_i$ comes from an IID Gaussian independent of the other features---then this compression task would be very difficult. But if there is structure in the data, for example, if some of the input features are correlated, then this algorithm will be able to discover some of those correlations. In fact, this simple autoencoder often ends up learning a low-dimensional representation very similar to PCA's.
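For concreteness, here is a minimal numpy sketch of this 100-pixel-to-50-unit setup; the sigmoid activations, random initialization, and variable names are assumptions made for illustration, not details fixed by the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_input, n_hidden = 100, 50          # 100 pixels in, 50 hidden units
rng = np.random.default_rng(0)

# Small random weights and zero biases for the two layers.
W1 = rng.normal(scale=0.01, size=(n_hidden, n_input))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.01, size=(n_input, n_hidden))
b2 = np.zeros(n_input)

def forward(x):
    """Compress x to 50 hidden activations, then reconstruct all 100 pixels."""
    a2 = sigmoid(W1 @ x + b1)        # hidden activations a^(2): the compressed code
    x_hat = sigmoid(W2 @ a2 + b2)    # reconstruction h_{W,b}(x)
    return a2, x_hat

x = rng.random(n_input)              # stand-in for a flattened 10x10 image
a2, x_hat = forward(x)
print(a2.shape, x_hat.shape)         # (50,) (100,)
```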
Constraints
Our argument above relied on the number of hidden units $s_2$ being small. But even when the number of hidden units is large (perhaps even greater than the number of input pixels), we can still discover interesting structure by imposing other constraints on the network. In particular, if we impose a sparsity constraint on the hidden units, then the autoencoder will still discover interesting structure in the data, even if the number of hidden units is large.
Recall that $a^{(2)}_j$ denotes the activation of hidden unit $j$ in the autoencoder. However, this notation doesn't make explicit what the input $x$ was that led to that activation. Thus, we will write $a^{(2)}_j(x)$ to denote the activation of this hidden unit when the network is given a specific input $x$. Further, let

$$\hat\rho_j = \frac{1}{m} \sum_{i=1}^{m} \left[ a^{(2)}_j\big(x^{(i)}\big) \right]$$

be the average activation of hidden unit $j$ (averaged over the training set). We would like to (approximately) enforce the constraint

$$\hat\rho_j = \rho,$$

where $\rho$ is a sparsity parameter, typically a small value close to zero (say $\rho = 0.05$). In other words, we would like the average activation of each hidden neuron $j$ to be close to 0.05 (say). To satisfy this constraint, the hidden unit's activations must mostly be near 0.
To achieve this, we will add an extra penalty term to our optimization objective that penalizes $\hat\rho_j$ deviating significantly from $\rho$. Many choices of the penalty term will give reasonable results. We will choose the following:

$$\sum_{j=1}^{s_2} \rho \log \frac{\rho}{\hat\rho_j} + (1-\rho) \log \frac{1-\rho}{1-\hat\rho_j}.$$

Here, $s_2$ is the number of neurons in the hidden layer, and the index $j$ sums over the hidden units in our network. If you are familiar with the concept of KL divergence, this penalty term is based on it, and can also be written

$$\sum_{j=1}^{s_2} \mathrm{KL}(\rho \,\|\, \hat\rho_j),$$

where $\mathrm{KL}(\rho \,\|\, \hat\rho_j) = \rho \log \frac{\rho}{\hat\rho_j} + (1-\rho) \log \frac{1-\rho}{1-\hat\rho_j}$ is the Kullback-Leibler (KL) divergence between a Bernoulli random variable with mean $\rho$ and a Bernoulli random variable with mean $\hat\rho_j$. KL divergence is a standard function for measuring how different two distributions are.
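As an illustration, the following numpy sketch computes the average activations $\hat\rho_j$ over a batch of hidden activations and evaluates this KL-based penalty; the array shapes and random data are assumed purely for demonstration.

```python
import numpy as np

def kl_penalty(rho, rho_hat):
    """Sum over hidden units of KL(rho || rho_hat_j) between Bernoulli distributions."""
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

# A2 holds the hidden activations a^(2)_j(x^(i)) for m samples: shape (m, s2).
rng = np.random.default_rng(0)
A2 = rng.random((1000, 50))          # fake activations, for illustration only

rho = 0.05                           # sparsity parameter
rho_hat = A2.mean(axis=0)            # average activation of each hidden unit

print(kl_penalty(rho, rho_hat))      # large here, since random activations average ~0.5
```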
Loss function
Without the sparsity constraint, the network's loss function is:

$$J(W,b) = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \left\| h_{W,b}\big(x^{(i)}\big) - y^{(i)} \right\|^2 + \frac{\lambda}{2} \sum_{l=1}^{n_l-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left( W^{(l)}_{ji} \right)^2$$

With the sparsity constraint, the loss function becomes:

$$J_{\mathrm{sparse}}(W,b) = J(W,b) + \beta \sum_{j=1}^{s_2} \mathrm{KL}(\rho \,\|\, \hat\rho_j),$$

where $J(W,b)$ is as defined previously, and $\beta$ controls the weight of the sparsity penalty term. The term $\hat\rho_j$ (implicitly) depends on $W,b$ also, because it is the average activation of hidden unit $j$, and the activation of a hidden unit depends on the parameters $W,b$.
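A small numpy sketch of $J_{\mathrm{sparse}}$, assuming a single hidden layer and illustrative values for $\lambda$, $\beta$, and $\rho$ (none of these particular values are prescribed above):

```python
import numpy as np

def sparse_cost(X, X_hat, A2, W1, W2, lam=1e-4, beta=3.0, rho=0.05):
    """J_sparse(W,b): average reconstruction error + weight decay + beta * KL penalty.

    X, X_hat: (m, n) inputs and their reconstructions
    A2:       (m, s2) hidden activations for the same m samples
    """
    m = X.shape[0]
    recon = np.sum(0.5 * np.sum((X_hat - X) ** 2, axis=1)) / m    # squared-error term
    decay = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))       # weight decay term
    rho_hat = A2.mean(axis=0)                                     # average activations
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))  # sparsity penalty
    return recon + decay + beta * kl
```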
Computing the partial derivatives of the loss function
After the sparsity term is added, the error (delta) expression for a hidden-layer neuron changes from

$$\delta^{(2)}_i = \left( \sum_{j=1}^{s_3} W^{(2)}_{ji} \delta^{(3)}_j \right) f'\big(z^{(2)}_i\big)$$

to

$$\delta^{(2)}_i = \left( \left( \sum_{j=1}^{s_3} W^{(2)}_{ji} \delta^{(3)}_j \right) + \beta \left( -\frac{\rho}{\hat\rho_i} + \frac{1-\rho}{1-\hat\rho_i} \right) \right) f'\big(z^{(2)}_i\big).$$
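For illustration, here is a numpy sketch of this modified hidden-layer delta, assuming sigmoid units (so $f'(z^{(2)}) = a^{(2)}(1-a^{(2)})$) and that $\hat\rho$ has already been computed over the training set; the array shapes and default hyperparameters are assumptions.

```python
import numpy as np

def hidden_delta_sparse(W2, delta3, a2, rho_hat, rho=0.05, beta=3.0):
    """Hidden-layer delta with the extra sparsity term.

    W2:      (n_out, s2) weights from the hidden layer to the output layer
    delta3:  (m, n_out)  output-layer deltas for a batch of m samples
    a2:      (m, s2)     hidden activations; with sigmoid units f'(z2) = a2 * (1 - a2)
    rho_hat: (s2,)       average activation of each hidden unit over the training set
    """
    sparsity_term = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat))  # shape (s2,)
    return (delta3 @ W2 + sparsity_term) * a2 * (1 - a2)
```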
Solving with gradient descent
Once we have the loss function and its partial derivatives, we can use gradient descent to find the network parameters that minimize it. Each iteration accumulates the per-sample gradients computed by backpropagation over all $m$ training samples, then updates the parameters:

$$W^{(l)} := W^{(l)} - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \nabla_{W^{(l)}} J\big(W,b;x^{(i)},y^{(i)}\big) + \lambda W^{(l)} \right], \qquad b^{(l)} := b^{(l)} - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \nabla_{b^{(l)}} J\big(W,b;x^{(i)},y^{(i)}\big)$$
As the formulas above show, the partial derivative of the loss function is really an accumulation: each incoming training sample adds one more term. This is because the loss function itself is a sum of per-sample losses, so by the rule for differentiating a sum, its partial derivative is likewise the sum of the per-sample partial derivatives. It follows that the order in which training samples are fed to the network does not matter: the operation performed for each sample is identical, and the result produced by a later sample does not depend on the result of an earlier one (the contributions are simply accumulated, and the accumulation is order-independent).
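The sketch below mirrors that accumulate-then-update structure; the `backprop` callable, learning rate, and weight-decay value are hypothetical stand-ins, not part of the original post.

```python
import numpy as np

# One accumulate-then-update iteration of batch gradient descent.
# `backprop(x)` is assumed to return the per-sample gradients (gW1, gb1, gW2, gb2).
alpha, lam = 0.1, 1e-4               # illustrative learning rate and weight decay

def gradient_descent_step(X, W1, b1, W2, b2, backprop):
    m = X.shape[0]
    # Zero the gradient accumulators.
    dW1, db1 = np.zeros_like(W1), np.zeros_like(b1)
    dW2, db2 = np.zeros_like(W2), np.zeros_like(b2)
    # Accumulate per-sample gradients; the order of the samples does not matter.
    for x in X:
        gW1, gb1, gW2, gb2 = backprop(x)
        dW1 += gW1; db1 += gb1
        dW2 += gW2; db2 += gb2
    # Update with the averaged gradients plus weight decay (on the weights only).
    W1 = W1 - alpha * (dW1 / m + lam * W1)
    b1 = b1 - alpha * db1 / m
    W2 = W2 - alpha * (dW2 / m + lam * W2)
    b2 = b2 - alpha * db2 / m
    return W1, b1, W2, b2
```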
Reposted from: http://www.cnblogs.com/tornadomeet/archive/2013/03/19/2970101.html