An autoencoder neural network is an unsupervised learning algorithm that applies backpropagation, setting the target values to be equal to the inputs. I.e., it uses .

Here is an autoencoder:

The autoencoder tries to learn a function . In other words, it is trying to learn an approximation to the identity function, so as to output that is similar to . The identity function seems a particularly trivial function to be trying to learn; but by placing constraints on the network, such as by limiting the number of hidden units, we can discover interesting structure about the data.

例子&用途

As a concrete example, suppose the inputs are the pixel intensity values from a image (100 pixels) so , and there are hidden units in layer . Note that we also have . Since there are only 50 hidden units, the network is forced to learn a compressed representation of the input. I.e., given only the vector of hidden unit activations , it must try to reconstruct the 100-pixel input . If the input were completely random---say, each comes from an IID Gaussian independent of the other features---then this compression task would be very difficult. But if there is structure in the data, for example, if some of the input features are correlated, then this algorithm will be able to discover some of those correlations. In fact, this simple autoencoder often ends up learning a low-dimensional representation very similar to PCAs

约束

Our argument above relied on the number of hidden units being small. But even when the number of hidden units is large (perhaps even greater than the number of input pixels), we can still discover interesting structure, by imposing other constraints on the network. In particular, if we impose a sparsity constraint on the hidden units, then the autoencoder will still discover interesting structure in the data, even if the number of hidden units is large.

Recall that denotes the activation of hidden unit in the autoencoder. However, this notation doesn't make explicit what was the input that led to that activation. Thus, we will write to denote the activation of this hidden unit when the network is given a specific input . Further, let

be the average activation of hidden unit (averaged over the training set). We would like to (approximately) enforce the constraint

where is a sparsity parameter, typically a small value close to zero (say ). In other words, we would like the average activation of each hidden neuron to be close to 0.05 (say). To satisfy this constraint, the hidden unit's activations must mostly be near 0.

To achieve this, we will add an extra penalty term to our optimization objective   that penalizes deviating significantly from . Many choices of the penalty term will give reasonable results. We will choose the following:

Here, is the number of neurons in the hidden layer, and the index is summing over the hidden units in our network. If you are familiar with the concept of KL divergence, this penalty term is based on it, and can also be written

where is the Kullback-Leibler (KL) divergence between a Bernoulli random variable with mean and a Bernoulli random variable with mean . KL-divergence is a standard function for measuring how different two different distributions are.

偏离,惩罚

损失函数

无稀疏约束时网络的损失函数表达式如下:

带稀疏约束的损失函数如下:

where is as defined previously, and controls the weight of the sparsity penalty term. The term (implicitly) depends on also, because it is the average activation of hidden unit , and the activation of a hidden unit depends on the parameters .

损失函数的偏导数的求法

而加入了稀疏性后,神经元节点的误差表达式由公式:

变成公式:

梯度下降法求解

有了损失函数及其偏导数后就可以采用梯度下降法来求网络最优化的参数了,整个流程如下所示:

从上面的公式可以看出,损失函数的偏导其实是个累加过程,每来一个样本数据就累加一次。这是因为损失函数本身就是由每个训练样本的损失叠加而成的,而按照加法的求导法则,损失函数的偏导也应该是由各个训练样本所损失的偏导叠加而成。从这里可以看出,训练样本输入网络的顺序并不重要,因为每个训练样本所进行的操作是等价的,后面样本的输入所产生的结果并不依靠前一次输入结果(只是简单的累加而已,而这里的累加是顺序无关的)。

转自:http://www.cnblogs.com/tornadomeet/archive/2013/03/19/2970101.html

Autoencoders and Sparsity(一)的更多相关文章

  1. (六)6.4 Neurons Networks Autoencoders and Sparsity

    BP算法是适合监督学习的,因为要计算损失函数,计算时y值又是必不可少的,现在假设有一系列的无标签train data:  ,其中 ,autoencoders是一种无监督学习算法,它使用了本身作为标签以 ...

  2. CS229 6.4 Neurons Networks Autoencoders and Sparsity

    BP算法是适合监督学习的,因为要计算损失函数,计算时y值又是必不可少的,现在假设有一系列的无标签train data:  ,其中 ,autoencoders是一种无监督学习算法,它使用了本身作为标签以 ...

  3. Autoencoders and Sparsity(二)

    In this problem set, you will implement the sparse autoencoder algorithm, and show how it discovers ...

  4. 【DeepLearning】UFLDL tutorial错误记录

    (一)Autoencoders and Sparsity章节公式错误: s2 应为 s3. 意为从第2层(隐藏层)i节点到输出层j节点的误差加权和. (二)Support functions for ...

  5. Deep Learning 教程翻译

    Deep Learning 教程翻译 非常激动地宣告,Stanford 教授 Andrew Ng 的 Deep Learning 教程,于今日,2013年4月8日,全部翻译成中文.这是中国屌丝军团,从 ...

  6. 三层神经网络自编码算法推导和MATLAB实现 (转载)

    转载自:http://www.cnblogs.com/tornadomeet/archive/2013/03/20/2970724.html 前言: 现在来进入sparse autoencoder的一 ...

  7. DL二(稀疏自编码器 Sparse Autoencoder)

    稀疏自编码器 Sparse Autoencoder 一神经网络(Neural Networks) 1.1 基本术语 神经网络(neural networks) 激活函数(activation func ...

  8. Sparse Autoencoder(二)

    Gradient checking and advanced optimization In this section, we describe a method for numerically ch ...

  9. 【DeepLearning】Exercise:Learning color features with Sparse Autoencoders

    Exercise:Learning color features with Sparse Autoencoders 习题链接:Exercise:Learning color features with ...

随机推荐

  1. 51Nod 和为k的连续区间

    一整数数列a1, a2, ... , an(有正有负),以及另一个整数k,求一个区间[i, j],(1 <= i <= j <= n),使得a[i] + ... + a[j] = k ...

  2. Linux安装多功能词典GoldenDict

    Linux安装多功能词典GoldenDict 活腿肠 2017.08.01 20:52* 字数 671 阅读 1555评论 0喜欢 2 Goldendict 简介 GoldenDict是一种开源的辞典 ...

  3. BZOJ 3637: Query on a tree VI LCT_维护子树信息_点权转边权_好题

    非常喜欢这道题. 点权转边权,这样每次在切断一个点的所有儿子的时候只断掉一条边即可. Code: #include <cstring> #include <cstdio> #i ...

  4. Studio3T 破解,无需命令计划任务

    最近项目中使用到了MongoDB,苦于命令行不好操作,所以就寻觅了一下MongDB的GUI管理工具,最终找到了Studio3T,功能非常强大,但是苦于只有评估版本30天,最可气的是一时手贱,修改了系统 ...

  5. Java.Lang.NoSuchMethod 错误

    项目开发.调用webservice,方法调用报了 Java.Lang.NoSucheMethod..........,印象中记得是jar包冲突,maven项目,一看,这一堆jar包...用eclips ...

  6. 基于express+redis高速实现实时在线用户数统计

    作者:zhanhailiang 日期:2014-11-09 本文将介绍怎样基于express+redis高速实现实时在线用户数统计. 1. 在github.com上创建项目uv-tj.将其同步到本地: ...

  7. 有关c语言指针的总结

    #include<stdio.h> void main() { int a[3]={1,3,5};//一维数组 int *num[3]={&a[0],&a[1],& ...

  8. VC双缓冲画图技术介绍

    双缓冲画图,它是一种主要的图形图像画图技术.首先,它在内存中创建一个与屏幕画图区域一致的对象,然后将图形绘制到内存中的这个对象上,最后把这个对象上的图形数据一次性地拷贝并显示到屏幕上. 这样的技术能够 ...

  9. JVM-ClassLoader装载class的流程

    在JVM中,有三种默认的类加载器,分别为Bootstrap ClassLoader,Extension CLassLoader以及App ClassLoader.其中,Bootstrap Classl ...

  10. hdoj--5333--Dancing Stars on Me(水题)

    Dancing Stars on Me Time Limit: 2000/1000 MS (Java/Others)    Memory Limit: 262144/262144 K (Java/Ot ...