An autoencoder neural network is an unsupervised learning algorithm that applies backpropagation, setting the target values to be equal to the inputs. I.e., it uses $y^{(i)} = x^{(i)}$.

Here is an autoencoder: [figure: a three-layer network whose output layer has the same number of units as its input layer]

The autoencoder tries to learn a function $h_{W,b}(x) \approx x$. In other words, it is trying to learn an approximation to the identity function, so as to output $\hat{x}$ that is similar to $x$. The identity function seems a particularly trivial function to try to learn, but by placing constraints on the network, such as limiting the number of hidden units, we can discover interesting structure in the data.

Example and uses

As a concrete example, suppose the inputs $x$ are the pixel intensity values from a $10 \times 10$ image (100 pixels), so $n = 100$, and there are $s_2 = 50$ hidden units in layer $L_2$. Note that we also have $y \in \Re^{100}$. Since there are only 50 hidden units, the network is forced to learn a compressed representation of the input. I.e., given only the vector of hidden unit activations $a^{(2)} \in \Re^{50}$, it must try to reconstruct the 100-pixel input $x$. If the input were completely random (say, each $x_i$ comes from an IID Gaussian independent of the other features), then this compression task would be very difficult. But if there is structure in the data, for example if some of the input features are correlated, then this algorithm will be able to discover some of those correlations. In fact, this simple autoencoder often ends up learning a low-dimensional representation very similar to PCA's.
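
The 100-50-100 shape described above can be made concrete with a small forward pass. This is only a sketch: the sigmoid activation, the random initial weights, the random stand-in for an image, and every name (`forward`, `W1`, `b1`, `W2`, `b2`) are assumptions chosen for illustration, not something the text specifies.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Return hidden activations a2 and the reconstruction x_hat for one input x."""
    a2 = sigmoid(W1 @ x + b1)       # hidden layer L2, shape (50,)
    x_hat = sigmoid(W2 @ a2 + b2)   # output layer L3, shape (100,)
    return a2, x_hat

rng = np.random.default_rng(0)
n, s2 = 100, 50                     # 10x10 image, 50 hidden units

# Assumed initialization: small random weights, zero biases.
W1 = rng.normal(scale=0.01, size=(s2, n))
b1 = np.zeros(s2)
W2 = rng.normal(scale=0.01, size=(n, s2))
b2 = np.zeros(n)

x = rng.random(n)                   # stand-in for pixel intensities in [0, 1)
a2, x_hat = forward(x, W1, b1, W2, b2)

# The target is the input itself, so the error compares x_hat against x.
reconstruction_error = 0.5 * np.sum((x_hat - x) ** 2)
```

The only information passed from input to output is the 50-dimensional vector `a2`, which is what forces a compressed representation.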

Constraints

Our argument above relied on the number of hidden units being small. But even when the number of hidden units is large (perhaps even greater than the number of input pixels), we can still discover interesting structure, by imposing other constraints on the network. In particular, if we impose a sparsity constraint on the hidden units, then the autoencoder will still discover interesting structure in the data, even if the number of hidden units is large.

Recall that $a^{(2)}_j$ denotes the activation of hidden unit $j$ in the autoencoder. However, this notation doesn't make explicit which input $x$ led to that activation. Thus, we will write $a^{(2)}_j(x)$ to denote the activation of this hidden unit when the network is given a specific input $x$. Further, let

$$\hat\rho_j = \frac{1}{m} \sum_{i=1}^{m} \left[ a^{(2)}_j(x^{(i)}) \right]$$

be the average activation of hidden unit $j$ (averaged over the training set). We would like to (approximately) enforce the constraint

$$\hat\rho_j = \rho,$$

where $\rho$ is a sparsity parameter, typically a small value close to zero (say $\rho = 0.05$). In other words, we would like the average activation of each hidden neuron $j$ to be close to 0.05 (say). To satisfy this constraint, the hidden unit's activations must mostly be near 0.
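
As a rough illustration of the average activation, the sketch below computes it for every hidden unit at once over a whole training set. The sigmoid activation, the random data, and the names `average_activation`, `X`, `W1`, `b1` are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def average_activation(X, W1, b1):
    """X has shape (m, n), one training example per row.

    Returns rho_hat with shape (s2,): the mean activation of each
    hidden unit over the m training examples.
    """
    A2 = sigmoid(X @ W1.T + b1)   # hidden activations, shape (m, s2)
    return A2.mean(axis=0)

rng = np.random.default_rng(0)
X = rng.random((200, 100))                    # 200 made-up inputs of 100 pixels
W1 = rng.normal(scale=0.01, size=(50, 100))   # assumed small random weights
b1 = np.zeros(50)

rho_hat = average_activation(X, W1, b1)
```

With untrained weights near zero, each entry of `rho_hat` sits close to 0.5, far from a target such as 0.05; the penalty term is what pushes it down during training.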

To achieve this, we will add an extra penalty term to our optimization objective that penalizes $\hat\rho_j$ deviating significantly from $\rho$. Many choices of the penalty term will give reasonable results. We will choose the following:

$$\sum_{j=1}^{s_2} \rho \log \frac{\rho}{\hat\rho_j} + (1-\rho) \log \frac{1-\rho}{1-\hat\rho_j}.$$

Here, $s_2$ is the number of neurons in the hidden layer, and the index $j$ is summing over the hidden units in our network. If you are familiar with the concept of KL divergence, this penalty term is based on it, and can also be written

$$\sum_{j=1}^{s_2} \mathrm{KL}(\rho \,\|\, \hat\rho_j),$$

where $\mathrm{KL}(\rho \,\|\, \hat\rho_j) = \rho \log \frac{\rho}{\hat\rho_j} + (1-\rho) \log \frac{1-\rho}{1-\hat\rho_j}$ is the Kullback-Leibler (KL) divergence between a Bernoulli random variable with mean $\rho$ and a Bernoulli random variable with mean $\hat\rho_j$. KL divergence is a standard function for measuring how different two distributions are.
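
The penalty can be evaluated directly from its definition. The following is a minimal sketch; the function names `kl_bernoulli` and `sparsity_penalty` are hypothetical, and the example values of the average activations are made up.

```python
import numpy as np

def kl_bernoulli(rho, rho_hat):
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat), elementwise."""
    return (rho * np.log(rho / rho_hat)
            + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

def sparsity_penalty(rho, rho_hat):
    """Sum of the KL terms over all hidden units."""
    return np.sum(kl_bernoulli(rho, rho_hat))

rho = 0.05
on_target = sparsity_penalty(rho, np.array([0.05, 0.05]))   # every unit at rho
off_target = sparsity_penalty(rho, np.array([0.05, 0.5]))   # one unit far from rho
print(on_target)    # 0.0
print(off_target)   # positive
```

The penalty is zero when every average activation equals the sparsity parameter and grows as any of them moves away from it.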

Deviation and penalty

Loss function

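Without the sparsity constraint, the network's loss function is:

$$J(W,b) = \left[ \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \left\| h_{W,b}(x^{(i)}) - y^{(i)} \right\|^2 \right] + \frac{\lambda}{2} \sum_{l=1}^{n_l - 1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left( W^{(l)}_{ji} \right)^2$$

With the sparsity constraint, the loss function becomes:

$$J_{\rm sparse}(W,b) = J(W,b) + \beta \sum_{j=1}^{s_2} \mathrm{KL}(\rho \,\|\, \hat\rho_j),$$

where $J(W,b)$ is as defined previously, and $\beta$ controls the weight of the sparsity penalty term. The term $\hat\rho_j$ (implicitly) depends on $W,b$ also, because it is the average activation of hidden unit $j$, and the activation of a hidden unit depends on the parameters $W,b$.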
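
To show how the pieces of the sparse loss fit together, here is a sketch that evaluates it for a three-layer autoencoder. It assumes a sigmoid activation, a squared-error term with weight decay, and targets equal to the inputs; the function name `sparse_cost` and the default values of `lam`, `beta` and `rho` are illustrative choices, not values prescribed by the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sparse_cost(X, W1, b1, W2, b2, lam=1e-4, beta=3.0, rho=0.05):
    """Squared error + weight decay + beta * KL sparsity penalty.

    X has shape (m, n), one example per row; the targets are X itself.
    """
    m = X.shape[0]
    A2 = sigmoid(X @ W1.T + b1)        # hidden activations, shape (m, s2)
    H = sigmoid(A2 @ W2.T + b2)        # reconstructions, shape (m, n)
    rho_hat = A2.mean(axis=0)          # average activation per hidden unit

    squared_error = 0.5 * np.sum((H - X) ** 2) / m
    weight_decay = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return squared_error + weight_decay + beta * kl

rng = np.random.default_rng(0)
X = rng.random((50, 100))
W1 = rng.normal(scale=0.01, size=(50, 100))
b1 = np.zeros(50)
W2 = rng.normal(scale=0.01, size=(100, 50))
b2 = np.zeros(100)

cost_plain = sparse_cost(X, W1, b1, W2, b2, beta=0.0)   # no sparsity penalty
cost_sparse = sparse_cost(X, W1, b1, W2, b2, beta=3.0)  # with sparsity penalty
```

Because the average activations are recomputed from `W1` and `b1` on every call, the penalty depends on the parameters, which is the implicit dependence mentioned above.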

Computing the partial derivatives of the loss function

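Once sparsity is added, the error term of a hidden-layer node changes from

$$\delta^{(2)}_i = \left( \sum_{j=1}^{s_3} W^{(2)}_{ji} \delta^{(3)}_j \right) f'(z^{(2)}_i)$$

to

$$\delta^{(2)}_i = \left( \left( \sum_{j=1}^{s_3} W^{(2)}_{ji} \delta^{(3)}_j \right) + \beta \left( -\frac{\rho}{\hat\rho_i} + \frac{1-\rho}{1-\hat\rho_i} \right) \right) f'(z^{(2)}_i).$$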
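
The change to the hidden-layer error term can be sketched in a few lines. The sketch assumes a sigmoid activation, so that the derivative of the activation can be written in terms of the activation itself; the function name `hidden_delta` and the sample values are made up for the example.

```python
import numpy as np

def hidden_delta(W2, delta3, a2, rho_hat, rho=0.05, beta=3.0):
    """Error term of the hidden layer, including the sparsity contribution.

    W2: (s3, s2) weights from hidden to output layer
    delta3: (s3,) error term of the output layer
    a2: (s2,) hidden activations for this example
    rho_hat: (s2,) average hidden activations over the training set
    """
    sparsity_term = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat))
    # For a sigmoid unit, f'(z) = a * (1 - a).
    return (W2.T @ delta3 + sparsity_term) * a2 * (1 - a2)

rng = np.random.default_rng(0)
W2 = rng.normal(size=(6, 3))
delta3 = rng.normal(size=6)
a2 = np.array([0.2, 0.5, 0.9])
rho_hat = np.array([0.05, 0.3, 0.6])

plain = hidden_delta(W2, delta3, a2, rho_hat, beta=0.0)   # sparsity switched off
sparse = hidden_delta(W2, delta3, a2, rho_hat, beta=3.0)
```

For a hidden unit whose average activation already equals `rho`, the extra term is zero and the two error terms coincide; here that is the case for the first unit only.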

Solving with gradient descent

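With the loss function and its partial derivatives in hand, gradient descent can be used to find the optimal network parameters. The whole procedure is as follows:

1. Set $\Delta W^{(l)} := 0$ and $\Delta b^{(l)} := 0$ for all $l$.
2. For $i = 1$ to $m$: use backpropagation to compute $\nabla_{W^{(l)}} J(W,b;x^{(i)},y^{(i)})$ and $\nabla_{b^{(l)}} J(W,b;x^{(i)},y^{(i)})$, then set $\Delta W^{(l)} := \Delta W^{(l)} + \nabla_{W^{(l)}} J(W,b;x^{(i)},y^{(i)})$ and $\Delta b^{(l)} := \Delta b^{(l)} + \nabla_{b^{(l)}} J(W,b;x^{(i)},y^{(i)})$.
3. Update the parameters: $W^{(l)} := W^{(l)} - \alpha \left[ \left( \frac{1}{m} \Delta W^{(l)} \right) + \lambda W^{(l)} \right]$ and $b^{(l)} := b^{(l)} - \alpha \left[ \frac{1}{m} \Delta b^{(l)} \right]$.

As the formulas above show, the partial derivative of the loss function is really an accumulation: each training example adds one term. This is because the loss function itself is the sum of the losses of the individual training examples, and by the sum rule of differentiation its partial derivative is the sum of the per-example partial derivatives. It follows that the order in which training examples are fed to the network does not matter: every example undergoes the same operations, and the result for a later example does not depend on the result for an earlier one (the results are simply added, and addition is order-independent).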
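
The accumulation and its independence from example order can be checked numerically. The sketch below assumes a sigmoid activation and a three-layer sparse autoencoder; the names `accumulate_gradients` and `gradient_step`, the learning rate `alpha`, and the values of `lam`, `beta` and `rho` are illustrative assumptions. It also assumes the average activations are computed in a separate pass over the whole training set before backpropagation starts.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def accumulate_gradients(X, W1, b1, W2, b2, rho=0.05, beta=3.0):
    """Sum the per-example gradients (weight decay is added in the update)."""
    # One pass over all examples to get the average hidden activations.
    rho_hat = sigmoid(X @ W1.T + b1).mean(axis=0)
    sparsity = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat))

    dW1, db1 = np.zeros_like(W1), np.zeros_like(b1)
    dW2, db2 = np.zeros_like(W2), np.zeros_like(b2)
    for x in X:                                   # one accumulation per example
        a2 = sigmoid(W1 @ x + b1)
        a3 = sigmoid(W2 @ a2 + b2)
        delta3 = -(x - a3) * a3 * (1 - a3)        # target equals the input
        delta2 = (W2.T @ delta3 + sparsity) * a2 * (1 - a2)
        dW2 += np.outer(delta3, a2)
        db2 += delta3
        dW1 += np.outer(delta2, x)
        db1 += delta2
    return dW1, db1, dW2, db2

def gradient_step(X, W1, b1, W2, b2, alpha=0.1, lam=1e-4):
    """One batch gradient descent update using the accumulated gradients."""
    m = X.shape[0]
    dW1, db1, dW2, db2 = accumulate_gradients(X, W1, b1, W2, b2)
    W1 = W1 - alpha * (dW1 / m + lam * W1)
    W2 = W2 - alpha * (dW2 / m + lam * W2)
    b1 = b1 - alpha * (db1 / m)
    b2 = b2 - alpha * (db2 / m)
    return W1, b1, W2, b2

rng = np.random.default_rng(0)
X = rng.random((20, 8))
W1 = rng.normal(scale=0.1, size=(4, 8))
b1 = np.zeros(4)
W2 = rng.normal(scale=0.1, size=(8, 4))
b2 = np.zeros(8)

grads = accumulate_gradients(X, W1, b1, W2, b2)
grads_shuffled = accumulate_gradients(X[rng.permutation(len(X))], W1, b1, W2, b2)
same = all(np.allclose(g, h) for g, h in zip(grads, grads_shuffled))
print(same)  # True, up to floating-point rounding

W1_new, b1_new, W2_new, b2_new = gradient_step(X, W1, b1, W2, b2)
```

Shuffling the rows of `X` changes only the order of the additions, so the accumulated gradients agree up to rounding.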

Reposted from: http://www.cnblogs.com/tornadomeet/archive/2013/03/19/2970101.html
