Autoencoders and Sparsity (Part 1)
An autoencoder neural network is an unsupervised learning algorithm that applies backpropagation, setting the target values to be equal to the inputs. I.e., it uses $y^{(i)} = x^{(i)}$.
Here is an autoencoder:
*(Figure: an autoencoder network with an input layer, one hidden layer, and an output layer of the same size as the input.)*
The autoencoder tries to learn a function $h_{W,b}(x) \approx x$. In other words, it is trying to learn an approximation to the identity function, so as to output $\hat{x}$ that is similar to $x$. The identity function seems a particularly trivial function to be trying to learn; but by placing constraints on the network, such as by limiting the number of hidden units, we can discover interesting structure about the data.
Example & uses
As a concrete example, suppose the inputs $x$ are the pixel intensity values from a $10 \times 10$ image (100 pixels), so $n = 100$, and there are $s_2 = 50$ hidden units in layer $L_2$. Note that we also have $y \in \mathbb{R}^{100}$. Since there are only 50 hidden units, the network is forced to learn a compressed representation of the input. I.e., given only the vector of hidden unit activations $a^{(2)} \in \mathbb{R}^{50}$, it must try to reconstruct the 100-pixel input $x$. If the input were completely random (say, each $x_j$ drawn from an IID Gaussian independent of the other features), then this compression task would be very difficult. But if there is structure in the data, for example if some of the input features are correlated, then this algorithm will be able to discover some of those correlations. In fact, this simple autoencoder often ends up learning a low-dimensional representation very similar to PCA's.
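To make the setup concrete, here is a minimal numpy sketch of such a network's forward pass. This is not from the original tutorial: the sigmoid activation, the random initialization scale, and all variable names are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Sizes taken from the example above: 100 input pixels, 50 hidden units.
n_in, n_hidden = 100, 50
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.01, size=(n_hidden, n_in))  # weights, input -> hidden
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.01, size=(n_in, n_hidden))  # weights, hidden -> output
b2 = np.zeros(n_in)

def forward(x):
    """Compress x into 50 hidden activations, then try to reconstruct it."""
    a2 = sigmoid(W1 @ x + b1)        # hidden activations a^{(2)}
    x_hat = sigmoid(W2 @ a2 + b2)    # reconstruction of the 100-pixel input
    return a2, x_hat

x = rng.random(n_in)                 # stand-in for one 10x10 image
a2, x_hat = forward(x)               # a2.shape == (50,), x_hat.shape == (100,)
```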
Constraints
Our argument above relied on the number of hidden units $s_2$ being small. But even when the number of hidden units is large (perhaps even greater than the number of input pixels), we can still discover interesting structure by imposing other constraints on the network. In particular, if we impose a sparsity constraint on the hidden units, then the autoencoder will still discover interesting structure in the data, even if the number of hidden units is large.
Recall that $a^{(2)}_j$ denotes the activation of hidden unit $j$ in the autoencoder. However, this notation doesn't make explicit what the input $x$ was that led to that activation. Thus, we will write $a^{(2)}_j(x)$ to denote the activation of this hidden unit when the network is given a specific input $x$. Further, let

$$\hat\rho_j = \frac{1}{m} \sum_{i=1}^{m} \left[ a^{(2)}_j\!\left(x^{(i)}\right) \right]$$

be the average activation of hidden unit $j$ (averaged over the training set). We would like to (approximately) enforce the constraint

$$\hat\rho_j = \rho,$$

where $\rho$ is a sparsity parameter, typically a small value close to zero (say $\rho = 0.05$). In other words, we would like the average activation of each hidden neuron $j$ to be close to 0.05 (say). To satisfy this constraint, the hidden unit's activations must mostly be near 0.
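Continuing the numpy sketch above, $\hat\rho_j$ can be estimated with one forward pass over the training set; the training matrix `X` below is a random placeholder standing in for real image data.

```python
# Average activation rho_hat_j of each hidden unit over m training examples.
X = rng.random((1000, n_in))   # assumed training set, one example per row
rho_hat = np.mean([forward(x_i)[0] for x_i in X], axis=0)  # shape (50,)
rho = 0.05                     # the sparsity parameter from the text
```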
To achieve this, we will add an extra penalty term to our optimization objective that penalizes $\hat\rho_j$ deviating significantly from $\rho$. Many choices of the penalty term will give reasonable results. We will choose the following:

$$\sum_{j=1}^{s_2} \rho \log\frac{\rho}{\hat\rho_j} + (1-\rho) \log\frac{1-\rho}{1-\hat\rho_j}.$$

Here, $s_2$ is the number of neurons in the hidden layer, and the index $j$ is summing over the hidden units in our network. If you are familiar with the concept of KL divergence, this penalty term is based on it, and can also be written

$$\sum_{j=1}^{s_2} \mathrm{KL}(\rho \,\|\, \hat\rho_j),$$

where $\mathrm{KL}(\rho \,\|\, \hat\rho_j) = \rho \log\frac{\rho}{\hat\rho_j} + (1-\rho)\log\frac{1-\rho}{1-\hat\rho_j}$ is the Kullback-Leibler (KL) divergence between a Bernoulli random variable with mean $\rho$ and a Bernoulli random variable with mean $\hat\rho_j$. KL divergence is a standard function for measuring how different two distributions are.
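In code, the penalty is a direct transcription of this formula. Continuing the numpy sketch (the function name is my own):

```python
def kl_penalty(rho, rho_hat):
    """Sum over hidden units of KL(rho || rho_hat_j), the KL divergence
    between Bernoulli distributions with means rho and rho_hat_j.
    Sigmoid activations keep rho_hat_j strictly inside (0, 1), so the
    logarithms below are well defined."""
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))
```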
The cost function
Without the sparsity constraint, the network's cost function is:

$$J(W,b) = \left[ \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \left\| h_{W,b}\!\left(x^{(i)}\right) - y^{(i)} \right\|^2 \right] + \frac{\lambda}{2} \sum_{l=1}^{n_l-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left( W^{(l)}_{ji} \right)^2.$$

With the sparsity constraint, the cost function becomes:

$$J_{\text{sparse}}(W,b) = J(W,b) + \beta \sum_{j=1}^{s_2} \mathrm{KL}(\rho \,\|\, \hat\rho_j),$$

where $J(W,b)$ is as defined previously, and $\beta$ controls the weight of the sparsity penalty term. The term $\hat\rho_j$ (implicitly) depends on $W, b$ also, because it is the average activation of hidden unit $j$, and the activation of a hidden unit depends on the parameters $W, b$.
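Putting the pieces together, here is a sketch of $J_{\text{sparse}}$ continuing the numpy example; the weight-decay coefficient `lam` and penalty weight `beta` are arbitrary placeholder values, not values prescribed by the text:

```python
def sparse_cost(X, lam=1e-4, beta=3.0, rho=0.05):
    """J_sparse(W,b): average reconstruction error plus weight decay plus
    the beta-weighted sparsity penalty."""
    m = X.shape[0]
    A2 = sigmoid(X @ W1.T + b1)       # hidden activations, one row per example
    X_hat = sigmoid(A2 @ W2.T + b2)   # reconstructions
    recon = np.sum((X_hat - X) ** 2) / (2.0 * m)
    decay = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    rho_hat = A2.mean(axis=0)
    return recon + decay + beta * kl_penalty(rho, rho_hat)
```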
Computing the partial derivatives of the cost function
With the sparsity penalty added, the error term for the hidden-layer neurons changes from

$$\delta^{(2)}_i = \left( \sum_{j=1}^{s_3} W^{(2)}_{ji} \delta^{(3)}_j \right) f'\!\left(z^{(2)}_i\right)$$

to

$$\delta^{(2)}_i = \left( \left( \sum_{j=1}^{s_3} W^{(2)}_{ji} \delta^{(3)}_j \right) + \beta \left( -\frac{\rho}{\hat\rho_i} + \frac{1-\rho}{1-\hat\rho_i} \right) \right) f'\!\left(z^{(2)}_i\right).$$

(The sum over $j$ runs over the $s_3$ units of the output layer.)
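Continuing the sketch, the modified error term might look as follows; `hidden_delta` is my own helper name, and `W2`, `sigmoid`, and `rho_hat` come from the earlier snippets:

```python
def hidden_delta(delta3, z2, rho_hat, beta=3.0, rho=0.05):
    """Error term delta^{(2)} for one example, with the extra sparsity
    term added before multiplying by f'(z^{(2)}); for the sigmoid,
    f'(z) = f(z) * (1 - f(z))."""
    a2 = sigmoid(z2)
    sparsity = beta * (-rho / rho_hat + (1.0 - rho) / (1.0 - rho_hat))
    return (W2.T @ delta3 + sparsity) * a2 * (1.0 - a2)
```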
Solving with gradient descent
Once we have the cost function and its partial derivatives, we can use gradient descent to find the network's optimal parameters. One iteration of batch gradient descent updates the parameters as follows:

$$W^{(l)} := W^{(l)} - \alpha \left[ \left( \frac{1}{m} \Delta W^{(l)} \right) + \lambda W^{(l)} \right], \qquad b^{(l)} := b^{(l)} - \alpha \left[ \frac{1}{m} \Delta b^{(l)} \right],$$

where $\Delta W^{(l)}$ and $\Delta b^{(l)}$ accumulate the partial derivatives of the per-example losses, computed by backpropagation.
As the formulas above show, the partial derivatives of the cost function are computed by accumulation: each training example contributes one additive term. This is because the cost function itself is a sum of per-example losses, so by the linearity of differentiation its partial derivative is likewise the sum of the per-example partial derivatives. It follows that the order in which training examples are fed into the network does not matter: the same operations are performed for every example, and no example's contribution depends on the result of a previous one; the contributions are simply added up, and that accumulation is order-independent.
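As a rough sketch of that procedure (again assuming the helpers defined earlier), one batch gradient-descent step accumulates per-example gradients in any order and then updates the parameters once:

```python
def train_step(X, alpha=0.1, beta=3.0, rho=0.05, lam=1e-4):
    """One batch gradient-descent step. Note the structure described above:
    per-example gradients are simply summed, in any order, and the
    parameters are updated once at the end."""
    global W1, b1, W2, b2
    m = X.shape[0]
    # rho_hat needs a first forward pass over the whole batch.
    rho_hat = np.mean([forward(x_i)[0] for x_i in X], axis=0)
    gW1, gb1 = np.zeros_like(W1), np.zeros_like(b1)
    gW2, gb2 = np.zeros_like(W2), np.zeros_like(b2)
    for x_i in X:                    # order-independent accumulation
        a2, x_hat = forward(x_i)
        delta3 = (x_hat - x_i) * x_hat * (1.0 - x_hat)
        delta2 = hidden_delta(delta3, W1 @ x_i + b1, rho_hat, beta, rho)
        gW2 += np.outer(delta3, a2); gb2 += delta3
        gW1 += np.outer(delta2, x_i); gb1 += delta2
    W1 -= alpha * (gW1 / m + lam * W1); b1 -= alpha * gb1 / m
    W2 -= alpha * (gW2 / m + lam * W2); b2 -= alpha * gb2 / m
```

Note that `rho_hat` must be computed with a forward pass over the whole batch before backpropagating any example, since the sparsity term in $\delta^{(2)}$ depends on the batch-wide average activations.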
Source: http://www.cnblogs.com/tornadomeet/archive/2013/03/19/2970101.html