Sparse Autoencoder (Part 2)
Gradient checking and advanced optimization
In this section, we describe a method for numerically checking the derivatives computed by your code to make sure that your implementation is correct. Carrying out the derivative checking procedure described here will significantly increase your confidence in the correctness of your code.
Suppose we want to minimize $J(\theta)$ as a function of $\theta$. For this example, suppose $J(\theta) = \theta^2$, so that $\frac{d}{d\theta}J(\theta) = 2\theta$. In this 1-dimensional case, one iteration of gradient descent is given by

$$\theta := \theta - \alpha \frac{d}{d\theta} J(\theta).$$

Suppose also that we have implemented some function $g(\theta)$ that purportedly computes $\frac{d}{d\theta}J(\theta)$, so that we implement gradient descent using the update $\theta := \theta - \alpha\, g(\theta)$.
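As a quick illustration (a minimal sketch, not part of the original tutorial), here is this 1-D update applied to $J(\theta) = \theta^2$, with `g` standing in for the purported derivative function:

```python
# Gradient descent on J(theta) = theta^2; dJ/dtheta = 2*theta.
def g(theta):
    return 2.0 * theta   # our (purported) derivative implementation

theta = 5.0   # arbitrary starting point
alpha = 0.1   # learning rate
for _ in range(100):
    theta = theta - alpha * g(theta)   # theta := theta - alpha * g(theta)

print(theta)  # approaches 0, the minimizer of theta^2
```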
Recall the mathematical definition of the derivative as

$$\frac{d}{d\theta}J(\theta) = \lim_{\epsilon \to 0} \frac{J(\theta+\epsilon) - J(\theta-\epsilon)}{2\epsilon}.$$

Thus, at any specific value of $\theta$, we can numerically approximate the derivative as follows:

$$\frac{J(\theta+\text{EPSILON}) - J(\theta-\text{EPSILON})}{2 \times \text{EPSILON}}$$

In practice, we set EPSILON to a small constant, say around $10^{-4}$. Thus, given a function $g(\theta)$ that is supposedly computing $\frac{d}{d\theta}J(\theta)$, we can now numerically verify its correctness by checking that

$$g(\theta) \approx \frac{J(\theta+\text{EPSILON}) - J(\theta-\text{EPSILON})}{2 \times \text{EPSILON}}.$$

The degree to which these two values should approximate each other will depend on the details of $J$. But assuming $\text{EPSILON} = 10^{-4}$, you'll usually find that the left- and right-hand sides of the above will agree to at least 4 significant digits (and often many more).
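In code, this check is only a few lines. A minimal sketch (the names `J` and `g` are illustrative):

```python
EPSILON = 1e-4

def J(theta):
    return theta ** 2

def g(theta):                # purportedly computes dJ/dtheta
    return 2.0 * theta

theta = 1.7
numeric = (J(theta + EPSILON) - J(theta - EPSILON)) / (2 * EPSILON)
print(g(theta), numeric)     # should agree to at least 4 significant digits
```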
Suppose we have a function $g_i(\theta)$ that purportedly computes $\frac{\partial}{\partial \theta_i} J(\theta)$; we'd like to check if $g_i$ is outputting correct derivative values. Let $\theta^{(i+)} = \theta + \text{EPSILON} \times \vec{e}_i$, where $\vec{e}_i$ is the $i$-th basis vector (a vector of the same dimension as $\theta$, with a "1" in the $i$-th position and "0"s everywhere else). So, $\theta^{(i+)}$ is the same as $\theta$, except its $i$-th element has been incremented by EPSILON. Similarly, let $\theta^{(i-)} = \theta - \text{EPSILON} \times \vec{e}_i$ be the corresponding vector with the $i$-th element decreased by EPSILON. We can now numerically verify $g_i(\theta)$'s correctness by checking, for each $i$, that:

$$g_i(\theta) \approx \frac{J(\theta^{(i+)}) - J(\theta^{(i-)})}{2 \times \text{EPSILON}}$$
Since the parameter here is a vector, we verify the computation along each dimension separately, perturbing one component at a time while holding the others fixed.
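A numerical gradient checker for vector-valued parameters might look like the following sketch (NumPy; `J` is any scalar-valued cost function, and the example cost is illustrative):

```python
import numpy as np

def numerical_gradient(J, theta, epsilon=1e-4):
    # Central-difference approximation, perturbing one component at a time.
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e_i = np.zeros_like(theta)
        e_i[i] = 1.0                       # i-th basis vector
        grad[i] = (J(theta + epsilon * e_i) - J(theta - epsilon * e_i)) / (2 * epsilon)
    return grad

# Example: J(theta) = ||theta||^2, whose true gradient is 2*theta.
theta = np.array([1.0, -2.0, 0.5])
approx = numerical_gradient(lambda t: np.sum(t ** 2), theta)
print(np.max(np.abs(approx - 2 * theta)))  # tiny, at floating-point level
```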
When implementing backpropagation to train a neural network, in a correct implementation we will have that

$$\nabla_{W^{(l)}} J(W,b) = \left[\frac{1}{m}\, \Delta W^{(l)}\right] + \lambda W^{(l)}, \qquad \nabla_{b^{(l)}} J(W,b) = \frac{1}{m}\, \Delta b^{(l)}.$$

This result shows that the final block of pseudo-code in the Backpropagation Algorithm is indeed implementing gradient descent. To make sure your implementation of gradient descent is correct, it is usually very helpful to use the method described above to numerically compute the derivatives of $J(W,b)$, and thereby verify that your computations of $\frac{\partial}{\partial W^{(l)}_{ij}} J(W,b)$ and $\frac{\partial}{\partial b^{(l)}_i} J(W,b)$ are indeed giving the derivatives you want.
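To apply this check to backpropagation, flatten all the parameters $(W, b)$ into a single vector and compare the analytic gradient against the numerical one. Below is a minimal sketch for a one-hidden-layer sigmoid autoencoder with squared-error cost and no weight decay or sparsity term; all function names are illustrative, and it reuses the `numerical_gradient` helper above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unpack(theta, n_in, n_hid):
    # Split the flat parameter vector back into W1, b1, W2, b2.
    i = 0
    W1 = theta[i:i + n_hid * n_in].reshape(n_hid, n_in); i += n_hid * n_in
    b1 = theta[i:i + n_hid]; i += n_hid
    W2 = theta[i:i + n_in * n_hid].reshape(n_in, n_hid); i += n_in * n_hid
    b2 = theta[i:i + n_in]
    return W1, b1, W2, b2

def cost(theta, X, n_in, n_hid):
    # Squared-error reconstruction cost of an autoencoder (targets = inputs).
    W1, b1, W2, b2 = unpack(theta, n_in, n_hid)
    m = X.shape[1]
    a2 = sigmoid(W1 @ X + b1[:, None])
    a3 = sigmoid(W2 @ a2 + b2[:, None])
    return 0.5 * np.sum((a3 - X) ** 2) / m

def backprop_grad(theta, X, n_in, n_hid):
    # Analytic gradient of the cost above, computed via backpropagation.
    W1, b1, W2, b2 = unpack(theta, n_in, n_hid)
    m = X.shape[1]
    a2 = sigmoid(W1 @ X + b1[:, None])
    a3 = sigmoid(W2 @ a2 + b2[:, None])
    d3 = (a3 - X) * a3 * (1 - a3)          # delta at the output layer
    d2 = (W2.T @ d3) * a2 * (1 - a2)       # delta at the hidden layer
    gW1 = d2 @ X.T / m;  gb1 = d2.sum(axis=1) / m
    gW2 = d3 @ a2.T / m; gb2 = d3.sum(axis=1) / m
    return np.concatenate([gW1.ravel(), gb1, gW2.ravel(), gb2])

rng = np.random.default_rng(0)
n_in, n_hid, m = 4, 3, 5
X = rng.random((n_in, m))
theta = rng.normal(scale=0.1, size=n_hid * n_in + n_hid + n_in * n_hid + n_in)

analytic = backprop_grad(theta, X, n_in, n_hid)
numeric = numerical_gradient(lambda t: cost(t, X, n_in, n_hid), theta)
print(np.max(np.abs(analytic - numeric)))   # should be around 1e-8 or smaller
```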
Autoencoders and Sparsity
An autoencoder neural network is an unsupervised learning algorithm that applies backpropagation, setting the target values to be equal to the inputs. I.e., it uses $y^{(i)} = x^{(i)}$.
Here is an autoencoder:

[Figure: an autoencoder network, whose output layer has the same number of units as the input layer and is trained to reconstruct the input through a hidden layer.]
We will write $a^{(2)}_j(x)$ to denote the activation of hidden unit $j$ when the network is given a specific input $x$. Further, let

$$\hat\rho_j = \frac{1}{m} \sum_{i=1}^{m} \left[ a^{(2)}_j\big(x^{(i)}\big) \right]$$

be the average activation of hidden unit $j$ (averaged over the training set). We would like to (approximately) enforce the constraint

$$\hat\rho_j = \rho,$$

where $\rho$ is a sparsity parameter, typically a small value close to zero (say $\rho = 0.05$). In other words, we would like the average activation of each hidden neuron $j$ to be close to 0.05 (say). To satisfy this constraint, the hidden unit's activations must mostly be near 0.
To achieve this, we will add an extra penalty term to our optimization objective that penalizes $\hat\rho_j$ deviating significantly from $\rho$. Many choices of the penalty term will give reasonable results. We will choose the following:

$$\sum_{j=1}^{s_2} \rho \log \frac{\rho}{\hat\rho_j} + (1-\rho) \log \frac{1-\rho}{1-\hat\rho_j}$$

Here, $s_2$ is the number of neurons in the hidden layer, and the index $j$ is summing over the hidden units in our network. If you are familiar with the concept of KL divergence, this penalty term is based on it, and can also be written

$$\sum_{j=1}^{s_2} \mathrm{KL}(\rho \,\|\, \hat\rho_j),$$

where $\mathrm{KL}(\rho \,\|\, \hat\rho_j) = \rho \log \frac{\rho}{\hat\rho_j} + (1-\rho) \log \frac{1-\rho}{1-\hat\rho_j}$ is the KL divergence between a Bernoulli random variable with mean $\rho$ and a Bernoulli random variable with mean $\hat\rho_j$.

Our overall cost function is now

$$J_{\text{sparse}}(W,b) = J(W,b) + \beta \sum_{j=1}^{s_2} \mathrm{KL}(\rho \,\|\, \hat\rho_j),$$

where $J(W,b)$ is as defined previously, and $\beta$ controls the weight of the sparsity penalty term. The term $\hat\rho_j$ (implicitly) depends on $W,b$ also, because it is the average activation of hidden unit $j$, and the activation of a hidden unit depends on the parameters $W,b$.
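As a sketch of how this penalty is computed in practice (NumPy; names are illustrative, with `A2` holding the hidden-layer activations, one column per training example):

```python
import numpy as np

def sparsity_penalty(A2, rho=0.05, beta=3.0):
    """KL-divergence sparsity penalty, beta * sum_j KL(rho || rho_hat_j).

    A2   : (s2, m) matrix of hidden activations, one column per example
    rho  : desired average activation (sparsity parameter)
    beta : weight of the penalty term
    """
    rho_hat = A2.mean(axis=1)   # average activation of each hidden unit
    kl = rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))
    return beta * kl.sum()

# Example: activations concentrated near 0 give a small penalty.
rng = np.random.default_rng(0)
A2 = rng.uniform(0.0, 0.1, size=(25, 1000))   # mostly-inactive hidden units
print(sparsity_penalty(A2))
```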
Visualizing a Trained Autoencoder
Consider the case of training an autoencoder on $10 \times 10$ images, so that $n = 100$. Each hidden unit $i$ computes a function of the input:

$$a^{(2)}_i = f\left( \sum_{j=1}^{100} W^{(1)}_{ij} x_j + b^{(1)}_i \right).$$

We will visualize the function computed by hidden unit $i$, which depends on the parameters $W^{(1)}_{ij}$ (ignoring the bias term for now), using a 2D image. In particular, we think of $a^{(2)}_i$ as some non-linear feature of the input $x$. If we suppose that the input is norm constrained by $\|x\|^2 = \sum_{j=1}^{100} x_j^2 \le 1$, then one can show (try doing this yourself) that the input which maximally activates hidden unit $i$ is given by setting pixel $x_j$ (for all 100 pixels, $j = 1, \ldots, 100$) to

$$x_j = \frac{W^{(1)}_{ij}}{\sqrt{\sum_{j=1}^{100} \left(W^{(1)}_{ij}\right)^2}}.$$

By displaying the image formed by these pixel intensity values, we can begin to understand what feature hidden unit $i$ is looking for.
When an autoencoder is applied to an image, the earlier hidden units generally capture primitive features such as edges, while hidden units further back capture semantically deeper features.
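In code, the visualization is just a normalization and a reshape of each row of $W^{(1)}$. A sketch, using random weights in place of a trained matrix (in practice you would substitute a trained `W1` of shape `(25, 100)`):

```python
import numpy as np
import matplotlib.pyplot as plt

def max_activating_inputs(W1):
    # For each hidden unit i, the norm-1 input that maximally activates it:
    # x_j = W1[i, j] / sqrt(sum_j W1[i, j]^2).
    return W1 / np.linalg.norm(W1, axis=1, keepdims=True)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(25, 100))       # stand-in for trained weights

images = max_activating_inputs(W1)
fig, axes = plt.subplots(5, 5, figsize=(6, 6))
for ax, img in zip(axes.ravel(), images):
    ax.imshow(img.reshape(10, 10), cmap="gray")
    ax.axis("off")
plt.show()
```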