Gradient checking and advanced optimization

In this section, we describe a method for numerically checking the derivatives computed by your code to make sure that your implementation is correct. Carrying out the derivative checking procedure described here will significantly increase your confidence in the correctness of your code.

Suppose we want to minimize $J(\theta)$ as a function of $\theta$. For this example, suppose $J : \mathbb{R} \mapsto \mathbb{R}$, so that $\theta \in \mathbb{R}$. In this 1-dimensional case, one iteration of gradient descent is given by

$$\theta := \theta - \alpha \frac{d}{d\theta} J(\theta).$$

Suppose also that we have implemented some function $g(\theta)$ that purportedly computes $\frac{d}{d\theta} J(\theta)$, so that we implement gradient descent using the update $\theta := \theta - \alpha g(\theta)$.

Recall the mathematical definition of the derivative as

$$\frac{d}{d\theta} J(\theta) = \lim_{\epsilon \to 0} \frac{J(\theta + \epsilon) - J(\theta - \epsilon)}{2\epsilon}.$$

Thus, at any specific value of $\theta$, we can numerically approximate the derivative as follows:

$$\frac{J(\theta + {\rm EPSILON}) - J(\theta - {\rm EPSILON})}{2 \times {\rm EPSILON}}$$

Thus, given a function $g(\theta)$ that is supposedly computing $\frac{d}{d\theta} J(\theta)$, we can now numerically verify its correctness by checking that

$$g(\theta) \approx \frac{J(\theta + {\rm EPSILON}) - J(\theta - {\rm EPSILON})}{2 \times {\rm EPSILON}}.$$

The degree to which these two values should approximate each other will depend on the details of $J$. But assuming ${\rm EPSILON} = 10^{-4}$, you'll usually find that the left- and right-hand sides of the above will agree to at least 4 significant digits (and often many more).
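
As a concrete illustration, here is a minimal sketch of this 1-dimensional check in Python; the toy objective $J(\theta) = \theta^2$ and its derivative implementation $g(\theta) = 2\theta$ are assumptions for the example:

```python
EPSILON = 1e-4

def J(theta):
    # Toy objective for the example: J(theta) = theta^2.
    return theta ** 2

def g(theta):
    # The derivative implementation under test (here, the correct one).
    return 2.0 * theta

theta = 1.5
numeric = (J(theta + EPSILON) - J(theta - EPSILON)) / (2 * EPSILON)
print(numeric, g(theta))  # the two values should agree to 4+ significant digits
```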

The description above pertains to the case where $\theta$ is a single real number. More generally, suppose $\theta \in \mathbb{R}^n$ is a vector of parameters and $J : \mathbb{R}^n \mapsto \mathbb{R}$. Suppose we have a function $g_i(\theta)$ that purportedly computes $\frac{\partial}{\partial \theta_i} J(\theta)$; we'd like to check if $g_i$ is outputting correct derivative values. Let $\theta^{(i+)} = \theta + {\rm EPSILON} \times \vec{e}_i$, where

$\vec{e}_i$ is the $i$-th basis vector (a vector of the same dimension as $\theta$, with a "1" in the $i$-th position and "0"s everywhere else). So, $\theta^{(i+)}$ is the same as $\theta$, except its $i$-th element has been incremented by EPSILON. Similarly, let $\theta^{(i-)} = \theta - {\rm EPSILON} \times \vec{e}_i$ be the corresponding vector with the $i$-th element decreased by EPSILON. We can now numerically verify $g_i(\theta)$'s correctness by checking, for each $i$, that:

$$g_i(\theta) \approx \frac{J(\theta^{(i+)}) - J(\theta^{(i-)})}{2 \times {\rm EPSILON}}.$$

Since the parameter is a vector, we can verify the correctness of the computation in each dimension by perturbing one coordinate at a time while holding the other components fixed, as in the sketch below.
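
Here is a minimal sketch of that coordinate-wise check in Python (NumPy); the toy objective $J(\theta) = \theta^\top \theta$ and its analytic gradient are assumptions for the example:

```python
import numpy as np

EPSILON = 1e-4

def numerical_gradient(J, theta):
    """Approximate each partial derivative of J at theta by perturbing
    one coordinate at a time and holding the others fixed."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e_i = np.zeros_like(theta)
        e_i[i] = 1.0  # i-th basis vector
        grad[i] = (J(theta + EPSILON * e_i) - J(theta - EPSILON * e_i)) / (2 * EPSILON)
    return grad

# Example usage with a toy quadratic objective (an assumption for the example):
J = lambda theta: theta @ theta
g = lambda theta: 2.0 * theta  # analytic gradient under test
theta = np.array([0.3, -1.2, 2.0])
print(np.max(np.abs(numerical_gradient(J, theta) - g(theta))))  # should be tiny
```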

When implementing backpropagation to train a neural network, in a correct implementation we will have that

$$\nabla_{W^{(l)}} J(W,b) = \left( \frac{1}{m} \Delta W^{(l)} \right) + \lambda W^{(l)}, \qquad \nabla_{b^{(l)}} J(W,b) = \frac{1}{m} \Delta b^{(l)}.$$

This result shows that the final block of pseudo-code in the Backpropagation Algorithm section is indeed implementing gradient descent. To make sure your implementation of gradient descent is correct, it is usually very helpful to use the method described above to numerically compute the derivatives of $J(W,b)$, and thereby verify that your computations of $\left( \frac{1}{m} \Delta W^{(l)} \right) + \lambda W^{(l)}$ and $\frac{1}{m} \Delta b^{(l)}$ are indeed giving the derivatives you want.
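
To see the whole procedure end to end, here is a minimal sketch in Python (NumPy). The tiny network shapes, the random data, and the helper names (`unpack`, `cost_and_grad`) are assumptions for the example; the cost is the squared-error objective with weight decay, with sigmoid activations throughout:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical shapes for a tiny network: 3 inputs, 2 hidden units, 3 outputs.
n_in, n_hid, m = 3, 2, 5
lam = 1e-2  # weight decay parameter lambda

rng = np.random.default_rng(0)
X = rng.normal(size=(n_in, m))  # m training examples as columns
Y = rng.normal(size=(n_in, m))  # targets (for an autoencoder, Y = X)

def unpack(theta):
    """Split the flat parameter vector into W1, b1, W2, b2."""
    i = 0
    W1 = theta[i:i + n_hid * n_in].reshape(n_hid, n_in); i += n_hid * n_in
    b1 = theta[i:i + n_hid]; i += n_hid
    W2 = theta[i:i + n_in * n_hid].reshape(n_in, n_hid); i += n_in * n_hid
    b2 = theta[i:]
    return W1, b1, W2, b2

def cost_and_grad(theta):
    W1, b1, W2, b2 = unpack(theta)
    # Forward pass.
    a2 = sigmoid(W1 @ X + b1[:, None])
    a3 = sigmoid(W2 @ a2 + b2[:, None])
    cost = 0.5 / m * np.sum((a3 - Y) ** 2) \
         + 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    # Backward pass: delta terms as in the Backpropagation Algorithm section.
    d3 = (a3 - Y) * a3 * (1 - a3)
    d2 = (W2.T @ d3) * a2 * (1 - a2)
    gW2 = d3 @ a2.T / m + lam * W2  # (1/m) Delta W2 + lambda W2
    gb2 = d3.sum(axis=1) / m        # (1/m) Delta b2
    gW1 = d2 @ X.T / m + lam * W1
    gb1 = d2.sum(axis=1) / m
    grad = np.concatenate([gW1.ravel(), gb1, gW2.ravel(), gb2])
    return cost, grad

theta = rng.normal(scale=0.1, size=n_hid * n_in + n_hid + n_in * n_hid + n_in)
_, analytic = cost_and_grad(theta)

# Numerically estimate every partial derivative and compare.
EPSILON = 1e-4
numeric = np.zeros_like(theta)
for i in range(theta.size):
    e = np.zeros_like(theta); e[i] = EPSILON
    numeric[i] = (cost_and_grad(theta + e)[0] - cost_and_grad(theta - e)[0]) / (2 * EPSILON)

print(np.max(np.abs(numeric - analytic)))  # should be tiny (roughly 1e-8)
```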

Autoencoders and Sparsity

An autoencoder neural network is an unsupervised learning algorithm that applies backpropagation, setting the target values to be equal to the inputs. I.e., it uses $y^{(i)} = x^{(i)}$.

Here is an autoencoder: a three-layer network that maps the input $x$ through a hidden layer and is trained to produce an output $\hat{x}$ close to $x$.

We will write $a^{(2)}_j(x)$ to denote the activation of this hidden unit when the network is given a specific input $x$. Further, let

$$\hat\rho_j = \frac{1}{m} \sum_{i=1}^{m} \left[ a^{(2)}_j\big(x^{(i)}\big) \right]$$

be the average activation of hidden unit $j$ (averaged over the training set). We would like to (approximately) enforce the constraint

$$\hat\rho_j = \rho,$$

where $\rho$ is a sparsity parameter, typically a small value close to zero (say $\rho = 0.05$). In other words, we would like the average activation of each hidden neuron $j$ to be close to 0.05 (say). To satisfy this constraint, the hidden unit's activations must mostly be near 0.

To achieve this, we will add an extra penalty term to our optimization objective that penalizes $\hat\rho_j$ deviating significantly from $\rho$. Many choices of the penalty term will give reasonable results. We will choose the following:

$$\sum_{j=1}^{s_2} \rho \log \frac{\rho}{\hat\rho_j} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat\rho_j}.$$

Here, $s_2$ is the number of neurons in the hidden layer, and the index $j$ is summing over the hidden units in our network. If you are familiar with the concept of KL divergence, this penalty term is based on it, and can also be written

$$\sum_{j=1}^{s_2} {\rm KL}(\rho \,\|\, \hat\rho_j),$$

where ${\rm KL}(\rho \,\|\, \hat\rho_j) = \rho \log \frac{\rho}{\hat\rho_j} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat\rho_j}$ is the KL divergence between a Bernoulli random variable with mean $\rho$ and a Bernoulli random variable with mean $\hat\rho_j$.

Our overall cost function is now

$$J_{\rm sparse}(W,b) = J(W,b) + \beta \sum_{j=1}^{s_2} {\rm KL}(\rho \,\|\, \hat\rho_j),$$

where $J(W,b)$ is as defined previously, and $\beta$ controls the weight of the sparsity penalty term. The term $\hat\rho_j$ (implicitly) depends on $W,b$ also, because it is the average activation of hidden unit $j$, and the activation of a hidden unit depends on the parameters $W,b$.
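
As a small illustration, here is a sketch of computing the penalty term in Python; the activation matrix `a2` and the values of $\rho$ and $\beta$ are hypothetical stand-ins for a real trained network:

```python
import numpy as np

rho, beta = 0.05, 3.0  # sparsity target and penalty weight (example values)
rng = np.random.default_rng(0)
# Hypothetical hidden activations for all m examples: s2 = 25 units, m = 100.
a2 = rng.uniform(0.01, 0.99, size=(25, 100))

rho_hat = a2.mean(axis=1)  # average activation of each hidden unit

# KL(rho || rho_hat_j), summed over the s2 hidden units.
kl = np.sum(rho * np.log(rho / rho_hat)
            + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

sparsity_penalty = beta * kl  # the term added to J(W, b) to form J_sparse
print(sparsity_penalty)
```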

Visualizing a Trained Autoencoder

Consider the case of training an autoencoder on $10 \times 10$ images, so that $n = 100$. Each hidden unit $i$ computes a function of the input:

$$a^{(2)}_i = f\left( \sum_{j=1}^{100} W^{(1)}_{ij} x_j + b^{(1)}_i \right).$$

We will visualize the function computed by hidden unit $i$, which depends on the parameters $W^{(1)}_{ij}$ (ignoring the bias term for now), using a 2D image. In particular, we think of $a^{(2)}_i$ as some non-linear feature of the input $x$.

If we suppose that the input is norm constrained by $\|x\|^2 = \sum_{j=1}^{100} x_j^2 \leq 1$, then one can show (try doing this yourself) that the input which maximally activates hidden unit $i$ is given by setting pixel $x_j$ (for all 100 pixels, $j = 1, \ldots, 100$) to

$$x_j = \frac{W^{(1)}_{ij}}{\sqrt{\sum_{j=1}^{100} \left( W^{(1)}_{ij} \right)^2}}.$$

By displaying the image formed by these pixel intensity values, we can begin to understand what feature hidden unit $i$ is looking for.
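
A minimal sketch of this visualization in Python follows; the weight matrix `W1` here is a random stand-in for a trained $25 \times 100$ weight matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(25, 100))  # hypothetical trained weights: 25 units, 100 pixels

# For each hidden unit i, the norm-constrained maximally activating input is
# x_j = W1[i, j] / sqrt(sum_j W1[i, j]^2), i.e. row i scaled to unit norm.
images = W1 / np.sqrt((W1 ** 2).sum(axis=1, keepdims=True))
images = images.reshape(-1, 10, 10)  # one 10x10 image per hidden unit

# Each images[i] can now be displayed (e.g. with matplotlib's imshow) to see
# what feature hidden unit i is looking for.
```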

When an autoencoder is trained on images, the earlier hidden nodes generally capture low-level features such as edges, while nodes further along capture features with deeper semantics.
