CS229 6.16 Neural Networks: Linear Decoders and Their Implementation

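A sparse autoencoder is a three-layer network made up of an input layer, a hidden layer, and an output layer. In the autoencoder described in the earlier sections, every neuron uses the same activation function. A linear decoder modifies this definition by using different activation functions for the hidden layer and the output layer. The resulting model is easier to apply to real data and more robust to changes in its parameters.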

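The forward-propagation equations for the output layer of the network are:

z⁽³⁾ = W⁽²⁾ a⁽²⁾ + b⁽²⁾
a⁽³⁾ = f(z⁽³⁾)

where a⁽³⁾ is the output. In an autoencoder, a⁽³⁾ is an approximate reconstruction of the input x = a⁽¹⁾.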

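When the last layer of an autoencoder uses a sigmoid activation, its output is confined to [0,1] (for tanh, to [-1,1]). So when f(z⁽³⁾) is a sigmoid, the input must also be constrained or rescaled to lie in [0,1], otherwise the network cannot reconstruct it. Some datasets, such as MNIST, fit this scaling naturally, but for other inputs the requirement is hard to meet. For example, PCA-whitened input does not stay within [0,1].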

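Setting a⁽³⁾ = z⁽³⁾ solves this problem simply. That is, the output layer uses the identity function f(z) = z as its activation function, so that a⁽³⁾ = f(z⁽³⁾) = z⁽³⁾. This particular activation function is called the linear (identity) activation function.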

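The hidden-layer neurons of a linear decoder still use a sigmoid (or tanh) activation function. The hidden unit activations are given by a⁽²⁾ = σ(W⁽¹⁾x + b⁽¹⁾), where σ(·) is the sigmoid function, x is the input, and W⁽¹⁾ and b⁽¹⁾ are the weights and bias terms of the hidden units. Only the output layer uses the linear activation function. An autoencoder built from a sigmoid or tanh hidden layer and a linear output layer is called a linear decoder.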

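In a linear decoder, x̂ = a⁽³⁾ = z⁽³⁾ = W⁽²⁾a⁽²⁾ + b⁽²⁾. Because the output is a linear function of the hidden unit activations, changing W⁽²⁾ lets the output a⁽³⁾ take values greater than 1 or less than 0. This avoids the problem of a sigmoid output layer squashing the output into [0,1], so the input no longer has to be rescaled to that range.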
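The effect of the output activation on the output range can be sketched with a toy forward pass. This is a minimal illustration in plain Python (standard library only) rather than the MATLAB used in the exercise; the helper names `affine` and `forward` and all weight values are hypothetical and chosen only to make the effect visible.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, a, b):
    """Return W*a + b for a list-of-rows matrix W and vectors a, b."""
    return [sum(w_ij * a_j for w_ij, a_j in zip(row, a)) + b_i
            for row, b_i in zip(W, b)]

def forward(x, W1, b1, W2, b2, linear_output=True):
    """Forward pass of a 3-layer autoencoder; returns (a2, a3)."""
    a2 = [sigmoid(z) for z in affine(W1, x, b1)]   # hidden layer is always sigmoid
    z3 = affine(W2, a2, b2)
    a3 = z3 if linear_output else [sigmoid(z) for z in z3]
    return a2, a3

# Hypothetical toy weights: 2 inputs, 1 hidden unit, 2 outputs.
W1, b1 = [[1.0, -1.0]], [0.0]
W2, b2 = [[4.0], [-4.0]], [0.0, 0.0]
x = [2.5, -1.5]   # e.g. a whitened input that lies outside [0, 1]

_, linear_out = forward(x, W1, b1, W2, b2, linear_output=True)
_, sigmoid_out = forward(x, W1, b1, W2, b2, linear_output=False)
print(linear_out)    # first entry above 1, second entry below 0
print(sigmoid_out)   # both entries squashed into (0, 1)
```

With the same weights, only the linear output layer can reach targets such as 2.5 or -1.5; the sigmoid output layer cannot, no matter how W⁽²⁾ is chosen.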

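Changing the activation function of the output units also changes their gradients. Previously, the error term of each output unit was defined as:

δᵢ⁽³⁾ = ∂/∂zᵢ⁽³⁾ ½‖y − x̂‖² = −(yᵢ − x̂ᵢ) · f'(zᵢ⁽³⁾)

where y = x is the desired output, x̂ is the output of the autoencoder, and f(·) is the activation function. Because the output-layer activation function is f(z) = z, we have f'(z) = 1, so the formula simplifies to:

δᵢ⁽³⁾ = −(yᵢ − x̂ᵢ)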

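When backpropagation is then used to compute the error terms of the hidden layer:

δ⁽²⁾ = ((W⁽²⁾)ᵀ δ⁽³⁾) • f'(z⁽²⁾)

(where • denotes the element-wise product), the hidden layer still uses a sigmoid (or tanh) activation function f, so f'(z⁽²⁾) in this formula is still the derivative of the sigmoid (or tanh) function. In other words, only the output-layer residual of a linear decoder differs from that of the standard autoencoder.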
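The two error terms can be checked numerically on a tiny network. The sketch below is plain Python (standard library only) rather than MATLAB, handles a single example, and leaves out weight decay and the sparsity penalty so that only the reconstruction error is tested. The function names and all weight values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Sigmoid hidden layer, linear (identity) output layer."""
    z2 = [sum(w * xk for w, xk in zip(row, x)) + b for row, b in zip(W1, b1)]
    a2 = [sigmoid(z) for z in z2]
    a3 = [sum(w * aj for w, aj in zip(row, a2)) + b for row, b in zip(W2, b2)]
    return a2, a3

def loss(x, W1, b1, W2, b2):
    """Squared reconstruction error 0.5 * ||x - a3||^2 (no decay, no sparsity)."""
    _, a3 = forward(x, W1, b1, W2, b2)
    return 0.5 * sum((xi - ai) ** 2 for xi, ai in zip(x, a3))

def error_terms(x, W1, b1, W2, b2):
    a2, a3 = forward(x, W1, b1, W2, b2)
    # Output layer: f(z) = z, so f'(z) = 1 and delta3 = -(y - x_hat) with y = x.
    delta3 = [-(xi - ai) for xi, ai in zip(x, a3)]
    # Hidden layer: unchanged, still multiplied by the sigmoid derivative a2 * (1 - a2).
    delta2 = [sum(W2[i][j] * delta3[i] for i in range(len(delta3))) * a2[j] * (1.0 - a2[j])
              for j in range(len(a2))]
    return delta3, delta2

# Hypothetical toy network: 2 inputs, 2 hidden units, 2 outputs.
W1, b1 = [[0.5, -0.3], [0.2, 0.8]], [0.1, -0.1]
W2, b2 = [[1.0, -2.0], [0.5, 1.5]], [0.0, 0.2]
x = [1.5, -0.7]

delta3, delta2 = error_terms(x, W1, b1, W2, b2)
analytic = delta2[0] * x[0]             # dJ/dW1[0][0] from backpropagation

eps = 1e-5                              # central-difference step
W_plus = [row[:] for row in W1]
W_minus = [row[:] for row in W1]
W_plus[0][0] += eps
W_minus[0][0] -= eps
numeric = (loss(x, W_plus, b1, W2, b2) - loss(x, W_minus, b1, W2, b2)) / (2 * eps)

print(abs(analytic - numeric))          # should be tiny if the formulas are right
```

The same central-difference idea, applied to every parameter at once, is what a full gradient check does before training.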

Linear decoder code:

%% CS294A/CS294W Linear Decoder Exercise
%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the
%  linear decoder exercise. For this exercise, you will only need to modify
%  the code in sparseAutoencoderLinearCost.m. You will not need to modify
%  any code in this file.

%%======================================================================
%% STEP 0: Initialization
%  Here we initialize some parameters used for the exercise.

imageChannels = 3;     % number of channels (rgb, so 3)
patchDim   = 8;        % patch dimension (8x8 patches are used)
numPatches = 100000;   % number of patches

% Each 8 x 8 x imageChannels patch is flattened into the visible-layer units
visibleSize = patchDim * patchDim * imageChannels;  % number of input units
outputSize  = visibleSize;   % number of output units
hiddenSize  = 400;           % number of hidden units

sparsityParam = 0.035; % desired average activation of the hidden units.
lambda = 3e-3;         % weight decay parameter
beta = 5;              % weight of sparsity penalty term

epsilon = 0.1;         % epsilon for ZCA whitening

%%======================================================================
%% STEP 1: Create and modify sparseAutoencoderLinearCost.m to use a linear decoder,
%          and check gradients
%  You should copy sparseAutoencoderCost.m from your earlier exercise
%  and rename it to sparseAutoencoderLinearCost.m.
%  Then you need to rename the function from sparseAutoencoderCost to
%  sparseAutoencoderLinearCost, and modify it so that the sparse autoencoder
%  uses a linear decoder instead. Once that is done, you should check
%  your gradients to verify that they are correct.

% NOTE: Modify sparseAutoencoderCost first!

% To speed up gradient checking, we will use a reduced network and some
% dummy patches
debugHiddenSize = 5;
debugvisibleSize = 8;
patches = rand([8 10]);
theta = initializeParameters(debugHiddenSize, debugvisibleSize);

[cost, grad] = sparseAutoencoderLinearCost(theta, debugvisibleSize, debugHiddenSize, ...
                                           lambda, sparsityParam, beta, ...
                                           patches);

% Check gradients
numGrad = computeNumericalGradient( @(x) sparseAutoencoderLinearCost(x, debugvisibleSize, debugHiddenSize, ...
                                                  lambda, sparsityParam, beta, ...
                                                  patches), theta);

% Use this to visually compare the gradients side by side
disp([numGrad grad]);

diff = norm(numGrad-grad)/norm(numGrad+grad);
% Should be small. In our implementation, these values are usually less than 1e-9.
disp(diff);

assert(diff < 1e-9, 'Difference too large. Check your gradient computation again');

% NOTE: Once your gradients check out, you should run step 0 again to
%       reinitialize the parameters

%%======================================================================
%% STEP 2: Learn features on small patches
%  In this step, you will use your sparse autoencoder (which now uses a
%  linear decoder) to learn features on small patches sampled from related
%  images.

%% STEP 2a: Load patches
%  In this step, we load 100k patches sampled from the STL10 dataset and
%  visualize them. Note that these patches have been scaled to [0,1]

load stlSampledPatches.mat

displayColorNetwork(patches(:, 1:100));

%% STEP 2b: Apply preprocessing
%  In this sub-step, we preprocess the sampled patches, in particular,
%  ZCA whitening them.
%
%  In a later exercise on convolution and pooling, you will need to replicate
%  exactly the preprocessing steps you apply to these patches before
%  using the autoencoder to learn features on them. Hence, we will save the
%  ZCA whitening and mean image matrices together with the learned features
%  later on.

% Subtract mean patch (hence zeroing the mean of the patches)
meanPatch = mean(patches, 2);
patches = bsxfun(@minus, patches, meanPatch);

% Apply ZCA whitening
sigma = patches * patches' / numPatches;
[u, s, v] = svd(sigma);
% Build the matrix that applies the ZCA whitening transform to the data
ZCAWhite = u * diag(1 ./ sqrt(diag(s) + epsilon)) * u';
% Apply the ZCA whitening transform
patches = ZCAWhite * patches;

displayColorNetwork(patches(:, 1:100));

%% STEP 2c: Learn features
%  You will now use your sparse autoencoder (with linear decoder) to learn
%  features on the preprocessed patches. This should take around 45 minutes.

theta = initializeParameters(hiddenSize, visibleSize);

% Use minFunc to minimize the function
addpath minFunc/

options = struct;
options.Method = 'lbfgs';
options.maxIter = 400;
options.display = 'on';

[optTheta, cost] = minFunc( @(p) sparseAutoencoderLinearCost(p, ...
                                   visibleSize, hiddenSize, ...
                                   lambda, sparsityParam, ...
                                   beta, patches), ...
                              theta, options);

% Save the learned features and the preprocessing matrices for use in
% the later exercise on convolution and pooling
fprintf('Saving learned features and preprocessing matrices...\n');
save('STL10Features.mat', 'optTheta', 'ZCAWhite', 'meanPatch');
fprintf('Saved\n');

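%% STEP 2d: Visualize learned features
%  Why display (W*ZCAWhite)'? First, the network is trained on whitened
%  data, so for each mean-subtracted sample x the hidden layer effectively
%  computes W*ZCAWhite*x. Second, each row of W*ZCAWhite is the transform
%  for one hidden unit, while displayColorNetwork shows each column as one
%  small image patch, so the matrix has to be transposed.

W = reshape(optTheta(1:visibleSize * hiddenSize), hiddenSize, visibleSize);
b = optTheta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);
displayColorNetwork( (W*ZCAWhite)');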

% sparseAutoencoderLinearCost.m
function [cost,grad,features] = sparseAutoencoderLinearCost(theta, visibleSize, hiddenSize, ...
                                                            lambda, sparsityParam, beta, data)
% -------------------- YOUR CODE HERE --------------------
% Instructions:
%   Copy sparseAutoencoderCost in sparseAutoencoderCost.m from your
%   earlier exercise onto this file, renaming the function to
%   sparseAutoencoderLinearCost, and changing the autoencoder to use a
%   linear decoder.
% -------------------- YOUR CODE HERE --------------------

% Unpack the parameter vector into weight matrices and bias vectors
W1 = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
W2 = reshape(theta(hiddenSize*visibleSize+1:2*hiddenSize*visibleSize), visibleSize, hiddenSize);
b1 = theta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);
b2 = theta(2*hiddenSize*visibleSize+hiddenSize+1:end);

% Number of examples (one per column of data)
m = size(data, 2);

%%%%%%%%%%% forward %%%%%%%%%%%
z2 = W1*data + repmat(b1, [1,m]);
a2 = sigmoid(z2);                 % hidden layer keeps the sigmoid activation
z3 = W2*a2   + repmat(b2, [1,m]);
a3 = z3;                          % linear decoder: identity activation at the output

% Average activation of each hidden unit over the m examples
rho_hat = mean(a2, 2);
rho = sparsityParam;
% KL divergence summed over all hidden units
KL_Divergence = sum(rho * log(rho ./ rho_hat) + (1 - rho) * log((1 - rho) ./ (1 - rho_hat)));

squares = (a3 - data).^2;
J_square_err = (1/2)*(1/m)*sum(squares(:));
J_weight_decay = (lambda/2)*(sum(W1(:).^2) + sum(W2(:).^2));
J_sparsity = beta * KL_Divergence;
cost = J_square_err + J_weight_decay + J_sparsity;

%%%%%%%%%%% backward %%%%%%%%%%%
delta3 = -(data - a3);            % linear decoder: f'(z3) = 1, so no derivative factor
beta_term = beta * (-rho ./ rho_hat + (1 - rho) ./ (1 - rho_hat));
% The sparsity term is added to the back-propagated error, then scaled by
% the sigmoid derivative a2 .* (1 - a2)
delta2 = (W2' * delta3 + repmat(beta_term, [1,m])) .* a2 .* (1 - a2);

W2grad = (1/m) * delta3 * a2' + lambda * W2;
b2grad = (1/m) * sum(delta3, 2);
W1grad = (1/m) * delta2 * data' + lambda * W1;
b1grad = (1/m) * sum(delta2, 2);

%-------------------------------------------------------------------
% Convert weights and bias gradients to a compressed form
% This step will concatenate and flatten all your gradients to a vector
% which can be used in the optimization method.
grad = [W1grad(:) ; W2grad(:) ; b1grad(:) ; b2grad(:)];

end

%-------------------------------------------------------------------
% We are giving you the sigmoid function, you may find this function
% useful in your computation of the loss and the gradients.
function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end
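The sparsity part of the cost function penalizes each hidden unit whose average activation drifts away from the target, and its derivative is what enters the hidden-layer error term. A minimal sketch of that penalty and derivative for a few hand-picked activations follows, in plain Python (standard library only) rather than MATLAB. The names `kl_penalty` and `kl_grad` and the sample values are hypothetical.

```python
import math

def kl_penalty(rho, rho_hat):
    """Sum over hidden units of KL(rho || rho_hat_j) between Bernoulli means."""
    return sum(rho * math.log(rho / r) + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - r))
               for r in rho_hat)

def kl_grad(rho, rho_hat):
    """Derivative of the penalty with respect to each rho_hat_j."""
    return [-rho / r + (1.0 - rho) / (1.0 - r) for r in rho_hat]

rho = 0.035                      # illustrative target average activation
rho_hat = [0.035, 0.2, 0.01]     # hypothetical average activations of three hidden units

print(kl_penalty(rho, rho_hat))  # positive: two of the units are off target
print(kl_grad(rho, rho_hat))     # zero, positive, negative for the three units
```

A unit that is too active gets a positive derivative and a unit that is too quiet gets a negative one, so gradient descent pushes both back toward the target. Both terms of the penalty carry a weight (rho and 1 - rho); dropping the second weight would make the cost inconsistent with this derivative and a gradient check would fail.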
