Brace yourself: a big wave of MATLAB code is rolling in!

A hands-on exercise with the sparse autoencoder. The task is roughly this: sample 10000 small 8*8 patches from a set of natural images, then train a sparse autoencoder and inspect the features its hidden layer learns. The network has 3 layers: an input layer of 64 units, a hidden layer of 25 units, and of course an output layer of 64 units.
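For reference, the objective this exercise minimizes (as given in the UFLDL lecture notes) combines a reconstruction term, a weight-decay term, and a KL-divergence sparsity penalty:

$$J_{\text{sparse}}(W,b) = \frac{1}{m}\sum_{i=1}^{m}\frac{1}{2}\left\|h_{W,b}(x^{(i)})-x^{(i)}\right\|^{2} + \frac{\lambda}{2}\sum_{l=1}^{2}\left\|W^{(l)}\right\|_{F}^{2} + \beta\sum_{j=1}^{25}\mathrm{KL}\!\left(\rho\,\middle\|\,\hat{\rho}_{j}\right),$$

where $\mathrm{KL}(\rho\|\hat{\rho}_{j}) = \rho\log\frac{\rho}{\hat{\rho}_{j}} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_{j}}$ and $\hat{\rho}_{j}$ is the average activation of hidden unit $j$ over the training set. Each piece reappears below as Jcost, Jweight, and Jsparse in sparseAutoencoderCost.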

The main function proceeds in five steps; the implementation details of each helper function are listed further below.

%%======================================================================
%% STEP 0: Here we provide the relevant parameters values that will
% allow your sparse autoencoder to get good filters; you do not need to
% change the parameters below.

visibleSize = 8*8; % number of input units
hiddenSize = 25; % number of hidden units
sparsityParam = 0.01; % desired average activation of the hidden units.
% (This was denoted by the Greek alphabet rho,
% which looks like a lower-case "p",
% in the lecture notes).
lambda = 0.0001; % weight decay parameter
beta = 3; % weight of sparsity penalty term

%%======================================================================
%% STEP 1: Implement sampleIMAGES
%
% After implementing sampleIMAGES, the display_network command should
% display a random sample of 200 patches from the dataset
patches = sampleIMAGES;
display_network(patches(:,randi(size(patches,2),200,1)),8);

% Obtain random parameters theta
theta = initializeParameters(hiddenSize, visibleSize);

%%======================================================================
%% STEP 2: Implement sparseAutoencoderCost
%
% You can implement all of the components (squared error cost, weight decay term,
% sparsity penalty) in the cost function at once, but it may be easier to do
% it step-by-step and run gradient checking (see STEP 3) after each step. We
% suggest implementing the sparseAutoencoderCost function using the following steps:
%
% (a) Implement forward propagation in your neural network, and implement the
% squared error term of the cost function. Implement backpropagation to
% compute the derivatives. Then (using lambda=beta=0), run Gradient Checking
% to verify that the calculations corresponding to the squared error cost
% term are correct.
%
% (b) Add in the weight decay term (in both the cost function and the derivative
% calculations), then re-run Gradient Checking to verify correctness.
%
% (c) Add in the sparsity penalty term, then re-run Gradient Checking to
% verify correctness.
%
% Feel free to change the training settings when debugging your
% code. (For example, reducing the training set size or
% number of hidden units may make your code run faster; and setting beta
% and/or lambda to zero may be helpful for debugging.) However, in your
% final submission of the visualized weights, please use parameters we
% gave in Step 0 above.

[cost, grad] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, ...
                                     lambda, sparsityParam, beta, patches);

%%======================================================================
%% STEP 3: Gradient Checking
%
% Hint: If you are debugging your code, performing gradient checking on smaller models
% and smaller training sets (e.g., using only 10 training examples and 1-2 hidden
% units) may speed things up.

% First, let's make sure your numerical gradient computation is correct for a
% simple function. After you have implemented computeNumericalGradient.m,
% run the following:
checkNumericalGradient();

% Now we can use it to check your cost function and derivative calculations
% for the sparse autoencoder.
numgrad = computeNumericalGradient( @(x) sparseAutoencoderCost(x, visibleSize, ...
                                    hiddenSize, lambda, sparsityParam, beta, patches), theta);

% Use this to visually compare the gradients side by side
disp([numgrad grad]);

% Compare numerically computed gradients with the ones obtained from backpropagation
diff = norm(numgrad-grad)/norm(numgrad+grad);
disp(diff); % Should be small. In our implementation, these values are
            % usually less than 1e-9.

% When you've got this working, congratulations!

%%======================================================================
%% STEP 4: After verifying that your implementation of
% sparseAutoencoderCost is correct, you can start training your sparse
% autoencoder with minFunc (L-BFGS).

% Randomly initialize the parameters
theta = initializeParameters(hiddenSize, visibleSize);

% Use minFunc to minimize the function
addpath minFunc/
options.Method = 'lbfgs'; % Here, we use L-BFGS to optimize our cost
% function. Generally, for minFunc to work, you
% need a function pointer with two outputs: the
% function value and the gradient. In our problem,
% sparseAutoencoderCost.m satisfies this.
options.maxIter = 400; % Maximum number of iterations of L-BFGS to run
options.display = 'on';
[opttheta, cost] = minFunc( @(p) sparseAutoencoderCost(p,visibleSize, hiddenSize, ...
                            lambda, sparsityParam, beta, patches), theta, options);
%%======================================================================
%% STEP 5: Visualization

W1 = reshape(opttheta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
display_network(W1', 12);
print -djpeg weights.jpg % save the visualization to a file

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% For STEP 1 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% three functions: sampleIMAGES, normalizeData, initializeParameters %%%
function patches = sampleIMAGES()
load IMAGES;   % load the 10 original 512*512 images
patchsize = 8; % sampling size
numpatches = 10000;

% Initialize patches as a 64*10000 zero matrix; each column holds one patch.
patches = zeros(patchsize*patchsize, numpatches);

% IMAGES is a 3-D array containing 10 images; IMAGES(:,:,6) is the sixth
% image as a 512x512 matrix. The command
% "imagesc(IMAGES(:,:,6)), colormap gray;" visualizes the sixth image.
% (The provided images are pre-whitened.)
% IMAGES(21:30,21:30,1) is a patch sampled from pixels (21,21) to (30,30)
% of the first image.

% Randomly pick 1000 patches from each image, 10000 patches in total.
for imageNum = 1:10
    [rowNum, colNum] = size(IMAGES(:,:,imageNum));
    % take 1000 patches from this image
    for patchNum = 1:1000
        % pick a top-left corner at random
        xPos = randi([1, rowNum-patchsize+1]);
        yPos = randi([1, colNum-patchsize+1]);
        % store the 8x8 patch as a 64x1 column of the matrix
        patches(:,(imageNum-1)*1000+patchNum) = ...
            reshape(IMAGES(xPos:xPos+7,yPos:yPos+7,imageNum),64,1);
    end
end
% Because the autoencoder's activation is the sigmoid, its outputs lie in
% [0,1]. To be able to reconstruct h_{W,b}(x) = x, the inputs x must also
% lie in that range, so the patches are normalized.
patches = normalizeData(patches);
end

% Normalization helper: truncate to +/-3 standard deviations, then squash
% into [0.1, 0.9] (the "3-sigma rule").
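Written out, the mapping normalizeData applies to each value $x$ (after subtracting each patch's mean) is

$$x \leftarrow \frac{\max\left(\min(x,\,3\sigma),\,-3\sigma\right)}{3\sigma} \in [-1,1], \qquad x \leftarrow 0.4\,(x+1)+0.1 \in [0.1,\,0.9],$$

where $\sigma$ is the standard deviation over all patch values. Truncating at $\pm 3\sigma$ (the "3-sigma rule": roughly 99.7% of normally distributed values fall within three standard deviations of the mean) keeps a few extreme pixels from compressing the contrast of everything else during rescaling.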
function patches = normalizeData(patches)
% Subtract the mean of each patch
patches = bsxfun(@minus, patches, mean(patches));

% s = std(X): for a vector X, returns the standard deviation (note the
% denominator is n-1, not n), i.e. the square root of an unbiased estimate
% of the variance for i.i.d. samples. For a matrix X, it returns a row
% vector containing the standard deviation of each column.
pstd = 3 * std(patches(:));
patches = max(min(patches, pstd), -pstd) / pstd;
% Rescale from [-1,1] to [0.1,0.9]
patches = (patches + 1) * 0.4 + 0.1;
end

% Initialize the parameters
function theta = initializeParameters(hiddenSize, visibleSize)
% Initialize parameters randomly based on layer sizes.
% we'll choose weights uniformly from the interval [-r, r]
r = sqrt(6) / sqrt(hiddenSize+visibleSize+1);
% rand(a,b) draws an a*b matrix of uniform values in [0,1], so
% rand(a,b)*2*r lies in (0, 2r) and rand(a,b)*2*r - r lies in (-r, r).
W1 = rand(hiddenSize, visibleSize) * 2 * r - r;
W2 = rand(visibleSize, hiddenSize) * 2 * r - r;
b1 = zeros(hiddenSize, 1);  % biases into the hidden units
b2 = zeros(visibleSize, 1); % biases into the output layer
% Unroll all matrices into a single parameter vector
theta = [W1(:) ; W2(:) ; b1(:) ; b2(:)];
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% For STEP 2 %%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%% returns the value and gradient of the sparse cost function %%%%%
function [cost,grad] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, ...
lambda, sparsityParam, beta, data)
% visibleSize:   number of input units
% hiddenSize:    number of hidden units
% lambda:        weight decay parameter
% sparsityParam: the desired average activation rho of the hidden units
% beta:          weight of the sparsity penalty term
% data:          a 64x10000 matrix of training data; data(:,i) is the i-th example.

% The parameters arrive concatenated into one vector because L-BFGS
% optimizes over vectors. Unpack the long vector back into the per-layer
% weight matrices and bias vectors:
% theta(1 : hiddenSize*visibleSize) holds the hiddenSize*visibleSize
% entries of W1; the remaining blocks follow in the same fashion.
W1 = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
W2 = reshape(theta(hiddenSize*visibleSize+1:2*hiddenSize*visibleSize), visibleSize, hiddenSize);
b1 = theta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);
b2 = theta(2*hiddenSize*visibleSize+hiddenSize+1:end);

% Gradient matrices matching each parameter block
cost = 0;
W1grad = zeros(size(W1));
W2grad = zeros(size(W2));
b1grad = zeros(size(b1));
b2grad = zeros(size(b2));

Jcost = 0;   % reconstruction error term
Jweight = 0; % weight decay penalty
Jsparse = 0; % sparsity penalty
[n, m] = size(data); % m = number of examples, n = number of features

% Forward pass: compute each layer's linear combination and activation.
% W1 is a hiddenSize*visibleSize matrix; data is visibleSize*m.
% repmat(b1,1,m) tiles the column vector b1 into a hiddenSize*m matrix.
% Per the formula z^(l+1) = W^(l) a^(l) + b^(l):
% z2 holds the hidden-layer inputs for all 10000 examples, a hiddenSize*m
% matrix with one column per example.
z2 = W1*data + repmat(b1,1,m); % input to the hidden layer
a2 = sigmoid(z2);              % hidden-layer activations
z3 = W2*a2 + repmat(b2,1,m);   % input to the output layer
a3 = sigmoid(z3);              % output-layer activations

% Reconstruction error
% This is the squared-error part of J(W,b): the outer sum runs over all
% examples, the inner sum over the output components.
Jcost = (0.5/m)*sum(sum((a3-data).^2));

% Weight decay penalty (lambda is applied later, in the total cost)
Jweight = (1/2)*(sum(sum(W1.^2))+sum(sum(W2.^2)));
% Sparsity penalty: sum(a2,2) sums each hidden unit's activations over all
% examples; dividing by m gives rho, the hiddenSize*1 vector of average
% hidden activations.
rho = (1/m).*sum(a2,2);
% KL-divergence form of the sparsity penalty
Jsparse = sum(sparsityParam.*log(sparsityParam./rho)+(1-sparsityParam).*log((1-sparsityParam)./(1-rho)));
% Total cost: reconstruction + weight decay + sparsity
cost = Jcost + lambda*Jweight + beta*Jsparse;
% Error at the output layer (l = 3). Since the autoencoder reconstructs
% its own input, the target is x itself: h_{W,b}(x) = x.
delta3 = -(data-a3).*sigmoidInv(z3);

% The sparsity penalty contributes an extra term to the hidden-layer
% error; sterm is a hiddenSize*1 vector.
sterm = beta*(-sparsityParam./rho+(1-sparsityParam)./(1-rho));

% W2 is 64*25; delta3 is 64*10000, one column per example x^(i); W2' is
% the transpose of W2. repmat(sterm,1,m) tiles sterm into m columns.
% delta2 is a hiddenSize*10000 matrix.
delta2 = (W2'*delta3+repmat(sterm,1,m)).*sigmoidInv(z2);
% Gradient for W1: data' is 10000*64, so delta2*data' is 25*64.
W1grad = W1grad + delta2*data';
W1grad = (1/m)*W1grad + lambda*W1;

% Gradient for W2
W2grad = W2grad + delta3*a2';
W2grad = (1/m).*W2grad + lambda*W2;

% Gradient for b1: the bias gradient is a vector, so sum across each row,
% i.e. accumulate over all examples.
b1grad = b1grad + sum(delta2,2);
b1grad = (1/m)*b1grad;

% Gradient for b2
b2grad = b2grad + sum(delta3,2);
b2grad = (1/m)*b2grad;

% Roll the gradients back into a single vector
grad = [W1grad(:) ; W2grad(:) ; b1grad(:) ; b2grad(:)];
end
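For reference, the backpropagation quantities computed above, in the notation of the lecture notes ($\odot$ is the element-wise product, and $f'(z)=f(z)(1-f(z))$ for the sigmoid):

$$\delta^{(3)} = -\left(x-a^{(3)}\right)\odot f'(z^{(3)}), \qquad \delta^{(2)} = \left(\left(W^{(2)}\right)^{T}\delta^{(3)} + \beta\left(-\frac{\rho}{\hat{\rho}} + \frac{1-\rho}{1-\hat{\rho}}\right)\right)\odot f'(z^{(2)}),$$

$$\nabla_{W^{(l)}}J = \frac{1}{m}\,\delta^{(l+1)}\left(a^{(l)}\right)^{T} + \lambda W^{(l)}, \qquad \nabla_{b^{(l)}}J = \frac{1}{m}\sum_{i=1}^{m}\delta^{(l+1)}_{(i)}.$$

The matrix products delta2*data' and delta3*a2', and the row sums sum(delta2,2) and sum(delta3,2) in the code, compute these sums over all m examples at once.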
%-------------------------------------------------------------------
% Here's an implementation of the sigmoid function, which you may find useful
% in your computation of the costs and the gradients. This inputs a (row or
% column) vector (say (z1, z2, z3)) and returns (f(z1), f(z2), f(z3)).
function sigm = sigmoid(x)
sigm = 1 ./ (1 + exp(-x));
end

% Derivative of the sigmoid function
function sigmInv = sigmoidInv(x)
sigmInv = sigmoid(x).*(1-sigmoid(x));
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% For STEP 3 %%%%%%%%%%%%%%%%%%%%%%%%%%%%
% three functions: checkNumericalGradient, simpleQuadraticFunction,
% computeNumericalGradient
function [] = checkNumericalGradient()
x = [4; 10];
% Evaluate the simple function's value and its analytical gradient
[value, grad] = simpleQuadraticFunction(x);
% Compute the numerical gradient of the simple function at the point x
% ("@simpleQuadraticFunction" denotes a pointer to a function.)
numgrad = computeNumericalGradient(@simpleQuadraticFunction, x);
% Display the two gradients side by side
disp([numgrad grad]);
fprintf('The above two columns you get should be very similar.\n(Left-Your Numerical Gradient, Right-Analytical Gradient)\n\n');
% For a vector X, norm(X) equals sqrt(sum(X.^2)). With a correct
% implementation and EPSILON = 0.0001, this difference is about 2.1452e-12.
diff = norm(numgrad-grad)/norm(numgrad+grad);
disp(diff);
fprintf('Norm of the difference between numerical and analytical gradient (should be < 1e-9)\n\n');
end

% A simple function used to verify that computeNumericalGradient is correct
function [value,grad] = simpleQuadraticFunction(x)
% this function accepts a 2D vector as input.
% Its outputs are:
% value: h(x1, x2) = x1^2 + 3*x1*x2
% grad: A 2x1 vector that gives the partial derivatives of h with respect to x1 and x2
% Note that when we pass @simpleQuadraticFunction(x) to computeNumericalGradients, we're assuming
% that computeNumericalGradients will use only the first returned value of this function.
value = x(1)^2 + 3*x(1)*x(2);
grad = zeros(2, 1);
grad(1) = 2*x(1) + 3*x(2);
grad(2) = 3*x(1);
end
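As a quick arithmetic check: at the test point x = [4; 10] used by checkNumericalGradient,

$$h(4,10) = 4^{2} + 3\cdot 4\cdot 10 = 136, \qquad \nabla h(4,10) = \left(2\cdot 4 + 3\cdot 10,\; 3\cdot 4\right) = (38,\,12),$$

so both columns printed by checkNumericalGradient should be very close to $(38, 12)^{T}$.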
% Gradient checking via central differences
function numgrad = computeNumericalGradient(J, theta)
% theta: the parameter vector (a vector or a scalar)
% J: a function returning a real value; y = J(theta) evaluates it at theta

% numgrad starts as a zero vector of the same size as theta
numgrad = zeros(size(theta));
EPSILON = 1e-4;
% theta is a column vector; size(theta,1) gives its length
n = size(theta,1);
% n-dimensional identity matrix
E = eye(n);
for i = 1:n
    % E(:,i) is the i-th standard basis vector, so delta is zero
    % everywhere except for EPSILON in component i
    delta = E(:,i)*EPSILON;
    % central-difference estimate of the i-th partial derivative
    numgrad(i) = (J(theta+delta)-J(theta-delta))/(EPSILON*2.0);
end
end
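As the hint in STEP 3 suggests, gradient checking runs much faster on a reduced model. A minimal sketch of that debugging workflow, assuming the functions above are on the path (the debug* variable names are just illustrative):

% Gradient-check a tiny model: 2 hidden units, 10 training examples.
debugHidden = 2;
debugData = patches(:, 1:10);
debugTheta = initializeParameters(debugHidden, visibleSize);
[~, debugGrad] = sparseAutoencoderCost(debugTheta, visibleSize, debugHidden, ...
    lambda, sparsityParam, beta, debugData);
debugNumGrad = computeNumericalGradient(@(t) sparseAutoencoderCost(t, visibleSize, ...
    debugHidden, lambda, sparsityParam, beta, debugData), debugTheta);
% relative difference; should be small (typically below 1e-9) if correct
disp(norm(debugNumGrad - debugGrad) / norm(debugNumGrad + debugGrad));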
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% For STEP 5 %%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%% the visualization function display_network %%%%%%%%%%%%%
function [h, array] = display_network(A, opt_normalize, opt_graycolor, cols, opt_colmajor)
% This function visualizes filters in matrix A. Each column of A is a
% filter. We will reshape each column into a square image and visualize it
% on a cell of the visualization panel.
% All other parameters are optional; usually you do not need to worry
% about them.
% opt_normalize: whether we need to normalize the filter so that all of
% them can have similar contrast. Default value is true.
% opt_graycolor: whether we use gray as the heat map. Default is true.
% cols: how many columns are there in the display. Default value is the
% squareroot of the number of columns in A.
% opt_colmajor: you can switch convention to row major for A. In that
% case, each row of A is a filter. Default value is false.
warning off all

if ~exist('opt_normalize', 'var') || isempty(opt_normalize)
    opt_normalize = true;
end

if ~exist('opt_graycolor', 'var') || isempty(opt_graycolor)
    opt_graycolor = true;
end

if ~exist('opt_colmajor', 'var') || isempty(opt_colmajor)
    opt_colmajor = false;
end

% rescale
A = A - mean(A(:));

if opt_graycolor, colormap(gray); end

% compute rows, cols
[L M]=size(A);
sz=sqrt(L);
buf=1;
if ~exist('cols', 'var')
if floor(sqrt(M))^2 ~= M
n=ceil(sqrt(M));
while mod(M, n)~=0 && n<1.2*sqrt(M), n=n+1; end
m=ceil(M/n);
else
n=sqrt(M);
m=n;
end
else
n = cols;
m = ceil(M/n);
end

array = -ones(buf+m*(sz+buf), buf+n*(sz+buf));

if ~opt_graycolor
    array = 0.1 .* array;
end

if ~opt_colmajor
k=1;
for i=1:m
for j=1:n
if k>M,
continue;
end
clim=max(abs(A(:,k)));
if opt_normalize
array(buf+(i-1)*(sz+buf)+(1:sz),buf+(j-1)*(sz+buf)+(1:sz))=reshape(A(:,k),sz,sz)/clim;
else
array(buf+(i-1)*(sz+buf)+(1:sz),buf+(j-1)*(sz+buf)+(1:sz))=reshape(A(:,k),sz,sz)/max(abs(A(:)));
end
k=k+1;
end
end
else
k=1;
for j=1:n
for i=1:m
if k>M,
continue;
end
clim=max(abs(A(:,k)));
if opt_normalize
array(buf+(i-1)*(sz+buf)+(1:sz),buf+(j-1)*(sz+buf)+(1:sz))=reshape(A(:,k),sz,sz)/clim;
else
array(buf+(i-1)*(sz+buf)+(1:sz),buf+(j-1)*(sz+buf)+(1:sz))=reshape(A(:,k),sz,sz);
end
k=k+1;
end
end
end

if opt_graycolor
    h = imagesc(array, 'EraseMode', 'none', [-1 1]);
else
    h = imagesc(array, 'EraseMode', 'none', [-1 1]);
end

axis image off
drawnow;

warning on all

end