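The previous post got as far as the deep networks chapter. I assumed the exercise would be simple and quick, but training turned out to take a very long time... Midway through debugging I carelessly ran `clear` and had to start the whole run again from scratch. It did work in the end, though. Sigh~

The previous post covered some basics of deep networks. The exercise in the lecture notes again uses the MNIST handwritten digit set. The main steps are: train two autoencoders, train a softmax regression on top of them, and finally fine-tune the whole network once.

Training the autoencoders and the softmax regression reuses code already written for the earlier exercises. The fine-tuning code is essentially one pass of backpropagation through the whole stacked network.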
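
To make "one pass of backpropagation" concrete, here is a sketch of the quantities that stackedAECost.m in this post computes. The notation is my own assumption, chosen to match the code: $a^{(l)}$ is the activation of layer $l$ (with $a^{(1)}$ the input), $W^{(l)}, b^{(l)}$ are `stack{l}.w` and `stack{l}.b`, $\theta$ is `softmaxTheta`, $Y$ is the one-hot `groundTruth` matrix, $P$ the matrix of softmax probabilities, $M$ the number of examples, $K$ the number of classes, and $n$ the index of the last hidden layer. As in the code, the $1/M$ factor is applied in the gradients rather than in $\delta$, and weight decay is applied to $\theta$ only.

```latex
\[
\begin{aligned}
P &= \operatorname{softmax}\!\left(\theta\, a^{(n)}\right), \\
J &= -\frac{1}{M}\sum_{i=1}^{M}\sum_{c=1}^{K} Y_{ci}\,\log P_{ci}
     + \frac{\lambda}{2}\,\lVert\theta\rVert_F^2, \\
\nabla_{\theta} J &= -\frac{1}{M}\,(Y - P)\,\big(a^{(n)}\big)^{\top} + \lambda\,\theta, \\
\delta^{(n)} &= -\big(\theta^{\top}(Y - P)\big)\odot a^{(n)}\odot\big(1 - a^{(n)}\big), \\
\delta^{(l)} &= \big((W^{(l)})^{\top}\delta^{(l+1)}\big)\odot a^{(l)}\odot\big(1 - a^{(l)}\big),
     \quad l = n-1,\dots,2, \\
\nabla_{W^{(l)}} J &= \frac{1}{M}\,\delta^{(l+1)}\big(a^{(l)}\big)^{\top}, \qquad
\nabla_{b^{(l)}} J = \frac{1}{M}\,\delta^{(l+1)}\mathbf{1},
     \quad l = 1,\dots,n-1.
\end{aligned}
\]
```

The factor $a \odot (1-a)$ is the derivative of the sigmoid, which is why the code never needs to keep `z` around for the backward pass.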

Here is the code.

Main script:

stackedAEExercise.m

%  For the purpose of completing the assignment, you do not need to
% change the code in this file.
%
%%======================================================================
%% STEP 0: Here we provide the relevant parameters values that will
% allow your sparse autoencoder to get good filters; you do not need to
% change the parameters below.
DISPLAY = true;
inputSize = 28 * 28;
numClasses = 10;
hiddenSizeL1 = 200; % Layer 1 Hidden Size
hiddenSizeL2 = 200; % Layer 2 Hidden Size
sparsityParam = 0.1; % desired average activation of the hidden units.
% (This was denoted by the Greek alphabet rho, which looks like a lower-case "p",
% in the lecture notes).
lambda = 3e-3; % weight decay parameter
beta = 3; % weight of sparsity penalty term

%%======================================================================
%% STEP 1: Load data from the MNIST database
%
% This loads our training data from the MNIST database files.

% Load MNIST database files
trainData = loadMNISTImages('mnist/train-images-idx3-ubyte');
trainLabels = loadMNISTLabels('mnist/train-labels-idx1-ubyte');

trainLabels(trainLabels == 0) = 10; % Remap 0 to 10 since our labels need to start from 1

%%======================================================================
%% STEP 2: Train the first sparse autoencoder
% This trains the first sparse autoencoder on the unlabelled MNIST training
% images.
% If you've correctly implemented sparseAutoencoderCost.m, you don't need
% to change anything here.

% Randomly initialize the parameters
sae1Theta = initializeParameters(hiddenSizeL1, inputSize);

%% ---------------------- YOUR CODE HERE ---------------------------------
% Instructions: Train the first layer sparse autoencoder, this layer has
% a hidden size of "hiddenSizeL1"
% You should store the optimal parameters in sae1optTheta

% Use minFunc to minimize the function
addpath minFunc/
options.Method = 'lbfgs'; % Here, we use L-BFGS to optimize our cost
                          % function. Generally, for minFunc to work, you
                          % need a function pointer with two outputs: the
                          % function value and the gradient. In our problem,
                          % sparseAutoencoderCost.m satisfies this.
options.maxIter = 400;    % Maximum number of iterations of L-BFGS to run
options.display = 'on';

[sae1optTheta, cost] = minFunc( @(p) sparseAutoencoderCost(p, ...
                                    inputSize, hiddenSizeL1, ...
                                    lambda, sparsityParam, ...
                                    beta, trainData), ...
                                sae1Theta, options);

%-------------------------------------------------------------------------

%%======================================================================
%% STEP 2: Train the second sparse autoencoder
% This trains the second sparse autoencoder on the first autoencoder
% features.
% If you've correctly implemented sparseAutoencoderCost.m, you don't need
% to change anything here.

[sae1Features] = feedForwardAutoencoder(sae1optTheta, hiddenSizeL1, ...
                                        inputSize, trainData);

% Randomly initialize the parameters
sae2Theta = initializeParameters(hiddenSizeL2, hiddenSizeL1);

%% ---------------------- YOUR CODE HERE ---------------------------------
% Instructions: Train the second layer sparse autoencoder, this layer has
% a hidden size of "hiddenSizeL2" and an input size of
% "hiddenSizeL1"
%
% You should store the optimal parameters in sae2opttheta

[sae2opttheta, cost] = minFunc( @(p) sparseAutoencoderCost(p, ...
                                    hiddenSizeL1, hiddenSizeL2, ...
                                    lambda, sparsityParam, ...
                                    beta, sae1Features), ...
                                sae2Theta, options);

%-------------------------------------------------------------------------

%%======================================================================
%% STEP 3: Train the softmax classifier
% This trains the softmax classifier on the second autoencoder features.
% If you've correctly implemented softmaxCost.m, you don't need
% to change anything here.

[sae2Features] = feedForwardAutoencoder(sae2opttheta, hiddenSizeL2, ...
                                        hiddenSizeL1, sae1Features);

% Randomly initialize the parameters
saeSoftmaxTheta = 0.005 * randn(hiddenSizeL2 * numClasses, 1);

%% ---------------------- YOUR CODE HERE ---------------------------------
% Instructions: Train the softmax classifier, the classifier takes in
% input of dimension "hiddenSizeL2" corresponding to the
% hidden layer size of the 2nd layer.
%
% You should store the optimal parameters in saeSoftmaxOptTheta
%
% NOTE: If you used softmaxTrain to complete this part of the exercise,
% set saeSoftmaxOptTheta = softmaxModel.optTheta(:);

options.maxIter = 100;
softmax_lambda = 1e-4;
numLabels = 10;
softmaxModel = softmaxTrain(hiddenSizeL2, numLabels, softmax_lambda, ...
                            sae2Features, trainLabels, options);
saeSoftmaxOptTheta = softmaxModel.optTheta(:);

%-------------------------------------------------------------------------

%%======================================================================
%% STEP 5: Finetune softmax model

% Implement the stackedAECost to give the combined cost of the whole model
% then run this cell.

% Initialize the stack using the parameters learned
inputSize = 28*28;
stack = cell(2,1);
stack{1}.w = reshape(sae1optTheta(1:hiddenSizeL1*inputSize), ...
                     hiddenSizeL1, inputSize);
stack{1}.b = sae1optTheta(2*hiddenSizeL1*inputSize+1:2*hiddenSizeL1*inputSize+hiddenSizeL1);
stack{2}.w = reshape(sae2opttheta(1:hiddenSizeL2*hiddenSizeL1), ...
                     hiddenSizeL2, hiddenSizeL1);
stack{2}.b = sae2opttheta(2*hiddenSizeL2*hiddenSizeL1+1:2*hiddenSizeL2*hiddenSizeL1+hiddenSizeL2);

% Initialize the parameters for the deep model
[stackparams, netconfig] = stack2params(stack);
stackedAETheta = [ saeSoftmaxOptTheta ; stackparams ];

%% ---------------------- YOUR CODE HERE ---------------------------------
% Instructions: Train the deep network, hidden size here refers to the
% dimension of the input to the classifier, which corresponds
% to "hiddenSizeL2".
%
%
[stackedAEOptTheta, cost] = minFunc( @(p) stackedAECost(p, inputSize, hiddenSizeL2, ...
                                         numClasses, netconfig, ...
                                         lambda, trainData, trainLabels), ...
                                     stackedAETheta, options);

% -------------------------------------------------------------------------

%%======================================================================
%% STEP 6: Test
% Instructions: You will need to complete the code in stackedAEPredict.m
% before running this part of the code
%

% Get labelled test images
% Note that we apply the same kind of preprocessing as the training set
testData = loadMNISTImages('mnist/t10k-images-idx3-ubyte');
testLabels = loadMNISTLabels('mnist/t10k-labels-idx1-ubyte');

testLabels(testLabels == 0) = 10; % Remap 0 to 10

[pred] = stackedAEPredict(stackedAETheta, inputSize, hiddenSizeL2, ...
                          numClasses, netconfig, testData);

acc = mean(testLabels(:) == pred(:));
fprintf('Before Finetuning Test Accuracy: %0.3f%%\n', acc * 100);

[pred] = stackedAEPredict(stackedAEOptTheta, inputSize, hiddenSizeL2, ...
                          numClasses, netconfig, testData);

acc = mean(testLabels(:) == pred(:));
fprintf('After Finetuning Test Accuracy: %0.3f%%\n', acc * 100);

% Accuracy is the proportion of correctly classified images
% The results for our implementation were:
%
% Before Finetuning Test Accuracy: 87.7%
% After Finetuning Test Accuracy: 97.6%
%
% If your values are too low (accuracy less than 95%), you should check
% your code for errors, and make sure you are training on the
% entire data set of 60000 28x28 training images
% (unless you modified the loading code, this should be the case)
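
The index arithmetic in STEP 5 (for example `2*hiddenSizeL1*inputSize+1` as the start of the first bias vector) only makes sense given how the autoencoder parameters are packed into one vector. A small sketch, assuming the layout `[W1(:); W2(:); b1(:); b2(:)]` that I believe the exercise's initializeParameters.m uses; the tiny sizes and values below are hypothetical and chosen only so the result is easy to inspect:

```matlab
% Hypothetical tiny sizes, for illustration only.
hiddenSize  = 3;
visibleSize = 4;

% Made-up parameters with recognisable values.
W1 = reshape(1:12, hiddenSize, visibleSize);    % encoder weights
W2 = -reshape(1:12, visibleSize, hiddenSize);   % decoder weights
b1 = [101; 102; 103];                           % encoder bias
b2 = [201; 202; 203; 204];                      % decoder bias

% Assumed packing order: W1, W2, b1, b2.
theta = [W1(:); W2(:); b1(:); b2(:)];

% Same slicing as in STEP 5: W1 is the first block, and b1 starts
% after both weight matrices (2 * hiddenSize * visibleSize entries).
nW    = hiddenSize * visibleSize;
W1rec = reshape(theta(1:nW), hiddenSize, visibleSize);
b1rec = theta(2*nW+1 : 2*nW+hiddenSize);

assert(isequal(W1rec, W1));
assert(isequal(b1rec, b1));
```

Only the encoder half (W1, b1) of each autoencoder goes into the stack; the decoder half (W2, b2) is discarded once pretraining is done.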

Cost function for the fine-tuning step:

stackedAECost.m

function [ cost, grad ] = stackedAECost(theta, inputSize, hiddenSize, ...
                                        numClasses, netconfig, ...
                                        lambda, data, labels)

% stackedAECost: Takes a trained softmaxTheta and a training data set with labels,
% and returns cost and gradient using a stacked autoencoder model. Used for
% finetuning.

% theta: trained weights from the autoencoder
% inputSize: the number of input units
% hiddenSize: the number of hidden units *at the 2nd layer*
% numClasses: the number of categories
% netconfig: the network configuration of the stack
% lambda: the weight regularization penalty
% data: Our matrix containing the training data as columns. So, data(:,i) is the i-th training example.
% labels: A vector containing labels, where labels(i) is the label for the
% i-th training example

%% Unroll softmaxTheta parameter

% We first extract the part which compute the softmax gradient
softmaxTheta = reshape(theta(1:hiddenSize*numClasses), numClasses, hiddenSize);

% Extract out the "stack"
stack = params2stack(theta(hiddenSize*numClasses+1:end), netconfig);

% You will need to compute the following gradients
softmaxThetaGrad = zeros(size(softmaxTheta));
stackgrad = cell(size(stack));
for d = 1:numel(stack)
    stackgrad{d}.w = zeros(size(stack{d}.w));
    stackgrad{d}.b = zeros(size(stack{d}.b));
end

cost = 0; % You need to compute this

% You might find these variables useful
M = size(data, 2);
groundTruth = full(sparse(labels, 1:M, 1));

%% --------------------------- YOUR CODE HERE -----------------------------
% Instructions: Compute the cost function and gradient vector for
% the stacked autoencoder.
%
% You are given a stack variable which is a cell-array of
% the weights and biases for every layer. In particular, you
% can refer to the weights of Layer d, using stack{d}.w and
% the biases using stack{d}.b . To get the total number of
% layers, you can use numel(stack).
%
% The last layer of the network is connected to the softmax
% classification layer, softmaxTheta.
%
% You should compute the gradients for the softmaxTheta,
% storing that in softmaxThetaGrad. Similarly, you should
% compute the gradients for each layer in the stack, storing
% the gradients in stackgrad{d}.w and stackgrad{d}.b
% Note that the size of the matrices in stackgrad should
% match exactly that of the size of the matrices in stack.
%
%---------- Forward pass: compute z and a for every layer ----------
d = numel(stack);   % depth of the stack (number of hidden layers)
n = d + 1;          % number of layers, counting the input layer
a = cell(n, 1);
z = cell(n, 1);
a{1} = data;        % a{1} is the input data
for l = 2:n         % fill in z{2..n} and a{2..n}
    z{l} = stack{l-1}.w * a{l-1} + repmat(stack{l-1}.b, [1, size(a{l-1}, 2)]);
    a{l} = sigmoid(z{l});
end
%--------------------------------------------------------------------

%---------- Softmax cost and gradient ----------
Ma = softmaxTheta * a{n};
NorM = bsxfun(@minus, Ma, max(Ma, [], 1));  % subtract each column's max so exp() cannot overflow
ExpM = exp(NorM);
P = bsxfun(@rdivide, ExpM, sum(ExpM, 1));   % class probabilities, one column per example
cost = -1/M * (groundTruth(:)' * log(P(:))) + lambda/2 * (softmaxTheta(:)' * softmaxTheta(:));  % cost
softmaxThetaGrad = -1/M * ((groundTruth - P) * a{n}') + lambda * softmaxTheta;                  % gradient
%--------------------------------------------------------------------

%---------- delta for each layer ----------
delta = cell(n, 1);
delta{n} = -(softmaxTheta' * (groundTruth - P)) .* a{n} .* (1 - a{n});  % see the backpropagation notes from the earlier post
for l = n-1:-1:2    % delta{1} (input layer) is never used, so stop at layer 2
    delta{l} = (stack{l}.w' * delta{l+1}) .* a{l} .* (1 - a{l});
end
%--------------------------------------------------------------------

%---------- Gradients of w and b for each layer ----------
for l = n-1:-1:1
    stackgrad{l}.w = (1/M) * delta{l+1} * a{l}';
    stackgrad{l}.b = (1/M) * sum(delta{l+1}, 2);
end
%--------------------------------------------------------------------

% -------------------------------------------------------------------------

%% Roll gradient vector
grad = [softmaxThetaGrad(:) ; stack2params(stackgrad)];

end

% You might find this useful
function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end
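
The step that subtracts each column's maximum before calling `exp` is easy to overlook, so here is a minimal self-contained sketch of why it matters. The scores are made up for illustration; nothing here comes from the actual MNIST run. It also shows that the largest raw score and the largest probability always pick the same class, which is why stackedAEPredict.m can take the `max` of `softmaxTheta * a{n}` directly without computing probabilities.

```matlab
% Hypothetical class scores: 3 classes (rows) x 2 examples (columns).
scores = [1000 1; 1001 2; 1002 3];

% Naive softmax: exp(1000) overflows to Inf, and Inf ./ Inf gives NaN.
naive = bsxfun(@rdivide, exp(scores), sum(exp(scores), 1));

% Stable softmax: subtracting the column max leaves the ratios unchanged
% (the common factor cancels) but keeps every exponent <= 0.
shifted = bsxfun(@minus, scores, max(scores, [], 1));
P = bsxfun(@rdivide, exp(shifted), sum(exp(shifted), 1));

% The predicted class is the same from raw scores or from probabilities,
% because exp() is monotonic and the normaliser is shared within a column.
[~, predFromScores] = max(scores, [], 1);
[~, predFromProbs]  = max(P, [], 1);

assert(any(isnan(naive(:))));                 % the naive version breaks
assert(all(abs(sum(P, 1) - 1) < 1e-12));      % the stable version sums to 1
assert(isequal(predFromScores, predFromProbs));
```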

Prediction function:

stackedAEPredict.m

function [pred] = stackedAEPredict(theta, inputSize, hiddenSize, numClasses, netconfig, data)

% stackedAEPredict: Takes a trained theta and a test data set,
% and returns the predicted labels for each example.

% theta: trained weights from the autoencoder
% inputSize: the number of input units
% hiddenSize: the number of hidden units *at the 2nd layer*
% numClasses: the number of categories
% data: Our matrix containing the test data as columns. So, data(:,i) is the i-th example.

% Your code should produce the prediction matrix
% pred, where pred(i) is argmax_c P(y(c) | x(i)).

%% Unroll theta parameter

% We first extract the part which compute the softmax gradient
softmaxTheta = reshape(theta(1:hiddenSize*numClasses), numClasses, hiddenSize);

% Extract out the "stack"
stack = params2stack(theta(hiddenSize*numClasses+1:end), netconfig);

%% ---------- YOUR CODE HERE --------------------------------------
% Instructions: Compute pred using theta assuming that the labels start
% from 1.
%
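%---------- Forward pass: compute z and a for every layer ----------
d = numel(stack);   % depth of the stack (number of hidden layers)
n = d + 1;          % number of layers, counting the input layer
a = cell(n, 1);
z = cell(n, 1);
a{1} = data;        % a{1} is the input data
for l = 2:n         % fill in z{2..n} and a{2..n}
    z{l} = stack{l-1}.w * a{l-1} + repmat(stack{l-1}.b, [1, size(a{l-1}, 2)]);
    a{l} = sigmoid(z{l});
end
%--------------------------------------------------------------------
M = softmaxTheta * a{n};    % class scores; the largest score is also the most probable class
[~, pred] = max(M, [], 1);  % index of the best class for each column

% -----------------------------------------------------------

end

% You might find this useful
function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end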

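Final results:

They differ a little from the lecture notes and from the comments in the script, especially the result before fine-tuning: the notes report under 90 percent, while my run gives about 94 percent.

The result after fine-tuning, however, is essentially the same.

PS: lecture notes: http://deeplearning.stanford.edu/wiki/index.php/Exercise:_Implement_deep_networks_for_digit_classification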
