1. Feedforward and cost function
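
For the 3-layer network in this exercise (400 input units, 25 hidden units, K = 10 output units), the unregularized cost over the m training examples is

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\Big[-y_k^{(i)}\log\big((h_\theta(x^{(i)}))_k\big)-\big(1-y_k^{(i)}\big)\log\big(1-(h_\theta(x^{(i)}))_k\big)\Big]$$

where $h_\theta(x^{(i)}) = a^{(3)}$ is computed by the feedforward pass and $y^{(i)}$ is the label recoded as a binary vector of length K.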

2. Regularized cost function
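
Regularization adds a penalty on all non-bias weights (the first column of each Theta matrix is not regularized):

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\Big[-y_k^{(i)}\log\big((h_\theta(x^{(i)}))_k\big)-\big(1-y_k^{(i)}\big)\log\big(1-(h_\theta(x^{(i)}))_k\big)\Big] + \frac{\lambda}{2m}\Bigg[\sum_{j=1}^{25}\sum_{k=1}^{400}\big(\Theta_{j,k}^{(1)}\big)^2 + \sum_{j=1}^{10}\sum_{k=1}^{25}\big(\Theta_{j,k}^{(2)}\big)^2\Bigg]$$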

3. Sigmoid gradient

The gradient of the sigmoid function can be computed as

$$g'(z) = \frac{d}{dz}g(z) = g(z)\big(1 - g(z)\big)$$

where

$$g(z) = \mathrm{sigmoid}(z) = \frac{1}{1 + e^{-z}}$$
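
The exercise also asks for sigmoidGradient.m; a minimal sketch, assuming the sigmoid.m provided with the exercise is on the path:

sigmoidGradient.m

function g = sigmoidGradient(z)
%SIGMOIDGRADIENT returns the gradient of the sigmoid function evaluated at z
%   Works element-wise, so z can be a matrix, a vector, or a scalar.
g = sigmoid(z) .* (1 - sigmoid(z));
end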

4. Random initialization

randInitializeWeights.m

function W = randInitializeWeights(L_in, L_out)
%RANDINITIALIZEWEIGHTS Randomly initialize the weights of a layer with L_in
%incoming connections and L_out outgoing connections
%   W = RANDINITIALIZEWEIGHTS(L_in, L_out) randomly initializes the weights
%   of a layer with L_in incoming connections and L_out outgoing
%   connections.
%
%   Note that W should be set to a matrix of size(L_out, 1 + L_in) as
%   the first column of W handles the "bias" terms
%

% You need to return the following variables correctly
W = zeros(L_out, 1 + L_in);

% ====================== YOUR CODE HERE ======================
% Instructions: Initialize W randomly so that we break the symmetry while
%               training the neural network.
%
% Note: The first column of W corresponds to the parameters for the bias units
%

epsilon_init = 0.12;
W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;

% =========================================================================

end
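
In ex4.m the two weight matrices are initialized with this function and unrolled into a single parameter vector before training, roughly along these lines (variable names follow ex4.m; input_layer_size = 400, hidden_layer_size = 25, num_labels = 10):

initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size); % 25 x 401
initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels);       % 10 x 26
initial_nn_params = [initial_Theta1(:) ; initial_Theta2(:)];                 % unrolled parameters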

5. Backpropagation (use a for-loop for t = 1:m and place steps 1-4 below inside the loop), with the t-th iteration performing the calculation on the t-th training example (x(t), y(t)). Step 5 divides the accumulated gradients by m to obtain the gradients for the neural network cost function.

(1) Set the input layer's values (a(1)) to the t-th training example x(t). Perform a feedforward pass, computing the activations (z(2), a(2), z(3), a(3)) for layers 2 and 3. Remember to add a +1 term so that a(1) and a(2) include the bias unit.

(2) For each output unit k in layer 3 (the output layer), set

$$\delta_k^{(3)} = a_k^{(3)} - y_k,$$

where y_k = 1 if the current training example belongs to class k and y_k = 0 otherwise.

(3) For the hidden layer l = 2, set

$$\delta^{(2)} = \big(\Theta^{(2)}\big)^{T}\delta^{(3)} \,.\!*\, g'\big(z^{(2)}\big)$$

where .* denotes element-wise multiplication.

(4) Accumulate the gradient from this example using the following formula. Note that you should skip or remove $\delta_0^{(2)}$.

$$\Delta^{(l)} = \Delta^{(l)} + \delta^{(l+1)}\big(a^{(l)}\big)^{T}$$

(5) Obtain the (unregularized) gradient for the neural network cost function by dividing the accumulated gradients by m:

$$\frac{\partial}{\partial\Theta_{ij}^{(l)}}J(\Theta) = D_{ij}^{(l)} = \frac{1}{m}\Delta_{ij}^{(l)}$$

nnCostFunction.m

function [J grad] = nnCostFunction(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
%   [J grad] = NNCOSTFUNCTON(nn_params, hidden_layer_size, num_labels, ...
%   X, y, lambda) computes the cost and gradient of the neural network. The
%   parameters for the neural network are "unrolled" into the vector
%   nn_params and need to be converted back into the weight matrices.
%
%   The returned parameter grad should be a "unrolled" vector of the
%   partial derivatives of the neural network.
%

% Reshape nn_params back into the parameters Theta1 and Theta2, the weight matrices
% for our 2 layer neural network
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

% Setup some useful variables
m = size(X, 1);

% You need to return the following variables correctly
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));

% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the code by working through the
%               following parts.
%
% Part 1: Feedforward the neural network and return the cost in the
%         variable J. After implementing Part 1, you can verify that your
%         cost function computation is correct by verifying the cost
%         computed in ex4.m
%
% Part 2: Implement the backpropagation algorithm to compute the gradients
%         Theta1_grad and Theta2_grad. You should return the partial derivatives of
%         the cost function with respect to Theta1 and Theta2 in Theta1_grad and
%         Theta2_grad, respectively. After implementing Part 2, you can check
%         that your implementation is correct by running checkNNGradients
%
%         Note: The vector y passed into the function is a vector of labels
%               containing values from 1..K. You need to map this vector into a
%               binary vector of 1's and 0's to be used with the neural network
%               cost function.
%
%         Hint: We recommend implementing backpropagation using a for-loop
%               over the training examples if you are implementing it for the
%               first time.
%
% Part 3: Implement regularization with the cost function and gradients.
%
%         Hint: You can implement this around the code for
%               backpropagation. That is, you can compute the gradients for
%               the regularization separately and then add them to Theta1_grad
%               and Theta2_grad from Part 2.
%

%Part 1
%Theta1 has size 25*401
%Theta2 has size 10*26
%y has size 5000*1
K = num_labels;
Y = eye(K)(y,:);          %[5000 10], recode labels as binary vectors (Octave indexing)
a1 = [ones(m,1), X];      %[5000 401]
a2 = sigmoid(a1*Theta1'); %[5000 25]
a2 = [ones(m,1), a2];     %[5000 26]
h = sigmoid(a2*Theta2');  %[5000 10]

costPositive = -Y.*log(h);
costNegtive = (1-Y).*log(1-h);
cost = costPositive - costNegtive;
J = (1/m)*sum(cost(:));

%Regularized
Theta1Filtered = Theta1(:,2:end); %[25 400], drop the bias column
Theta2Filtered = Theta2(:,2:end); %[10 25]
reg = (lambda/(2*m))*(sumsq(Theta1Filtered(:))+sumsq(Theta2Filtered(:)));
J = J + reg;

%Part 2
Delta1 = 0;
Delta2 = 0;
for t=1:m,
  %step 1
  a1 = [1 X(t,:)];      %[1 401]
  z2 = a1*Theta1';      %[1 25]
  a2 = [1 sigmoid(z2)]; %[1 26]
  z3 = a2*Theta2';      %[1 10]
  a3 = sigmoid(z3);     %[1 10]
  %step 2
  yt = Y(t,:);          %[1 10]
  d3 = a3-yt;           %[1 10]
  %step 3
  %     [1 10] [10 25]                           [1 25]
  d2 = (d3*Theta2Filtered).*sigmoidGradient(z2); %[1 25]
  %step 4
  Delta1 = Delta1 + (d2'*a1); %[25 401]
  Delta2 = Delta2 + (d3'*a2); %[10 26]
end;

%step 5
Theta1_grad = (1/m)*Delta1;
Theta2_grad = (1/m)*Delta2;

%Part 3
Theta1_grad(:,2:end) = Theta1_grad(:,2:end) + ((lambda/m)*Theta1Filtered);
Theta2_grad(:,2:end) = Theta2_grad(:,2:end) + ((lambda/m)*Theta2Filtered);

% -------------------------------------------------------------

% =========================================================================

% Unroll gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];

end
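
Once the gradients check out, training amounts to handing this cost function to fmincg (provided with the exercise). A sketch of how ex4.m wires it up, assuming initial_nn_params holds the unrolled random initial weights from above:

options = optimset('MaxIter', 50);
lambda = 1;

% "Short hand" cost function with the data and lambda fixed
costFunction = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                                   num_labels, X, y, lambda);

[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);

% Reshape the learned parameters back into Theta1 and Theta2
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));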

6. Gradient checking

Let

$$\theta^{(i+)} = \theta + \varepsilon\,e_i$$

and

$$\theta^{(i-)} = \theta - \varepsilon\,e_i,$$

where $e_i$ is the i-th unit vector (all zeros except a 1 in position i) and $\varepsilon = 10^{-4}$. You can then numerically verify, for each i, that

$$f_i(\theta) \approx \frac{J(\theta^{(i+)}) - J(\theta^{(i-)})}{2\varepsilon}$$

where $f_i(\theta)$ is the partial derivative of J with respect to the i-th element of $\theta$, which should match grad(i) from backpropagation.

computeNumericalGradient.m

function numgrad = computeNumericalGradient(J, theta)
%COMPUTENUMERICALGRADIENT Computes the gradient using "finite differences"
%and gives us a numerical estimate of the gradient.
%   numgrad = COMPUTENUMERICALGRADIENT(J, theta) computes the numerical
%   gradient of the function J around theta. Calling y = J(theta) should
%   return the function value at theta.

% Notes: The following code implements numerical gradient checking, and
%        returns the numerical gradient. It sets numgrad(i) to (a numerical
%        approximation of) the partial derivative of J with respect to the
%        i-th input argument, evaluated at theta. (i.e., numgrad(i) should
%        be the (approximately) the partial derivative of J with respect
%        to theta(i).)
%

numgrad = zeros(size(theta));
perturb = zeros(size(theta));
e = 1e-4;
for p = 1:numel(theta)
    % Set perturbation vector
    perturb(p) = e;
    loss1 = J(theta - perturb);
    loss2 = J(theta + perturb);
    % Compute Numerical Gradient
    numgrad(p) = (loss2 - loss1) / (2*e);
    perturb(p) = 0;
end

end
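
checkNNGradients.m (provided with the exercise) builds a small test network, runs both backpropagation and this numerical estimate, and compares them. The comparison boils down to something like the following, where costFunc is a handle to nnCostFunction with the test data and lambda bound in:

numgrad = computeNumericalGradient(costFunc, nn_params);
[cost, grad] = costFunc(nn_params);

% Relative difference; with a correct backpropagation implementation this
% should be very small (the exercise expects less than 1e-9)
diff = norm(numgrad - grad) / norm(numgrad + grad);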

7. Regularized Neural Networks

For j = 0 (the bias column, which is not regularized):

$$\frac{\partial}{\partial\Theta_{ij}^{(l)}}J(\Theta) = D_{ij}^{(l)} = \frac{1}{m}\Delta_{ij}^{(l)}$$

For j >= 1:

$$\frac{\partial}{\partial\Theta_{ij}^{(l)}}J(\Theta) = D_{ij}^{(l)} = \frac{1}{m}\Delta_{ij}^{(l)} + \frac{\lambda}{m}\Theta_{ij}^{(l)}$$
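
To confirm the regularized gradients, ex4.m re-runs the numerical gradient check with a non-zero lambda, roughly (this assumes the provided checkNNGradients.m accepts an optional lambda argument, as it does in the exercise code):

% Check gradients again with regularization enabled
lambda = 3;
checkNNGradients(lambda);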

Another person's solution:

https://github.com/jcgillespie/Coursera-Machine-Learning/tree/master/ex4
