1. Feedforward and cost function

The (unregularized) cost function for the neural network is:

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[-y_k^{(i)}\log\left(\left(h_\theta(x^{(i)})\right)_k\right) - \left(1-y_k^{(i)}\right)\log\left(1-\left(h_\theta(x^{(i)})\right)_k\right)\right]$$

2. Regularized cost function

For the 3-layer network used in this exercise (400 input units, 25 hidden units, 10 output classes), the regularized cost adds a penalty on the non-bias weights:

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[-y_k^{(i)}\log\left(\left(h_\theta(x^{(i)})\right)_k\right) - \left(1-y_k^{(i)}\right)\log\left(1-\left(h_\theta(x^{(i)})\right)_k\right)\right] + \frac{\lambda}{2m}\left[\sum_{j=1}^{25}\sum_{k=1}^{400}\left(\Theta_{j,k}^{(1)}\right)^2 + \sum_{j=1}^{10}\sum_{k=1}^{25}\left(\Theta_{j,k}^{(2)}\right)^2\right]$$

Note that the terms for the bias units (the first column of each Theta matrix) are not regularized.

3. Sigmoid gradient

The gradient for the sigmoid function can be computed as:

$$g'(z) = \frac{d}{dz}g(z) = g(z)\left(1 - g(z)\right)$$

where:

$$\mathrm{sigmoid}(z) = g(z) = \frac{1}{1 + e^{-z}}$$
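The corresponding sigmoidGradient.m is a one-liner; a minimal version (assuming the exercise's sigmoid.m helper is on the path) looks like this:

sigmoidGradient.m

function g = sigmoidGradient(z)
%SIGMOIDGRADIENT returns the gradient of the sigmoid function
%evaluated at z. z can be a matrix or a vector; the gradient is
%computed element-wise.
g = sigmoid(z) .* (1 - sigmoid(z));
end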

4. Random initialization

To break symmetry while training the network, each weight is initialized to a random value in $[-\epsilon_{init}, \epsilon_{init}]$, with $\epsilon_{init} = 0.12$:

randInitializeWeights.m

function W = randInitializeWeights(L_in, L_out)
%RANDINITIALIZEWEIGHTS Randomly initialize the weights of a layer with L_in
%incoming connections and L_out outgoing connections
%   W = RANDINITIALIZEWEIGHTS(L_in, L_out) randomly initializes the weights
%   of a layer with L_in incoming connections and L_out outgoing
%   connections.
%
%   Note that W should be set to a matrix of size(L_out, 1 + L_in) as
%   the first column of W handles the "bias" terms
%

% You need to return the following variables correctly
W = zeros(L_out, 1 + L_in);

% ====================== YOUR CODE HERE ======================
% Instructions: Initialize W randomly so that we break the symmetry while
%               training the neural network.
%
% Note: The first column of W corresponds to the parameters for the bias units
%

epsilon_init = 0.12;
W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;

% =========================================================================

end
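In ex4.m these weights are generated for both layers and then unrolled into a single parameter vector; a sketch of that call sequence (variable names follow the exercise scripts):

initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size);
initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels);

% Unroll parameters into a single vector
initial_nn_params = [initial_Theta1(:) ; initial_Theta2(:)];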

5. Backpropagation (implemented with a for-loop for t = 1:m, placing steps 1-4 below inside the loop), with the t-th iteration performing the calculation on the t-th training example $(x^{(t)}, y^{(t)})$. Step 5 divides the accumulated gradients by m to obtain the gradients for the neural network cost function.

(1) Set the input layer's values ($a^{(1)}$) to the t-th training example $x^{(t)}$. Perform a feedforward pass, computing the activations ($z^{(2)}, a^{(2)}, z^{(3)}, a^{(3)}$) for layers 2 and 3. Note that a +1 term must be added so that the activation vectors $a^{(1)}$ and $a^{(2)}$ include the bias unit.

(2) For each output unit k in layer 3 (the output layer), set:

$$\delta_k^{(3)} = a_k^{(3)} - y_k$$

where $y_k \in \{0, 1\}$ indicates whether the current training example belongs to class k ($y_k = 1$) or to a different class ($y_k = 0$).

(3) For the hidden layer l = 2, set:

$$\delta^{(2)} = \left(\Theta^{(2)}\right)^{T}\delta^{(3)} \circ g'\!\left(z^{(2)}\right)$$

where $\circ$ denotes element-wise multiplication (.* in Octave).

(4) Accumulate the gradient from this example using the following formula. Note that you should skip or remove $\delta_0^{(2)}$.

$$\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)}\left(a^{(l)}\right)^{T}$$

(5) Obtain the (unregularized) gradient for the neural network cost function by dividing the accumulated gradients by m:

$$\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = D_{ij}^{(l)} = \frac{1}{m}\Delta_{ij}^{(l)}$$

nnCostFunction.m

function [J grad] = nnCostFunction(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
%   [J grad] = NNCOSTFUNCTION(nn_params, hidden_layer_size, num_labels, ...
%   X, y, lambda) computes the cost and gradient of the neural network. The
%   parameters for the neural network are "unrolled" into the vector
%   nn_params and need to be converted back into the weight matrices.
%
%   The returned parameter grad should be an "unrolled" vector of the
%   partial derivatives of the neural network.
%

% Reshape nn_params back into the parameters Theta1 and Theta2, the weight matrices
% for our 2 layer neural network
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

% Setup some useful variables
m = size(X, 1);

% You need to return the following variables correctly
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));

% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the code by working through the
%               following parts.
%
% Part 1: Feedforward the neural network and return the cost in the
%         variable J. After implementing Part 1, you can verify that your
%         cost function computation is correct by verifying the cost
%         computed in ex4.m
%
% Part 2: Implement the backpropagation algorithm to compute the gradients
%         Theta1_grad and Theta2_grad. You should return the partial derivatives of
%         the cost function with respect to Theta1 and Theta2 in Theta1_grad and
%         Theta2_grad, respectively. After implementing Part 2, you can check
%         that your implementation is correct by running checkNNGradients
%
%         Note: The vector y passed into the function is a vector of labels
%               containing values from 1..K. You need to map this vector into a
%               binary vector of 1's and 0's to be used with the neural network
%               cost function.
%
%         Hint: We recommend implementing backpropagation using a for-loop
%               over the training examples if you are implementing it for the
%               first time.
%
% Part 3: Implement regularization with the cost function and gradients.
%
%         Hint: You can implement this around the code for
%               backpropagation. That is, you can compute the gradients for
%               the regularization separately and then add them to Theta1_grad
%               and Theta2_grad from Part 2.
%

% Part 1
% Theta1 has size 25 * 401
% Theta2 has size 10 * 26
% y has size 5000 * 1
K = num_labels;
Y = eye(K)(y,:);              % [5000 10]
a1 = [ones(m,1), X];          % [5000 401]
a2 = sigmoid(a1*Theta1');     % [5000 25]
a2 = [ones(m,1), a2];         % [5000 26]
h = sigmoid(a2*Theta2');      % [5000 10]
costPositive = -Y .* log(h);
costNegative = (1-Y) .* log(1-h);
cost = costPositive - costNegative;
J = (1/m) * sum(cost(:));

% Regularized
Theta1Filtered = Theta1(:, 2:end);   % [25 400]
Theta2Filtered = Theta2(:, 2:end);   % [10 25]
reg = (lambda/(2*m)) * (sumsq(Theta1Filtered(:)) + sumsq(Theta2Filtered(:)));
J = J + reg;

% Part 2
Delta1 = 0;
Delta2 = 0;
for t = 1:m,
    % step 1
    a1 = [1 X(t,:)];            % [1 401]
    z2 = a1*Theta1';            % [1 25]
    a2 = [1 sigmoid(z2)];       % [1 26]
    z3 = a2*Theta2';            % [1 10]
    a3 = sigmoid(z3);           % [1 10]
    % step 2
    yt = Y(t,:);                % [1 10]
    d3 = a3 - yt;               % [1 10]
    % step 3
    %      [1 10]  [10 25]                           [1 25]
    d2 = (d3*Theta2Filtered) .* sigmoidGradient(z2); % [1 25]
    % step 4
    Delta1 = Delta1 + (d2'*a1); % [25 401]
    Delta2 = Delta2 + (d3'*a2); % [10 26]
end;

% step 5
Theta1_grad = (1/m) * Delta1;
Theta2_grad = (1/m) * Delta2;

% Part 3
Theta1_grad(:,2:end) = Theta1_grad(:,2:end) + ((lambda/m) * Theta1Filtered);
Theta2_grad(:,2:end) = Theta2_grad(:,2:end) + ((lambda/m) * Theta2Filtered);

% -------------------------------------------------------------

% =========================================================================

% Unroll gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];

end
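Once nnCostFunction returns both the cost and the unrolled gradient, ex4.m trains the network with fmincg (provided with the exercise code); roughly:

options = optimset('MaxIter', 50);
lambda = 1;

% Short-hand for the cost function to be minimized
costFunction = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                                   num_labels, X, y, lambda);

[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);

% Reshape the learned parameters back into Theta1 and Theta2
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));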

6. Gradient checking

Let

$$\theta^{(i+)} = \theta + \begin{bmatrix} 0 \\ \vdots \\ \epsilon \\ \vdots \\ 0 \end{bmatrix}$$

and

$$\theta^{(i-)} = \theta - \begin{bmatrix} 0 \\ \vdots \\ \epsilon \\ \vdots \\ 0 \end{bmatrix}$$

where the $\epsilon$ sits in the i-th position. The backpropagation gradient $f_i(\theta)$ can then be verified numerically by checking, for each i, that:

$$f_i(\theta) \approx \frac{J(\theta^{(i+)}) - J(\theta^{(i-)})}{2\epsilon}$$

computeNumericalGradient.m

function numgrad = computeNumericalGradient(J, theta)
%COMPUTENUMERICALGRADIENT Computes the gradient using "finite differences"
%and gives us a numerical estimate of the gradient.
%   numgrad = COMPUTENUMERICALGRADIENT(J, theta) computes the numerical
%   gradient of the function J around theta. Calling y = J(theta) should
%   return the function value at theta.

% Notes: The following code implements numerical gradient checking, and
%        returns the numerical gradient. It sets numgrad(i) to (a numerical
%        approximation of) the partial derivative of J with respect to the
%        i-th input argument, evaluated at theta. (i.e., numgrad(i) should
%        be the (approximately) the partial derivative of J with respect
%        to theta(i).)
%

numgrad = zeros(size(theta));
perturb = zeros(size(theta));
e = 1e-4;
for p = 1:numel(theta)
    % Set perturbation vector
    perturb(p) = e;
    loss1 = J(theta - perturb);
    loss2 = J(theta + perturb);
    % Compute Numerical Gradient
    numgrad(p) = (loss2 - loss1) / (2*e);
    perturb(p) = 0;
end

end
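checkNNGradients.m (provided with the exercise) builds a small debug network and compares this numerical estimate against the backpropagation gradient; the core of that comparison is a relative-difference test along these lines:

% numgrad from computeNumericalGradient, grad from nnCostFunction
diff = norm(numgrad - grad) / norm(numgrad + grad);
fprintf('Relative difference: %g\n', diff);  % should be on the order of 1e-9 or smaller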

7. Regularized Neural Networks

for j = 0:

$$\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = D_{ij}^{(l)} = \frac{1}{m}\Delta_{ij}^{(l)}$$

for j ≥ 1:

$$\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = D_{ij}^{(l)} = \frac{1}{m}\Delta_{ij}^{(l)} + \frac{\lambda}{m}\Theta_{ij}^{(l)}$$
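With the regularization terms added to Theta1_grad and Theta2_grad, the gradient check can be repeated with a non-zero lambda, e.g. as in ex4.m:

% Check gradients of the regularized cost function
lambda = 3;
checkNNGradients(lambda);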

Other people's code:

https://github.com/jcgillespie/Coursera-Machine-Learning/tree/master/ex4
