[Machine Learning][BP]The Vectorized Back Propagation Algorithm
Reference: https://www.cs.swarthmore.edu/~meeden/cs81/s10/BackPropDeriv.pdf
I spent nearly one hour to deduce the vector form of the back propagation. Just in case that I may forget, but need to utilize them, I will write down all the formula here to make a backup.
Structure:
Standard BP Network with $\displaystyle \lambda$ hidden layers, one input layer and one output layer.
Activation function: sigmoid.
Notations:
$\displaystyle W^{i+1,i}$, denotes the weight matrix connecting from $i$th layer to $i+1$th layer.
$\displaystyle N^i$, denotes the net input of the $i$th layer.
$\displaystyle A^i$, denotes the activation input of the $i$th layer.
$\displaystyle \delta ^i$, denotes the error of the $i$th layer.
$\displaystyle \epsilon$, denotes the learning rate.
*, stands for element by element multiplication.
(omit), stands for matrix multiplication.
Specifically,
$\displaystyle X$, denotes the input layer, while equals $\displaystyle A^0$.
$\displaystyle A^{\lambda + 1}$, denotes the output layer.
$\displaystyle Y$, denotes the expected output.
Propagations:
Forward:
$\displaystyle N^i = W^{i,i-1}A^{i-1}$.
$\displaystyle A^i = \frac{1}{1+e^{-N^i}}$.
Backward:
$\displaystyle \Delta W^{i+1,i} = \epsilon \delta^{i+1}(A^{i})^{T}$.
$\displaystyle \delta ^i = ((\delta^{i+1})^{T}W^{i+1,i})^{T}*A^{i}*(1-A^{i})$.
$\displaystyle \delta ^{\lambda + 1} = (Y - A^{\lambda + 1})*A^{\lambda + 1}*(1-A^{\lambda + 1})$.
Deduction:
I am not capable of taking the partial derivative of vector or matrix over vector or matrix, so I derive these formulas by observing the formula for each element in the matrix and extend it to the vector form.
$\displaystyle \Delta W^{\lambda+1,\lambda}_{i,j} = \epsilon (Y_i - A^{\lambda+1}_i)A^{\lambda+1}_i(1-A^{\lambda +1}_i)A^{\lambda}_j$.
Let's assume $\displaystyle \delta ^{\lambda+1}_{i} := (Y_i - A^{\lambda+1}_i)A^{\lambda+1}_i(1-A^{\lambda +1}_i)$.
$\displaystyle \Delta W^{\lambda,\lambda-1}_{i,j}=\epsilon (\delta^{\lambda+1})^{T}W^{\lambda+1,\lambda}_{col(i)}A_i^{\lambda}(1-A_i^{\lambda})A_j^{\lambda-1}$.
Let's assume $\displaystyle \delta ^{\lambda}_{i} := (\delta^{\lambda+1})^{T}W^{\lambda+1,\lambda}_{col(i)}A_i^{\lambda}(1-A_i^{\lambda})$.
The left are reserved for the readers to complete.
[Machine Learning][BP]The Vectorized Back Propagation Algorithm的更多相关文章
- CheeseZH: Stanford University: Machine Learning Ex4:Training Neural Network(Backpropagation Algorithm)
1. Feedforward and cost function; 2.Regularized cost function: 3.Sigmoid gradient The gradient for t ...
- Bayesian machine learning
from: http://www.metacademy.org/roadmaps/rgrosse/bayesian_machine_learning Created by: Roger Grosse( ...
- 机器学习算法之旅A Tour of Machine Learning Algorithms
In this post we take a tour of the most popular machine learning algorithms. It is useful to tour th ...
- [GPU] Machine Learning on C++
一.MPI为何物? 初步了解:MPI集群环境搭建 二.重新认识Spark 链接:https://www.zhihu.com/question/48743915/answer/115738668 马铁大 ...
- A Gentle Introduction to the Gradient Boosting Algorithm for Machine Learning
A Gentle Introduction to the Gradient Boosting Algorithm for Machine Learning by Jason Brownlee on S ...
- Machine Learning—Mixtures of Gaussians and the EM algorithm
印象笔记同步分享:Machine Learning-Mixtures of Gaussians and the EM algorithm
- AUTOML --- Machine Learning for Automated Algorithm Design.
自动算法的机器学习: Machine Learning for Automated Algorithm Design. http://www.ml4aad.org/ AutoML——降低机器学习门槛的 ...
- (转)Introduction to Gradient Descent Algorithm (along with variants) in Machine Learning
Introduction Optimization is always the ultimate goal whether you are dealing with a real life probl ...
- machine learning model(algorithm model) .vs. statistical model
https://www.analyticsvidhya.com/blog/2015/07/difference-machine-learning-statistical-modeling/ http: ...
随机推荐
- Python 基础之if if else
1.代码块 以冒号作为开始,用缩进来划分区域,这个整体叫做代码块 if 5 == 5: print(1) print(2) if True: print(3) print(4) ...
- Python 基础之面向对象类的继承与多态
一.继承 定义:一个类除了拥有自身的属性方法之外,还拥有另外一个类的属性和方法继承: 1.单继承 2.多继承子类: 一个类继承了另外一个类,那么这个类是子类(衍生类)父类:一个类继承了另外一个类,被继 ...
- 【JAVA蓝桥杯】基础练习1 十进制转十六进制
资源限制 时间限制:1.0s 内存限制:512.0MB 问题描述 十六进制数是在程序设计时经常要使用到的一种整数的表示方式.它有0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F共16 ...
- 5款微信小程序开发工具使用报告,微信官方开发工具还有待提升
微信小程序已经内测有一段时间了,笔者本着好奇加学习的心态写了几个小demo,虽然在MINA框架上并没有遇到太多的坑,但官方开发工具实在不敢恭维. api提示不全,要一个个查api啊,写代码超级慢啊 很 ...
- helm基本用法
一.helm命令 helm search #关键字搜索charts helm pull #压缩下载chart到本地,可以使用--untar下载解压) helm install #部署chart到kub ...
- C 随机数产生
// ConsoleApplication5.cpp : Defines the entry point for the console application. // #include " ...
- STM32CubeIDE printf 串口重定向
- JDBC--利用反射及JDBC元数据编写通用的查询方法
1.JDBC元数据(ResuleSetMetaData):描述ResultSet的元数据对象,可以从中获取到结果集中的列数和列名等: --使用ResultSet类的getMetaData()方法获得R ...
- 前端学习笔记系列一:15vscode汉化、快速复制行、网页背景图有效设置、 dl~dt~dd标签使用
ctrl+shift+p,调出configure display language,选择en或zh,若没有则选择安装使用其它语言,则直接呼出扩展程序搜索界面,选择,然后安装,重启即可. shift+a ...
- PyCharm无法找到已安装的Python类库的解决方法
一.问题描述 软件系统:Windows10.JetBrains PyCharm Edu 2018.1.1 x64 在命令行cmd中安装python类库包Numpy.Matplotlib.Pandas. ...