[Machine Learning][BP]The Vectorized Back Propagation Algorithm
Reference: https://www.cs.swarthmore.edu/~meeden/cs81/s10/BackPropDeriv.pdf
I spent nearly one hour to deduce the vector form of the back propagation. Just in case that I may forget, but need to utilize them, I will write down all the formula here to make a backup.
Structure:
Standard BP Network with $\displaystyle \lambda$ hidden layers, one input layer and one output layer.
Activation function: sigmoid.
Notations:
$\displaystyle W^{i+1,i}$, denotes the weight matrix connecting from $i$th layer to $i+1$th layer.
$\displaystyle N^i$, denotes the net input of the $i$th layer.
$\displaystyle A^i$, denotes the activation input of the $i$th layer.
$\displaystyle \delta ^i$, denotes the error of the $i$th layer.
$\displaystyle \epsilon$, denotes the learning rate.
*, stands for element by element multiplication.
(omit), stands for matrix multiplication.
Specifically,
$\displaystyle X$, denotes the input layer, while equals $\displaystyle A^0$.
$\displaystyle A^{\lambda + 1}$, denotes the output layer.
$\displaystyle Y$, denotes the expected output.
Propagations:
Forward:
$\displaystyle N^i = W^{i,i-1}A^{i-1}$.
$\displaystyle A^i = \frac{1}{1+e^{-N^i}}$.
Backward:
$\displaystyle \Delta W^{i+1,i} = \epsilon \delta^{i+1}(A^{i})^{T}$.
$\displaystyle \delta ^i = ((\delta^{i+1})^{T}W^{i+1,i})^{T}*A^{i}*(1-A^{i})$.
$\displaystyle \delta ^{\lambda + 1} = (Y - A^{\lambda + 1})*A^{\lambda + 1}*(1-A^{\lambda + 1})$.
Deduction:
I am not capable of taking the partial derivative of vector or matrix over vector or matrix, so I derive these formulas by observing the formula for each element in the matrix and extend it to the vector form.
$\displaystyle \Delta W^{\lambda+1,\lambda}_{i,j} = \epsilon (Y_i - A^{\lambda+1}_i)A^{\lambda+1}_i(1-A^{\lambda +1}_i)A^{\lambda}_j$.
Let's assume $\displaystyle \delta ^{\lambda+1}_{i} := (Y_i - A^{\lambda+1}_i)A^{\lambda+1}_i(1-A^{\lambda +1}_i)$.
$\displaystyle \Delta W^{\lambda,\lambda-1}_{i,j}=\epsilon (\delta^{\lambda+1})^{T}W^{\lambda+1,\lambda}_{col(i)}A_i^{\lambda}(1-A_i^{\lambda})A_j^{\lambda-1}$.
Let's assume $\displaystyle \delta ^{\lambda}_{i} := (\delta^{\lambda+1})^{T}W^{\lambda+1,\lambda}_{col(i)}A_i^{\lambda}(1-A_i^{\lambda})$.
The left are reserved for the readers to complete.
[Machine Learning][BP]The Vectorized Back Propagation Algorithm的更多相关文章
- CheeseZH: Stanford University: Machine Learning Ex4:Training Neural Network(Backpropagation Algorithm)
1. Feedforward and cost function; 2.Regularized cost function: 3.Sigmoid gradient The gradient for t ...
- Bayesian machine learning
from: http://www.metacademy.org/roadmaps/rgrosse/bayesian_machine_learning Created by: Roger Grosse( ...
- 机器学习算法之旅A Tour of Machine Learning Algorithms
In this post we take a tour of the most popular machine learning algorithms. It is useful to tour th ...
- [GPU] Machine Learning on C++
一.MPI为何物? 初步了解:MPI集群环境搭建 二.重新认识Spark 链接:https://www.zhihu.com/question/48743915/answer/115738668 马铁大 ...
- A Gentle Introduction to the Gradient Boosting Algorithm for Machine Learning
A Gentle Introduction to the Gradient Boosting Algorithm for Machine Learning by Jason Brownlee on S ...
- Machine Learning—Mixtures of Gaussians and the EM algorithm
印象笔记同步分享:Machine Learning-Mixtures of Gaussians and the EM algorithm
- AUTOML --- Machine Learning for Automated Algorithm Design.
自动算法的机器学习: Machine Learning for Automated Algorithm Design. http://www.ml4aad.org/ AutoML——降低机器学习门槛的 ...
- (转)Introduction to Gradient Descent Algorithm (along with variants) in Machine Learning
Introduction Optimization is always the ultimate goal whether you are dealing with a real life probl ...
- machine learning model(algorithm model) .vs. statistical model
https://www.analyticsvidhya.com/blog/2015/07/difference-machine-learning-statistical-modeling/ http: ...
随机推荐
- 键盘类型UIKeyboardType
UITextField.UITextView等能够调出系统键盘的控件,通过下面这个属性可以控制弹出键盘的样式: self.priceTextField.keyboardType = UIKeyboar ...
- 「HNOI2003」消防局的设立
题目 [内存限制:$256 MiB$][时间限制:$1000 ms$] [标准输入输出][题目类型:传统][评测方式:文本比较] 题目描述 2020 年,人类在火星上建立了一个庞大的基地群,总共有 $ ...
- SVN commit,update用法
https://blog.csdn.net/studyvcmfc/article/details/4528896
- 关于length、length()、size()
length:属性,数组的属性. length(): String的方法,方法体里面是 return value.length; size():集合如list.set.map的方法,返回元素个数.
- Vue 项目开发
目录 Vue 项目开发 项目目录结构解析 入口文件 main.js (项目入口) 根组件 app.vue index.html 文件入口 router 路由 components 子组件 项目初始化 ...
- 如何给Sqlite添加复合主键
如果是想两个字段组成一个复合主键的话可以如下.SQL code sqlite> create table t2 ( ...> id1 int , ...> id2 int, ...& ...
- 三、多线程基础-自旋_AQS_多线程上下文
1. 自旋理解 很多synchronized里面的代码只是一些很简单的代码,执行时间非常快,此时等待的线程都加锁可能是一种不太值得的操作,因为线程阻塞涉及到用户态和内核态切换的问题.既然sync ...
- java Spring整合Freemarker的详细步骤
java Spring整合Freemarker的详细步骤 作者: 字体:[增加 减小] 类型:转载 时间:2013-11-14我要评论 本文对Spring整合Freemarker步骤做了详细的说明,按 ...
- [Struts]Token 使用及原理
Struts Token 使用 1,先在一个Action中,调用saveToken(HttpServletRequest request)方法.然后转向带有表单的JSP页面. 2,在JSP页面提交 ...
- hdu 1533 Going Home 最小费用最大流 (模板题)
Going Home Time Limit: 10000/5000 MS (Java/Others) Memory Limit: 65536/32768 K (Java/Others)Total ...