Reference: https://www.cs.swarthmore.edu/~meeden/cs81/s10/BackPropDeriv.pdf

  I spent nearly one hour to deduce the vector form of the back propagation. Just in case that I may forget, but need to utilize them, I will write down all the formula here to make a backup.

Structure:

  Standard BP Network with $\displaystyle \lambda$ hidden layers, one input layer and one output layer.

  Activation function: sigmoid.

Notations:

$\displaystyle W^{i+1,i}$, denotes the weight matrix connecting from $i$th layer to $i+1$th layer.

$\displaystyle N^i$, denotes the net input of the $i$th layer.

$\displaystyle A^i$, denotes the activation input of the $i$th layer.

$\displaystyle \delta ^i$, denotes the error of the $i$th layer.

$\displaystyle \epsilon$, denotes the learning rate.

*, stands for element by element multiplication.

(omit), stands for matrix multiplication.

  Specifically,

$\displaystyle X$, denotes the input layer, while equals $\displaystyle A^0$.

$\displaystyle A^{\lambda + 1}$, denotes the output layer.

$\displaystyle Y$, denotes the expected output.

Propagations:

  Forward:

$\displaystyle N^i = W^{i,i-1}A^{i-1}$.

$\displaystyle A^i = \frac{1}{1+e^{-N^i}}$.

  Backward:

$\displaystyle \Delta W^{i+1,i} = \epsilon \delta^{i+1}(A^{i})^{T}$.

$\displaystyle \delta ^i = ((\delta^{i+1})^{T}W^{i+1,i})^{T}*A^{i}*(1-A^{i})$.

$\displaystyle \delta ^{\lambda + 1} = (Y - A^{\lambda + 1})*A^{\lambda + 1}*(1-A^{\lambda + 1})$.

Deduction:

  I am not capable of taking the partial derivative of vector or matrix over vector or matrix, so I derive these formulas by observing the formula for each element in the matrix and extend it to the vector form.

$\displaystyle \Delta W^{\lambda+1,\lambda}_{i,j} = \epsilon (Y_i - A^{\lambda+1}_i)A^{\lambda+1}_i(1-A^{\lambda +1}_i)A^{\lambda}_j$.

  Let's assume $\displaystyle \delta ^{\lambda+1}_{i} := (Y_i - A^{\lambda+1}_i)A^{\lambda+1}_i(1-A^{\lambda +1}_i)$.

$\displaystyle \Delta W^{\lambda,\lambda-1}_{i,j}=\epsilon (\delta^{\lambda+1})^{T}W^{\lambda+1,\lambda}_{col(i)}A_i^{\lambda}(1-A_i^{\lambda})A_j^{\lambda-1}$.

  Let's assume $\displaystyle \delta ^{\lambda}_{i} := (\delta^{\lambda+1})^{T}W^{\lambda+1,\lambda}_{col(i)}A_i^{\lambda}(1-A_i^{\lambda})$.

  The left are reserved for the readers to complete.

[Machine Learning][BP]The Vectorized Back Propagation Algorithm的更多相关文章

  1. CheeseZH: Stanford University: Machine Learning Ex4:Training Neural Network(Backpropagation Algorithm)

    1. Feedforward and cost function; 2.Regularized cost function: 3.Sigmoid gradient The gradient for t ...

  2. Bayesian machine learning

    from: http://www.metacademy.org/roadmaps/rgrosse/bayesian_machine_learning Created by: Roger Grosse( ...

  3. 机器学习算法之旅A Tour of Machine Learning Algorithms

    In this post we take a tour of the most popular machine learning algorithms. It is useful to tour th ...

  4. [GPU] Machine Learning on C++

    一.MPI为何物? 初步了解:MPI集群环境搭建 二.重新认识Spark 链接:https://www.zhihu.com/question/48743915/answer/115738668 马铁大 ...

  5. A Gentle Introduction to the Gradient Boosting Algorithm for Machine Learning

    A Gentle Introduction to the Gradient Boosting Algorithm for Machine Learning by Jason Brownlee on S ...

  6. Machine Learning—Mixtures of Gaussians and the EM algorithm

    印象笔记同步分享:Machine Learning-Mixtures of Gaussians and the EM algorithm

  7. AUTOML --- Machine Learning for Automated Algorithm Design.

    自动算法的机器学习: Machine Learning for Automated Algorithm Design. http://www.ml4aad.org/ AutoML——降低机器学习门槛的 ...

  8. (转)Introduction to Gradient Descent Algorithm (along with variants) in Machine Learning

    Introduction Optimization is always the ultimate goal whether you are dealing with a real life probl ...

  9. machine learning model(algorithm model) .vs. statistical model

    https://www.analyticsvidhya.com/blog/2015/07/difference-machine-learning-statistical-modeling/ http: ...

随机推荐

  1. 键盘类型UIKeyboardType

    UITextField.UITextView等能够调出系统键盘的控件,通过下面这个属性可以控制弹出键盘的样式: self.priceTextField.keyboardType = UIKeyboar ...

  2. 「HNOI2003」消防局的设立

    题目 [内存限制:$256 MiB$][时间限制:$1000 ms$] [标准输入输出][题目类型:传统][评测方式:文本比较] 题目描述 2020 年,人类在火星上建立了一个庞大的基地群,总共有 $ ...

  3. SVN commit,update用法

    https://blog.csdn.net/studyvcmfc/article/details/4528896

  4. 关于length、length()、size()

    length:属性,数组的属性. length(): String的方法,方法体里面是  return value.length; size():集合如list.set.map的方法,返回元素个数.

  5. Vue 项目开发

    目录 Vue 项目开发 项目目录结构解析 入口文件 main.js (项目入口) 根组件 app.vue index.html 文件入口 router 路由 components 子组件 项目初始化 ...

  6. 如何给Sqlite添加复合主键

    如果是想两个字段组成一个复合主键的话可以如下.SQL code sqlite> create table t2 ( ...> id1 int , ...> id2 int, ...& ...

  7. 三、多线程基础-自旋_AQS_多线程上下文

    1. 自旋理解    很多synchronized里面的代码只是一些很简单的代码,执行时间非常快,此时等待的线程都加锁可能是一种不太值得的操作,因为线程阻塞涉及到用户态和内核态切换的问题.既然sync ...

  8. java Spring整合Freemarker的详细步骤

    java Spring整合Freemarker的详细步骤 作者: 字体:[增加 减小] 类型:转载 时间:2013-11-14我要评论 本文对Spring整合Freemarker步骤做了详细的说明,按 ...

  9. [Struts]Token 使用及原理

      Struts Token 使用 1,先在一个Action中,调用saveToken(HttpServletRequest request)方法.然后转向带有表单的JSP页面. 2,在JSP页面提交 ...

  10. hdu 1533 Going Home 最小费用最大流 (模板题)

    Going Home Time Limit: 10000/5000 MS (Java/Others)    Memory Limit: 65536/32768 K (Java/Others)Total ...