[Machine Learning][BP]The Vectorized Back Propagation Algorithm
Reference: https://www.cs.swarthmore.edu/~meeden/cs81/s10/BackPropDeriv.pdf
It took me nearly an hour to derive the vector form of back propagation. In case I forget it and need it again, I am writing down all the formulas here as a backup.
Structure:
Standard BP Network with $\displaystyle \lambda$ hidden layers, one input layer and one output layer.
Activation function: sigmoid.
Notation:
$\displaystyle W^{i+1,i}$ denotes the weight matrix connecting the $i$th layer to the $(i+1)$th layer.
$\displaystyle N^i$ denotes the net input of the $i$th layer.
$\displaystyle A^i$ denotes the activation (output) of the $i$th layer.
$\displaystyle \delta ^i$ denotes the error term of the $i$th layer.
$\displaystyle \epsilon$ denotes the learning rate.
$*$ stands for element-by-element multiplication.
Juxtaposition (no operator symbol) stands for matrix multiplication.
Specifically,
$\displaystyle X$ denotes the input layer, which equals $\displaystyle A^0$.
$\displaystyle A^{\lambda + 1}$ denotes the output layer.
$\displaystyle Y$ denotes the expected output.
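To make the two kinds of product concrete, here is a small NumPy sketch. The library choice and the example values are my own assumptions, not part of the original notes. In NumPy, `*` is the element-by-element product and `@` is the matrix product that the formulas write as juxtaposition.

```python
import numpy as np

# Hypothetical example values, chosen only to illustrate the notation.
W = np.array([[1.0, 2.0],
              [3.0, 4.0]])      # a 2x2 weight matrix
a = np.array([[0.5],
              [0.25]])          # a 2x1 column vector of activations

elementwise = a * (1.0 - a)     # "*": element-by-element product, shape (2, 1)
matmul = W @ a                  # juxtaposition "W a": matrix product, shape (2, 1)

print(elementwise.ravel())
print(matmul.ravel())
```

All vectors are kept as column vectors of shape (n, 1) so that transposes such as $(A^{i})^{T}$ behave as they do on paper.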
Propagations:
Forward:
$\displaystyle N^i = W^{i,i-1}A^{i-1}$.
$\displaystyle A^i = \frac{1}{1+e^{-N^i}}$.
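The forward pass can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the layer sizes, the random initialization, and the names `sigmoid`, `forward`, and `weights` are assumptions of mine. As in the formulas above, there are no bias terms.

```python
import numpy as np

def sigmoid(n):
    """Element-wise logistic function 1 / (1 + e^{-n})."""
    return 1.0 / (1.0 + np.exp(-n))

def forward(weights, x):
    """Return [A^0, A^1, ..., A^{lambda+1}] for a column-vector input x.

    weights[i-1] plays the role of W^{i,i-1}.
    """
    activations = [x]                       # A^0 = X
    for W in weights:
        n = W @ activations[-1]             # N^i = W^{i,i-1} A^{i-1}
        activations.append(sigmoid(n))      # A^i = 1 / (1 + e^{-N^i})
    return activations

# Hypothetical network: 3 inputs, one hidden layer of 4 units, 2 outputs.
rng = np.random.default_rng(0)
sizes = [3, 4, 2]
weights = [rng.standard_normal((sizes[k + 1], sizes[k]))
           for k in range(len(sizes) - 1)]
x = rng.standard_normal((sizes[0], 1))

activations = forward(weights, x)
print([a.shape for a in activations])
```

Keeping every $A^i$ in a list is deliberate, since the backward formulas reuse all of them.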
Backward:
$\displaystyle \Delta W^{i+1,i} = \epsilon \delta^{i+1}(A^{i})^{T}$.
$\displaystyle \delta ^i = ((\delta^{i+1})^{T}W^{i+1,i})^{T}*A^{i}*(1-A^{i})$.
$\displaystyle \delta ^{\lambda + 1} = (Y - A^{\lambda + 1})*A^{\lambda + 1}*(1-A^{\lambda + 1})$.
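The three backward formulas translate almost line by line into NumPy. The sketch below is one possible arrangement under my own assumptions: the function names, layer sizes, target values, learning rate, and step count are all hypothetical. It also assumes the squared error $\frac{1}{2}\sum_i (Y_i - A^{\lambda+1}_i)^2$ as the loss, which is the loss for which $\Delta W$ as written is a gradient descent step, and applies the update as $W \leftarrow W + \Delta W$.

```python
import numpy as np

def sigmoid(n):
    return 1.0 / (1.0 + np.exp(-n))

def forward(weights, x):
    """Return [A^0, ..., A^{lambda+1}]; weights[k] plays the role of W^{k+1,k}."""
    activations = [x]
    for W in weights:
        activations.append(sigmoid(W @ activations[-1]))
    return activations

def backward(weights, activations, y, eps):
    """Return the updates Delta W^{k+1,k}, one per weight matrix."""
    a_out = activations[-1]
    # delta^{lambda+1} = (Y - A^{lambda+1}) * A^{lambda+1} * (1 - A^{lambda+1})
    delta = (y - a_out) * a_out * (1.0 - a_out)
    updates = [None] * len(weights)
    for k in range(len(weights) - 1, -1, -1):
        a_prev = activations[k]                      # A^k
        updates[k] = eps * delta @ a_prev.T          # eps delta^{k+1} (A^k)^T
        if k > 0:
            # delta^k = ((delta^{k+1})^T W^{k+1,k})^T * A^k * (1 - A^k)
            delta = (delta.T @ weights[k]).T * a_prev * (1.0 - a_prev)
    return updates

def loss(weights, x, y):
    """Assumed loss: half the squared error between Y and the output layer."""
    return 0.5 * float(np.sum((y - forward(weights, x)[-1]) ** 2))

# Hypothetical setup: 3 inputs, 4 hidden units, 2 outputs, one training sample.
rng = np.random.default_rng(0)
sizes = [3, 4, 2]
weights = [rng.standard_normal((sizes[k + 1], sizes[k]))
           for k in range(len(sizes) - 1)]
x = rng.standard_normal((sizes[0], 1))
y = np.array([[0.2], [0.9]])
eps = 0.1

loss_before = loss(weights, x, y)
for _ in range(500):
    acts = forward(weights, x)
    for W, dW in zip(weights, backward(weights, acts, y, eps)):
        W += dW                                      # W <- W + Delta W
loss_after = loss(weights, x, y)
print(loss_before, loss_after)
```

Each $\delta^{k}$ is computed from the weights before they are modified, because `backward` returns all updates first and the caller applies them afterwards.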
Derivation:
I am not comfortable taking partial derivatives of a vector or matrix with respect to a vector or matrix, so I derived these formulas by writing out the expression for each element of the matrix and then generalizing it to vector form.
$\displaystyle \Delta W^{\lambda+1,\lambda}_{i,j} = \epsilon (Y_i - A^{\lambda+1}_i)A^{\lambda+1}_i(1-A^{\lambda +1}_i)A^{\lambda}_j$.
Define $\displaystyle \delta ^{\lambda+1}_{i} := (Y_i - A^{\lambda+1}_i)A^{\lambda+1}_i(1-A^{\lambda +1}_i)$.
$\displaystyle \Delta W^{\lambda,\lambda-1}_{i,j}=\epsilon (\delta^{\lambda+1})^{T}W^{\lambda+1,\lambda}_{col(i)}A_i^{\lambda}(1-A_i^{\lambda})A_j^{\lambda-1}$.
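Define $\displaystyle \delta ^{\lambda}_{i} := (\delta^{\lambda+1})^{T}W^{\lambda+1,\lambda}_{col(i)}A_i^{\lambda}(1-A_i^{\lambda})$, where $\displaystyle W^{\lambda+1,\lambda}_{col(i)}$ is the $i$th column of $\displaystyle W^{\lambda+1,\lambda}$.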
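As a sanity check on the step from the element-wise formula to the vector form, the two can be compared numerically. The sketch below uses random placeholder values and hypothetical layer sizes of my own choosing. It fills $\Delta W^{\lambda,\lambda-1}$ entry by entry from the element-wise expression and compares the result with the vectorized expression from the Backward section.

```python
import numpy as np

# Hypothetical sizes and random placeholder values, for illustration only.
rng = np.random.default_rng(1)
n_prev, n_hid, n_out = 3, 4, 2
eps = 0.1
a_prev = rng.random((n_prev, 1))               # A^{lambda-1}
a_hid = rng.random((n_hid, 1))                 # A^{lambda}
delta_out = rng.standard_normal((n_out, 1))    # delta^{lambda+1}
W_out = rng.standard_normal((n_out, n_hid))    # W^{lambda+1,lambda}

# Element-wise form: one entry of Delta W^{lambda,lambda-1} at a time.
dW_elem = np.zeros((n_hid, n_prev))
for i in range(n_hid):
    col_i = W_out[:, [i]]                      # i-th column, shape (n_out, 1)
    delta_i = (delta_out.T @ col_i).item() * a_hid[i, 0] * (1.0 - a_hid[i, 0])
    for j in range(n_prev):
        dW_elem[i, j] = eps * delta_i * a_prev[j, 0]

# Vector form: delta^lambda, then the outer product with A^{lambda-1}.
delta_hid = (delta_out.T @ W_out).T * a_hid * (1.0 - a_hid)
dW_vec = eps * delta_hid @ a_prev.T

print(np.allclose(dW_elem, dW_vec))
```

The row index of $\Delta W$ comes from $\delta^{\lambda}_i$ and the column index from $A^{\lambda-1}_j$, which is why the vector form is the outer product $\delta^{\lambda}(A^{\lambda-1})^{T}$.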
The remaining layers follow the same pattern and are left for the reader to complete.