[Machine Learning][BP]The Vectorized Back Propagation Algorithm

RaymondJiang 2024-10-08 13:52:54

　　Reference: https://www.cs.swarthmore.edu/~meeden/cs81/s10/BackPropDeriv.pdf

　　I spent nearly an hour deriving the vectorized form of back propagation. In case I forget it and need it again later, I am writing all the formulas down here as a backup.

Structure:

　　Standard BP Network with $\displaystyle \lambda$ hidden layers, one input layer and one output layer.

　　Activation function: sigmoid.

Notations:

$\displaystyle W^{i+1,i}$, denotes the weight matrix connecting the $i$th layer to the $(i+1)$th layer.

$\displaystyle N^i$, denotes the net input of the $i$th layer.

$\displaystyle A^i$, denotes the activation (the output after the sigmoid) of the $i$th layer.

$\displaystyle \delta ^i$, denotes the error of the $i$th layer.

$\displaystyle \epsilon$, denotes the learning rate.

*, stands for element by element multiplication.

Juxtaposition (no operator symbol), stands for matrix multiplication.

　　Specifically,

$\displaystyle X$, denotes the input layer, which equals $\displaystyle A^0$.

$\displaystyle A^{\lambda + 1}$, denotes the output layer.

$\displaystyle Y$, denotes the expected output.
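　　To make the two kinds of product in the notation concrete, here is a minimal NumPy sketch. The shapes and values are made up for illustration; the only assumption carried over from the text is that layers are stored as column vectors.

```python
import numpy as np

# Column vectors, as assumed for the layer quantities (illustrative values).
a = np.array([[1.0], [2.0], [3.0]])            # shape (3, 1)
b = np.array([[4.0], [5.0], [6.0]])            # shape (3, 1)
W = np.arange(6, dtype=float).reshape(2, 3)    # shape (2, 3)

# "*" in the text: element-by-element multiplication, shape stays (3, 1).
elementwise = a * b

# Juxtaposition in the text: matrix multiplication, (2, 3) times (3, 1) -> (2, 1).
matmul = W @ a

# A column times a row gives a full matrix, the pattern used for the weight update.
outer = b @ a.T                                # shape (3, 3)
```

　　In NumPy, `*` is element-wise and `@` is the matrix product, so the formulas in this post translate almost symbol for symbol.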

Propagations:

　　Forward:

$\displaystyle N^i = W^{i,i-1}A^{i-1}$.

$\displaystyle A^i = \frac{1}{1+e^{-N^i}}$.
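　　The forward equations can be sketched in NumPy as follows. The names `sigmoid` and `forward`, the layer sizes, and the random initialization are illustrative assumptions, not part of the original derivation. As in the formulas above, there is no bias term.

```python
import numpy as np

def sigmoid(n):
    """Element-wise logistic function, A = 1 / (1 + exp(-N))."""
    return 1.0 / (1.0 + np.exp(-n))

def forward(weights, x):
    """Run the forward pass.

    weights[k] plays the role of W^{k+1,k}; x is the column vector X = A^0.
    Returns the list of activations [A^0, A^1, ..., A^{lambda+1}].
    """
    activations = [x]
    for W in weights:
        n = W @ activations[-1]          # N^i = W^{i,i-1} A^{i-1}
        activations.append(sigmoid(n))   # A^i = sigmoid(N^i)
    return activations

# Hypothetical network: 3 inputs, one hidden layer of 4 units, 2 outputs.
rng = np.random.default_rng(0)
sizes = [3, 4, 2]
weights = [rng.normal(size=(sizes[k + 1], sizes[k])) for k in range(len(sizes) - 1)]
x = rng.normal(size=(3, 1))
activations = forward(weights, x)
```

　　Keeping every activation in a list is a deliberate choice here: the backward pass needs each $\displaystyle A^i$ again.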

　　Backward:

$\displaystyle \Delta W^{i+1,i} = \epsilon \delta^{i+1}(A^{i})^{T}$.

$\displaystyle \delta ^i = ((\delta^{i+1})^{T}W^{i+1,i})^{T}*A^{i}*(1-A^{i})$.

$\displaystyle \delta ^{\lambda + 1} = (Y - A^{\lambda + 1})*A^{\lambda + 1}*(1-A^{\lambda + 1})$.
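　　A minimal NumPy sketch of the backward equations follows. The function names, layer sizes, and the finite-difference check are assumptions added for illustration. In particular, the check assumes the error is half the sum of squared errors, which the post does not state explicitly but which matches the form of $\displaystyle \delta^{\lambda+1}$.

```python
import numpy as np

def sigmoid(n):
    return 1.0 / (1.0 + np.exp(-n))

def forward(weights, x):
    """Return [A^0, ..., A^{lambda+1}]; weights[k] plays the role of W^{k+1,k}."""
    activations = [x]
    for W in weights:
        activations.append(sigmoid(W @ activations[-1]))
    return activations

def backward(weights, activations, y, eps):
    """Return the list of updates Delta W^{k+1,k}, one per weight matrix."""
    out = activations[-1]
    delta = (y - out) * out * (1.0 - out)            # delta^{lambda+1}
    updates = [None] * len(weights)
    for k in range(len(weights) - 1, -1, -1):
        a_k = activations[k]                         # A^k
        updates[k] = eps * (delta @ a_k.T)           # eps * delta^{k+1} (A^k)^T
        if k > 0:
            # delta^k = ((delta^{k+1})^T W^{k+1,k})^T * A^k * (1 - A^k)
            delta = (delta.T @ weights[k]).T * a_k * (1.0 - a_k)
    return updates

def loss(weights, x, y):
    """Assumed error function: half the sum of squared errors."""
    return 0.5 * float(np.sum((y - forward(weights, x)[-1]) ** 2))

# Hypothetical network: 3 inputs, one hidden layer of 4 units, 2 outputs.
rng = np.random.default_rng(0)
sizes = [3, 4, 2]
weights = [rng.normal(size=(sizes[k + 1], sizes[k])) for k in range(2)]
x = rng.normal(size=(3, 1))
y = np.array([[0.0], [1.0]])
eps = 0.1

updates = backward(weights, forward(weights, x), y, eps)
new_weights = [W + dW for W, dW in zip(weights, updates)]   # W <- W + Delta W

# Finite-difference check on one weight entry: Delta W / eps should match
# the negative slope of the assumed loss with respect to that entry.
h = 1e-6
bumped_up = [W.copy() for W in weights]
bumped_down = [W.copy() for W in weights]
bumped_up[0][1, 2] += h
bumped_down[0][1, 2] -= h
numeric = -(loss(bumped_up, x, y) - loss(bumped_down, x, y)) / (2 * h)
analytic = updates[0][1, 2] / eps
```

　　Because $\displaystyle \delta$ already carries the sign of $\displaystyle Y - A^{\lambda+1}$, the update is added to the weights rather than subtracted.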

Derivation:

　　I am not comfortable taking partial derivatives of a vector or matrix with respect to another vector or matrix, so I derived these formulas by writing out the formula for each element of the matrix and then generalizing it to the vector form.

$\displaystyle \Delta W^{\lambda+1,\lambda}_{i,j} = \epsilon (Y_i - A^{\lambda+1}_i)A^{\lambda+1}_i(1-A^{\lambda +1}_i)A^{\lambda}_j$.

　　Let's define $\displaystyle \delta ^{\lambda+1}_{i} := (Y_i - A^{\lambda+1}_i)A^{\lambda+1}_i(1-A^{\lambda +1}_i)$.
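
　　As a sketch of where the output-layer formula comes from, assume the error is half the sum of squared errors. The post does not state the error function, so this is an assumption, but it is the choice consistent with the form of $\displaystyle \delta^{\lambda+1}_i$:

$\displaystyle E = \frac{1}{2}\sum_k (Y_k - A^{\lambda+1}_k)^2$, with $\displaystyle N^{\lambda+1}_i = \sum_j W^{\lambda+1,\lambda}_{i,j}A^{\lambda}_j$.

　　Only $\displaystyle A^{\lambda+1}_i$ depends on $\displaystyle W^{\lambda+1,\lambda}_{i,j}$, and the sigmoid satisfies $\displaystyle \frac{\partial A_i}{\partial N_i} = A_i(1-A_i)$, so the chain rule gives

$\displaystyle \frac{\partial E}{\partial W^{\lambda+1,\lambda}_{i,j}} = \frac{\partial E}{\partial A^{\lambda+1}_i}\frac{\partial A^{\lambda+1}_i}{\partial N^{\lambda+1}_i}\frac{\partial N^{\lambda+1}_i}{\partial W^{\lambda+1,\lambda}_{i,j}} = -(Y_i - A^{\lambda+1}_i)A^{\lambda+1}_i(1-A^{\lambda+1}_i)A^{\lambda}_j$.

　　Gradient descent with learning rate $\displaystyle \epsilon$ then yields $\displaystyle \Delta W^{\lambda+1,\lambda}_{i,j} = -\epsilon \frac{\partial E}{\partial W^{\lambda+1,\lambda}_{i,j}} = \epsilon \delta^{\lambda+1}_i A^{\lambda}_j$.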

$\displaystyle \Delta W^{\lambda,\lambda-1}_{i,j}=\epsilon (\delta^{\lambda+1})^{T}W^{\lambda+1,\lambda}_{col(i)}A_i^{\lambda}(1-A_i^{\lambda})A_j^{\lambda-1}$.

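　　Let's define $\displaystyle \delta ^{\lambda}_{i} := (\delta^{\lambda+1})^{T}W^{\lambda+1,\lambda}_{col(i)}A_i^{\lambda}(1-A_i^{\lambda})$, where $\displaystyle W^{\lambda+1,\lambda}_{col(i)}$ denotes the $i$th column of $\displaystyle W^{\lambda+1,\lambda}$.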

　　The rest is left for the reader to complete.
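
　　One way to gain confidence in the step from the element-wise formulas to the vector form is to compare them numerically. The sketch below uses made-up shapes and random values (all of them assumptions for illustration) and checks that the per-element loops agree with the vectorized expressions for $\displaystyle \delta^{\lambda}$ and $\displaystyle \Delta W^{\lambda,\lambda-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
eps = 0.5
W_next = rng.normal(size=(2, 4))               # W^{lambda+1,lambda}
delta_next = rng.normal(size=(2, 1))           # delta^{lambda+1}
A = rng.uniform(0.1, 0.9, size=(4, 1))         # A^{lambda}
A_prev = rng.uniform(0.1, 0.9, size=(3, 1))    # A^{lambda-1}

# Element-wise form: one entry of delta^{lambda} and Delta W at a time.
delta_loop = np.zeros((4, 1))
dW_loop = np.zeros((4, 3))
for i in range(4):
    col_i = W_next[:, [i]]                     # i-th column, shape (2, 1)
    back = (delta_next.T @ col_i).item()       # (delta^{lambda+1})^T W_col(i)
    delta_loop[i, 0] = back * A[i, 0] * (1.0 - A[i, 0])
    for j in range(3):
        dW_loop[i, j] = eps * delta_loop[i, 0] * A_prev[j, 0]

# Vector form: the same quantities computed for all i and j at once.
delta_vec = (delta_next.T @ W_next).T * A * (1.0 - A)
dW_vec = eps * (delta_vec @ A_prev.T)

match = np.allclose(delta_loop, delta_vec) and np.allclose(dW_loop, dW_vec)
```

　　Stacking the scalars $\displaystyle (\delta^{\lambda+1})^{T}W^{\lambda+1,\lambda}_{col(i)}$ over $i$ is exactly the row vector $\displaystyle (\delta^{\lambda+1})^{T}W^{\lambda+1,\lambda}$, which is why the transpose appears in the vector form.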
