5种方法推导Normal Equation
引言:
Normal Equation 是最基础的最小二乘方法。在Andrew Ng的课程中给出了矩阵推到形式,本文将重点提供几种推导方式以便于全方位帮助Machine Learning用户学习。
Notations:
RSS(Residual Sum Squared error):残差平方和
β:参数列向量
X:N×p 矩阵,每行是输入的样本向量
y:标签列向量,即目标列向量
Method 1. 向量投影在特征纬度(Vector Projection onto the Column Space)
是一种最直观的理解: The optimization of linear regression is equivalent to finding the projection of vector y onto the column space of X. As the projection is denoted by Xβ, the optimal configuration of β is when the error vector y−Xβis orthogonal to the column space of X, that is
XT(y−Xβ)=0.(1)
Solving this gives:
β=(XTX)−1XTy.
Method 2. Direct Matrix Differentiation
通过重写S(β)为简单形式是一种最简明的方法
S(β)=(y−Xβ)T(y−Xβ)=yTy−βTXTy−yTXβ+βTXTXβ=yTy−2βTXTy+βTXTXβ.
差异化 S(β) w.r.t. β:
−2yTX+βT(XTX+(XTX)T)=−2yTX+2βTXTX=0,
Solving S(β) gives:
β=(XTX)−1XTy.
Method 3. Matrix Differentiation with Chain-rule
This is the simplest method for a lazy person, as it takes very little effort to reach the solution. The key is to apply the chain-rule:
∂S(β)∂β=∂(y−Xβ)T(y−Xβ)∂(y−Xβ)∂(y−Xβ)∂β=−2(y−Xβ)TX=0,
solving S(β) gives:
β=(XTX)−1XTy.
This method requires an understanding of matrix differentiation of the quadratic form: ∂xTWx∂x=xT(W+WT).
Method 4. Without Matrix Differentiation
We can rewrite S(β) as following:
S(β)=⟨β,β⟩−2⟨β,(XTX)−1XTy⟩+⟨(XTX)−1XTy,(XTX)−1XTy⟩+C,
where ⟨⋅,⋅⟩ is the inner product defined by
⟨x,y⟩=xT(XTX)y.
The idea is to rewrite S(β) into the form of S(β)=(x−a)2+b such that x can be solved exactly.
Method 5. Statistical Learning Theory
An alternative method to derive the normal equation arises from the statistical learning theory. The aim of this task is to minimize the expected prediction error given by:
EPE(β)=∫(y−xTβ)Pr(dx,dy),
where x stands for a column vector of random variables, y denotes the target random variable, and β denotes a column vector of parameters (Note the definitions are different from the notations before).
Differentiating EPE(β) w.r.t. β gives:
∂EPE(β)∂β=∫2(y−xTβ)(−1)xTPr(dx,dy).
Before we proceed, let’s check the dimensions to make sure the partial derivative is correct. EPE is the expected error: a 1×1 vector. β is a column vector that is N×1. According to the Jacobian in vector calculus, the resulting partial derivative should take the form
∂EPE∂β=(∂EPE∂β1,∂EPE∂β2,…,∂EPE∂βN),
which is a 1×N vector. Looking back at the right-hand side of the equation above, we find 2(y−xTβ)(−1) being a constant while xTbeing a row vector, resuling the same 1×Ndimension. Thus, we conclude the above partial derivative is correct. This derivative mirrors the relationship between the expected error and the way to adjust parameters so as to reduce the error. To understand why, imagine 2(y−xTβ)(−1) being the errors incurred by the current parameter configurations β and xT being the values of the input attributes. The resulting derivative equals to the error times the scales of each input attribute. Another way to make this point is: the contribution of error from each parameter βi has a monotonic relationship with the error 2(y−xTβ)(−1) as well as the scalar xT that was multiplied to each βi.
Now, let’s go back to the derivation. Because 2(y−xTβ)(−1) is 1×1, we can rewrite it with its transpose:
∂EPE(β)∂β=∫2(y−xTβ)T(−1)xTPr(dx,dy).
Solving ∂EPE(β)∂β=0 gives:
E[yTxT−βTxxT]=0E[βTxxT]=E[yTxT]E[xxTβ]=E[xy]β=E[xxT]−1E[xy].
5种方法推导Normal Equation的更多相关文章
- 机器学习入门:Linear Regression与Normal Equation -2017年8月23日22:11:50
本文会讲到: (1)另一种线性回归方法:Normal Equation: (2)Gradient Descent与Normal Equation的优缺点: 前面我们通过Gradient Desce ...
- 正规方程 Normal Equation
正规方程 Normal Equation 前几篇博客介绍了一些梯度下降的有用技巧,特征缩放(详见http://blog.csdn.net/u012328159/article/details/5103 ...
- machine learning (7)---normal equation相对于gradient descent而言求解linear regression问题的另一种方式
Normal equation: 一种用来linear regression问题的求解Θ的方法,另一种可以是gradient descent 仅适用于linear regression问题的求解,对其 ...
- coursera机器学习笔记-多元线性回归,normal equation
#对coursera上Andrew Ng老师开的机器学习课程的笔记和心得: #注:此笔记是我自己认为本节课里比较重要.难理解或容易忘记的内容并做了些补充,并非是课堂详细笔记和要点: #标记为<补 ...
- Normal Equation
一.Normal Equation 我们知道梯度下降在求解最优参数\(\theta\)过程中需要合适的\(\alpha\),并且需要进行多次迭代,那么有没有经过简单的数学计算就得到参数\(\theta ...
- Normal Equation Algorithm
和梯度下降法一样,Normal Equation(正规方程法)算法也是一种线性回归算法(Linear Regression Algorithm).与梯度下降法通过一步步计算来逐步靠近最佳θ值不同,No ...
- normal equation(正规方程)
normal equation(正规方程) 正规方程是通过求解下面的方程来找出使得代价函数最小的参数的: \[ \frac{\partial}{\partial\theta_j}J\left(\the ...
- YbSoftwareFactory 代码生成插件【二十五】:Razor视图中以全局方式调用后台方法输出页面代码的三种方法
上一篇介绍了 MVC中实现动态自定义路由 的实现,本篇将介绍Razor视图中以全局方式调用后台方法输出页面代码的三种方法. 框架最新的升级实现了一个页面部件功能,其实就是通过后台方法查询数据库内容,把 ...
- 去除inline-block元素间间距的N种方法
这篇文章发布于 2012年04月24日,星期二,22:38,归类于 css相关. 阅读 147771 次, 今日 52 次 by zhangxinxu from http://www.zhangxin ...
随机推荐
- Asp.net mvc 知多少(六)
本系列主要翻译自<ASP.NET MVC Interview Questions and Answers >- By Shailendra Chauhan,想看英文原版的可访问http:/ ...
- js小功能合集:计算指定时间距今多久、评论树核心代码、字符串替换和去除。
1.计算指定时间距今多久 var date1=new Date('2017/02/08 17:00'); //开始时间 var date2=new Date(); //当前时间 var date3=d ...
- Tried to obtain the web lock from a thread other than the main thread or the web thread. This may be
有些操作只能回到主线程操作 比如: mbprogresshud只能在主线程中使用 而且注意凡是关于布局的代码也只能下载主线程
- Swift 面向对象解析(二)
接着上面一篇说的内容: 一 继承: 苹果继承与水果,苹果是水果的子类,则苹果是一种特殊的水果:这就是继承的关系,这个我们学OC的时候相信也都理解了,就不再描述定义了,下面的就叫继承: class ZX ...
- 模仿jquery的fileupload插件
仅需要new一个对象,将上传后台的url和点击触发上传的元素id传给对象,就可以自从实现上传 暂不支持IE <html> <body> <a href="#&q ...
- JS打开摄像头并截图上传
直入正题,JS打开摄像头并截图上传至后端的一个完整步骤 1. 打开摄像头主要用到getUserMedia方法,然后将获取到的媒体流置入video标签 2. 截取图片主要用到canvas绘图,使用dra ...
- C语言总结
我们用了20天的时间左右的时间来学习简单的C语言,对于C语言,总体来说我是学的不是特别透彻,感觉自己什么都不懂一样.明天就要考试了,希望明天考个好成绩,为明年打下一个良好的基础. 这段时间我们学习了: ...
- 用php做省份的三级联动 附带数据库
可以把它做成小插件的形式,以后需要,可以随时调 来看一下怎么来做 先来写个div然后,再引入js包 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1 ...
- Java设计模式之《组合模式》及应用场景
摘要: 原创作品,可以转载,但是请标注出处地址http://www.cnblogs.com/V1haoge/p/6489827.html 组合模式,就是在一个对象中包含其他对象,这些被包含的对象可能是 ...
- 考试模块 - ERP数据流
快速链接:人力资源知识体系索引 本主题主要列出考试中涉及到的所有表. 步骤 操作 相关表 说明 1 考试辅助资料 基础资料,见附表1 2 题库(109130) HXExmGp 3 试题 HXE ...