【转】Derivation of the Normal Equation for linear regression
I was going through the Coursera "Machine Learning" course, and in the section on multivariate linear regression something caught my eye. Andrew Ng presented the Normal Equation as an analytical solution to the linear regression problem with a least-squares cost function. He mentioned that in some cases (such as for small feature sets) using it is more effective than applying gradient descent; unfortunately, he left its derivation out.
Here I want to show how the normal equation is derived.
First, some terminology. The following symbols are compatible with the machine learning course, not with the exposition of the normal equation on Wikipedia and other sites - semantically it's all the same, just the symbols are different.
Given the hypothesis function:

We'd like to minimize the least-squares cost:

Where
is the i-th sample (from a set of m samples) and
is the i-th expected result.
To proceed, we'll represent the problem in matrix notation; this is natural, since we essentially have a system of linear equations here. The regression coefficients
we're looking for are the vector:

Each of the m input samples is similarly a column vector with n+1 rows,
being 1 for convenience. So we can now rewrite the hypothesis function as:

When this is summed over all samples, we can dip further into matrix notation. We'll define the "design matrix" X (uppercase X) as a matrix of m rows, in which each row is the i-th sample (the vector
). With this, we can rewrite the least-squares cost as following, replacing the explicit sum by matrix multiplication:

Now, using some matrix transpose identities, we can simplify this a bit. I'll throw the
part away since we're going to compare a derivative to zero anyway:

Note that
is a vector, and so is y. So when we multiply one by another, it doesn't matter what the order is (as long as the dimensions work out). So we can further simplify:

Recall that here
is our unknown. To find where the above function has a minimum, we will derive by
and compare to 0. Deriving by a vector may feel uncomfortable, but there's nothing to worry about. Recall that here we only use matrix notation to conveniently represent a system of linear formulae. So we derive by each component of the vector, and then combine the resulting derivatives into a vector again. The result is:

Or:

[Update 27-May-2015: I've written another post that explains in more detail how these derivatives are computed.]
Now, assuming that the matrix
is invertible, we can multiply both sides by
and get:

Which is the normal equation.
【转】Derivation of the Normal Equation for linear regression的更多相关文章
- (三)用Normal Equation拟合Liner Regression模型
继续考虑Liner Regression的问题,把它写成如下的矩阵形式,然后即可得到θ的Normal Equation. Normal Equation: θ=(XTX)-1XTy 当X可逆时,(XT ...
- CS229 3.用Normal Equation拟合Liner Regression模型
继续考虑Liner Regression的问题,把它写成如下的矩阵形式,然后即可得到θ的Normal Equation. Normal Equation: θ=(XTX)-1XTy 当X可逆时,(XT ...
- Linear regression with multiple variables(多特征的线型回归)算法实例_梯度下降解法(Gradient DesentMulti)以及正规方程解法(Normal Equation)
,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, , ...
- machine learning (7)---normal equation相对于gradient descent而言求解linear regression问题的另一种方式
Normal equation: 一种用来linear regression问题的求解Θ的方法,另一种可以是gradient descent 仅适用于linear regression问题的求解,对其 ...
- 机器学习入门:Linear Regression与Normal Equation -2017年8月23日22:11:50
本文会讲到: (1)另一种线性回归方法:Normal Equation: (2)Gradient Descent与Normal Equation的优缺点: 前面我们通过Gradient Desce ...
- 5种方法推导Normal Equation
引言: Normal Equation 是最基础的最小二乘方法.在Andrew Ng的课程中给出了矩阵推到形式,本文将重点提供几种推导方式以便于全方位帮助Machine Learning用户学习. N ...
- Normal Equation Algorithm
和梯度下降法一样,Normal Equation(正规方程法)算法也是一种线性回归算法(Linear Regression Algorithm).与梯度下降法通过一步步计算来逐步靠近最佳θ值不同,No ...
- coursera机器学习笔记-多元线性回归,normal equation
#对coursera上Andrew Ng老师开的机器学习课程的笔记和心得: #注:此笔记是我自己认为本节课里比较重要.难理解或容易忘记的内容并做了些补充,并非是课堂详细笔记和要点: #标记为<补 ...
- Normal Equation
一.Normal Equation 我们知道梯度下降在求解最优参数\(\theta\)过程中需要合适的\(\alpha\),并且需要进行多次迭代,那么有没有经过简单的数学计算就得到参数\(\theta ...
随机推荐
- vim 学习相关记录
VIM 相关内容****************** vim 的三个模式: 编辑模式 --> 输入模式 --> 末行模式 编辑模式: 通常键入键盘值被理解成一个操作; 如: dd(删除行) ...
- AsyncTask和Handler的对比
AsyncTask和Handler对比 1 ) AsyncTask实现的原理,和适用的优缺点 AsyncTask,是android提供的轻量级的异步类,可以直接继承AsyncTask,在类中实现异步操 ...
- 用CSS3实现带小三角形的div框(不用图片)
现在看到了很多带小三角形的方框,如微信.Mac版的QQ.QQ空间的时间轴等等,在聊天或者是发表的状态的内容外面都有一个带小三角形的矩形框包围着,感觉看着很不错,于是决定亲自动手写一个,我上次用的是偏移 ...
- vedeo与audio标签的使用
浏览器原生支持音视频无疑是一件大事——尤其对移动设备而言.不依赖Flash,意味着更加省电.安全和快速的播放体验,而且只需要引入一个标签,就能播放自如. <video src="dao ...
- css.day04
1. box 盒子模型 <p> <span> <hr/> <div> css+ div p span css+ xhtml b ...
- Mysql 中is null 和 =null 的区别
在mysql中,筛选非空的时候经常会用到is not null和!=null,这两种方法单从字面上来看感觉是差不多的,其实如 果去运行一下试试的话差别会很大! 为什么会出现这种情况呢? null 表示 ...
- Deep Learning学习随记(一)稀疏自编码器
最近开始看Deep Learning,随手记点,方便以后查看. 主要参考资料是Stanford 教授 Andrew Ng 的 Deep Learning 教程讲义:http://deeplearnin ...
- IDEA中添加各种依赖pom.xml文件内容
刚实习的小白,今天准备进入项目,纳尼,前辈把框架什么的都搭建好了,默默的抹了一把辛酸泪,刚刚接触自学框架的时候,添加依赖的时候总是各种问题,让前辈发给我之后,才发现人家写的代码相当优美了.下面就是前辈 ...
- 46 Permutations(全排列Medium)
题目意思:全排列 思路:其实看这题目意思,是不太希望用递归的,不过还是用了递归,非递归的以后再搞吧 ps:vector这玩意不能随便返回,开始递归方法用vector,直接到500ms,换成void,到 ...
- 我牵头,你做事——C#委托实践
我牵头,你做事——C#委托实践一 2007-09-05 23:54:54 标签:委托 原创作品,允许转载,转载时请务必以超链接形式标明文章 原始出处 .作者信息和本声明.否则将追究法律责任.http ...