I was going through the Coursera "Machine Learning" course, and in the section on multivariate linear regression something caught my eye. Andrew Ng presented the Normal Equation as an analytical solution to the linear regression problem with a least-squares cost function. He mentioned that in some cases (such as for small feature sets) using it is more effective than applying gradient descent; unfortunately, he left its derivation out.

Here I want to show how the normal equation is derived.

First, some terminology. The symbols below follow the machine learning course rather than the exposition of the normal equation on Wikipedia and other sites. Semantically it's all the same; only the symbols differ.

Given the hypothesis function:

$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n$$

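We'd like to minimize the least-squares cost:

$$J(\theta_{0 \ldots n}) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$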

Where $x^{(i)}$ is the i-th sample (from a set of m samples) and $y^{(i)}$ is the i-th expected result.

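To proceed, we'll represent the problem in matrix notation; this is natural, since we essentially have a system of linear equations here. The regression coefficients $\theta$ we're looking for are the vector:

$$\theta = \begin{pmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{pmatrix} \in \mathbb{R}^{n+1}$$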

Each of the m input samples is similarly a column vector with n+1 rows, with $x_0$ being 1 for convenience. So we can now rewrite the hypothesis function as:

$$h_\theta(x) = \theta^T x$$

When this is summed over all samples, we can dip further into matrix notation. We'll define the "design matrix" X (uppercase X) as a matrix of m rows, in which the i-th row is the i-th sample (the vector $x^{(i)}$, laid out as a row), and y as the column vector of the m expected results. With this, we can rewrite the least-squares cost as follows, replacing the explicit sum by matrix multiplication:

$$J(\theta) = \frac{1}{2m} (X\theta - y)^T (X\theta - y)$$
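As a quick sanity check, the sketch below evaluates the cost both ways, as the explicit sum $\frac{1}{2m}\sum_i (\theta^T x^{(i)} - y^{(i)})^2$ and as the matrix product $\frac{1}{2m}(X\theta - y)^T(X\theta - y)$, and compares the two. The data is random and the sizes `m` and `n` are arbitrary choices made only for illustration; none of it comes from the course.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 2  # hypothetical sizes: 5 samples, 2 features

# Design matrix: one row per sample, with a leading column of ones for x_0 = 1.
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])
y = rng.normal(size=m)          # expected results, one per sample
theta = rng.normal(size=n + 1)  # arbitrary coefficients

# Explicit sum over samples: h_theta(x_i) = theta^T x_i.
cost_sum = sum((theta @ X[i] - y[i]) ** 2 for i in range(m)) / (2 * m)

# Matrix form: (X theta - y)^T (X theta - y) / (2m).
residual = X @ theta - y
cost_matrix = (residual @ residual) / (2 * m)

print(np.isclose(cost_sum, cost_matrix))
```

Both expressions compute the same sum of squared residuals, so they should agree up to floating-point rounding.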

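Now, using some matrix transpose identities, we can simplify this a bit. I'll throw the $\frac{1}{2m}$ part away since we're going to compare a derivative to zero anyway:

$$J(\theta) = \left( (X\theta)^T - y^T \right) (X\theta - y)$$

$$J(\theta) = (X\theta)^T X\theta - (X\theta)^T y - y^T (X\theta) + y^T y$$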

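Note that $X\theta$ is a vector, and so is y. So when we multiply one by the other the result is a scalar, and since a scalar equals its own transpose, the order doesn't matter (as long as the dimensions work out): $(X\theta)^T y = y^T (X\theta)$. So we can further simplify:

$$J(\theta) = \theta^T X^T X \theta - 2 (X\theta)^T y + y^T y$$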

Recall that here $\theta$ is our unknown. To find where the above function has a minimum, we will differentiate with respect to $\theta$ and compare to 0. Differentiating with respect to a vector may feel uncomfortable, but there's nothing to worry about. Recall that here we only use matrix notation to conveniently represent a system of linear formulae. So we differentiate with respect to each component of the vector, and then combine the resulting derivatives into a vector again. The result is:

$$\frac{\partial J}{\partial \theta} = 2 X^T X \theta - 2 X^T y = 0$$

Or:

$$X^T X \theta = X^T y$$
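For readers who want to convince themselves of the gradient $2 X^T X \theta - 2 X^T y$ without working through the algebra, here is a small numerical check. It differentiates the cost $(X\theta - y)^T (X\theta - y)$ one component of $\theta$ at a time using central differences, which mirrors the "differentiate by each component, then recombine" idea. The function names, the random data and the step size `eps` are all assumptions made for this sketch.

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = (X theta - y)^T (X theta - y), constant factor dropped."""
    residual = X @ theta - y
    return residual @ residual

def analytic_gradient(theta, X, y):
    """Closed-form gradient: 2 X^T X theta - 2 X^T y."""
    return 2 * X.T @ X @ theta - 2 * X.T @ y

def numeric_gradient(theta, X, y, eps=1e-6):
    """Central-difference estimate, one component of theta at a time."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (cost(theta + step, X, y) - cost(theta - step, X, y)) / (2 * eps)
    return grad

rng = np.random.default_rng(1)
X = np.hstack([np.ones((6, 1)), rng.normal(size=(6, 3))])  # hypothetical data
y = rng.normal(size=6)
theta = rng.normal(size=4)

print(np.allclose(analytic_gradient(theta, X, y),
                  numeric_gradient(theta, X, y), atol=1e-4))
```

Because the cost is quadratic in $\theta$, the central difference is exact apart from rounding error, so the two gradients should match closely.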

[Update 27-May-2015: I've written another post that explains in more detail how these derivatives are computed.]

Now, assuming that the matrix $X^T X$ is invertible, we can multiply both sides by $(X^T X)^{-1}$ and get:

$$\theta = (X^T X)^{-1} X^T y$$

Which is the normal equation.
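To see the result in action, here is a minimal NumPy sketch. It solves the linear system $X^T X \theta = X^T y$ with `np.linalg.solve` instead of forming the inverse explicitly, which is the usual numerical practice, although the explicit-inverse form from the derivation gives the same answer when $X^T X$ is well conditioned. The function name `normal_equation`, the synthetic data and the coefficients in `true_theta` are made up for illustration.

```python
import numpy as np

def normal_equation(X, y):
    """Solve X^T X theta = X^T y for theta (assumes X^T X is invertible)."""
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(42)
m = 50                                        # hypothetical number of samples
features = rng.uniform(-1, 1, size=(m, 2))    # two made-up features
X = np.hstack([np.ones((m, 1)), features])    # prepend x_0 = 1
true_theta = np.array([1.0, 2.0, -3.0])       # coefficients chosen for the demo
y = X @ true_theta                            # noiseless targets

theta = normal_equation(X, y)

# Same result via the literal formula theta = (X^T X)^{-1} X^T y.
theta_inverse = np.linalg.inv(X.T @ X) @ X.T @ y

print(np.allclose(theta, true_theta), np.allclose(theta, theta_inverse))
```

Since the targets here are generated without noise, the solver should recover `true_theta` up to rounding. With noisy data it would instead return the least-squares fit.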
