I was going through the Coursera "Machine Learning" course, and in the section on multivariate linear regression something caught my eye. Andrew Ng presented the Normal Equation as an analytical solution to the linear regression problem with a least-squares cost function. He mentioned that in some cases (such as for small feature sets) using it is more effective than applying gradient descent; unfortunately, he left its derivation out.

Here I want to show how the normal equation is derived.

First, some terminology. The symbols below follow the machine learning course rather than the exposition of the normal equation on Wikipedia and other sites; semantically it's all the same, just the symbols are different.

Given the hypothesis function:

$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n$$

We'd like to minimize the least-squares cost:

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

Where $x^{(i)}$ is the i-th sample (from a set of m samples) and $y^{(i)}$ is the i-th expected result.

To proceed, we'll represent the problem in matrix notation; this is natural, since we essentially have a system of linear equations here. The regression coefficients we're looking for are the vector:

$$\theta = \begin{pmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{pmatrix}$$

Each of the m input samples is similarly a column vector with n+1 rows, $x_0$ being 1 for convenience. So we can now rewrite the hypothesis function as:

$$h_\theta(x) = \theta^T x$$

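When this is summed over all samples, we can dip further into matrix notation. We'll define the "design matrix" X (uppercase X) as a matrix of m rows, in which the i-th row is the i-th sample (the vector $x^{(i)}$, transposed into a row). We'll also collect the expected results into a column vector y with m rows. With this, we can rewrite the least-squares cost as follows, replacing the explicit sum by matrix multiplication:

$$J(\theta) = \frac{1}{2m} (X\theta - y)^T (X\theta - y)$$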
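As a quick sanity check, here is a minimal NumPy sketch comparing the cost written as an explicit sum over samples with the matrix form $\frac{1}{2m}(X\theta - y)^T(X\theta - y)$. The data values and the function names `cost_sum` and `cost_matrix` are made up for illustration; they are not from the course.

```python
import numpy as np

def cost_sum(theta, X, y):
    """Least-squares cost as an explicit sum over the m samples."""
    m = len(y)
    total = 0.0
    for i in range(m):
        h = theta @ X[i]              # hypothesis theta^T x for sample i
        total += (h - y[i]) ** 2
    return total / (2 * m)

def cost_matrix(theta, X, y):
    """The same cost expressed through the design matrix X."""
    m = len(y)
    r = X @ theta - y                 # residual vector X*theta - y
    return (r @ r) / (2 * m)

# Made-up data: m = 4 samples, n = 2 features, plus the leading x_0 = 1 column.
X = np.array([[1.0,  2.0,  3.0],
              [1.0,  0.5, -1.0],
              [1.0,  4.0,  0.0],
              [1.0, -2.0,  2.5]])
y = np.array([1.0, 2.0, 0.5, 3.0])
theta = np.array([0.1, -0.3, 0.7])

# Both formulations should agree up to floating-point rounding.
print(np.isclose(cost_sum(theta, X, y), cost_matrix(theta, X, y)))
```

Each row of `X` plays the role of one sample, so `X @ theta` evaluates the hypothesis for all m samples at once.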

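Now, using some matrix transpose identities, we can simplify this a bit. I'll throw the $\frac{1}{2m}$ part away since we're going to set the derivative to zero anyway:

$$J(\theta) = \left( (X\theta)^T - y^T \right) (X\theta - y) = (X\theta)^T X\theta - (X\theta)^T y - y^T (X\theta) + y^T y$$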

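Note that $X\theta$ is a vector, and so is y. So when we multiply one by the other, the result is a scalar and the order doesn't matter (as long as the dimensions work out): $(X\theta)^T y = y^T (X\theta)$. So we can further simplify:

$$J(\theta) = \theta^T X^T X \theta - 2 (X\theta)^T y + y^T y$$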
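The two facts used in this step can be checked numerically: the cross terms $(X\theta)^T y$ and $y^T (X\theta)$ are the same scalar, and the expanded form $\theta^T X^T X \theta - 2 (X\theta)^T y + y^T y$ equals $(X\theta - y)^T (X\theta - y)$. The sketch below uses random data generated only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random illustrative data: 5 samples, 2 features plus the x_0 = 1 column.
X = rng.normal(size=(5, 3))
X[:, 0] = 1.0
y = rng.normal(size=5)
theta = rng.normal(size=3)

X_theta = X @ theta

# The two cross terms are dot products of the same pair of vectors.
cross_a = X_theta @ y                 # (X theta)^T y
cross_b = y @ X_theta                 # y^T (X theta)

# Unexpanded and expanded forms of the cost (without the 1/2m factor).
full = (X_theta - y) @ (X_theta - y)
expanded = theta @ X.T @ X @ theta - 2 * cross_a + y @ y

print(np.isclose(cross_a, cross_b), np.isclose(full, expanded))
```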

Recall that here $\theta$ is our unknown. To find where the above function has a minimum, we differentiate with respect to $\theta$ and set the result to 0. Differentiating with respect to a vector may feel uncomfortable, but there's nothing to worry about. Recall that here we only use matrix notation to conveniently represent a system of linear formulae. So we differentiate with respect to each component of the vector, and then combine the resulting derivatives into a vector again. The result is:

$$\frac{\partial J}{\partial \theta} = 2 X^T X \theta - 2 X^T y = 0$$

Or:

$$X^T X \theta = X^T y$$
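One way to gain confidence in the vector derivative is to compare the closed form $2 X^T X \theta - 2 X^T y$ against a finite-difference approximation computed one component of $\theta$ at a time. This is only a sketch on random data; the helper names `J`, `grad_J` and `numeric_grad` are mine, and `J` omits the $\frac{1}{2m}$ factor as in the derivation.

```python
import numpy as np

def J(theta, X, y):
    """Cost without the 1/(2m) factor: (X theta - y)^T (X theta - y)."""
    r = X @ theta - y
    return r @ r

def grad_J(theta, X, y):
    """Closed-form gradient: 2 X^T X theta - 2 X^T y."""
    return 2 * X.T @ X @ theta - 2 * X.T @ y

def numeric_grad(f, theta, eps=1e-6):
    """Central-difference gradient, perturbing one component at a time."""
    g = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        g[j] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return g

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))
X[:, 0] = 1.0                         # x_0 = 1 column
y = rng.normal(size=6)
theta = rng.normal(size=3)

analytic = grad_J(theta, X, y)
numeric = numeric_grad(lambda t: J(t, X, y), theta)
print(np.allclose(analytic, numeric, atol=1e-5))
```

The per-component loop in `numeric_grad` mirrors the idea in the text: differentiate with respect to each component and stack the results back into a vector.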

[Update 27-May-2015: I've written another post that explains in more detail how these derivatives are computed.]

Now, assuming that the matrix $X^T X$ is invertible, we can multiply both sides by $(X^T X)^{-1}$ and get:

$$\theta = (X^T X)^{-1} X^T y$$

Which is the normal equation.
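To see the result in action, here is a small NumPy sketch that applies $\theta = (X^T X)^{-1} X^T y$ to synthetic, noise-free data generated from a known coefficient vector. The data, `true_theta`, and the function name `normal_equation` are assumptions made for the example. The function solves the linear system $X^T X \theta = X^T y$ with `np.linalg.solve` instead of forming the inverse explicitly, which is a common numerical substitution rather than part of the derivation; the literal inverse form is computed alongside for comparison.

```python
import numpy as np

def normal_equation(X, y):
    """Solve (X^T X) theta = X^T y for theta, assuming X^T X is invertible."""
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(42)
m, n = 50, 2                                   # 50 samples, 2 features
features = rng.normal(size=(m, n))
X = np.column_stack([np.ones(m), features])    # prepend the x_0 = 1 column

true_theta = np.array([4.0, -2.0, 0.5])        # made-up coefficients
y = X @ true_theta                             # noise-free expected results

theta = normal_equation(X, y)

# Literal form of the normal equation, with an explicit inverse.
theta_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# NumPy's general least-squares solver, as an independent cross-check.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(theta)
```

Because the targets are generated without noise, all three results should match `true_theta` up to floating-point rounding.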
