I was going through the Coursera "Machine Learning" course, and in the section on multivariate linear regression something caught my eye. Andrew Ng presented the Normal Equation as an analytical solution to the linear regression problem with a least-squares cost function. He mentioned that in some cases (such as for small feature sets) using it is more effective than applying gradient descent; unfortunately, he left its derivation out.

Here I want to show how the normal equation is derived.

First, some terminology. The symbols here follow the machine learning course rather than the exposition of the normal equation on Wikipedia and other sites; semantically it's all the same, just the notation differs.

Given the hypothesis function:

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$$

We'd like to minimize the least-squares cost:

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

Where $x^{(i)}$ is the i-th sample (from a set of m samples) and $y^{(i)}$ is the i-th expected result.

To proceed, we'll represent the problem in matrix notation; this is natural, since we essentially have a system of linear equations here. The regression coefficients we're looking for are the vector:

$$\theta = \begin{pmatrix}\theta_0 \\ \theta_1 \\ \vdots \\ \theta_n\end{pmatrix}$$

Each of the m input samples is similarly a column vector with n+1 rows, $x_0$ being 1 for convenience. So we can now rewrite the hypothesis function as:

$$h_\theta(x^{(i)}) = \theta^T x^{(i)}$$

When this is summed over all samples, we can dip further into matrix notation. We'll define the "design matrix" X (uppercase X) as a matrix of m rows, in which each row is the i-th sample (the vector $(x^{(i)})^T$):

$$X = \begin{pmatrix}(x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T\end{pmatrix}$$

With this, we can rewrite the least-squares cost as follows, replacing the explicit sum by matrix multiplication:

$$J(\theta) = \frac{1}{2m}(X\theta - y)^T(X\theta - y)$$
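As a quick sanity check, here's a minimal NumPy sketch showing that the vectorized cost matches the explicit per-sample sum; all names (X, y, theta, m, n) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3                                                # 5 samples, 3 features
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])  # first column is x_0 = 1
y = rng.normal(size=m)
theta = rng.normal(size=n + 1)

# Cost as an explicit sum over the m samples
J_sum = sum((X[i] @ theta - y[i]) ** 2 for i in range(m)) / (2 * m)

# Vectorized cost: (X theta - y)^T (X theta - y) / 2m
r = X @ theta - y
J_mat = (r @ r) / (2 * m)

assert np.isclose(J_sum, J_mat)
```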

Now, using some matrix transpose identities, we can simplify this a bit. I'll throw the $\frac{1}{2m}$ part away, since we're going to set a derivative to zero anyway:

$$J(\theta) = ((X\theta)^T - y^T)(X\theta - y)$$

$$J(\theta) = (X\theta)^T X\theta - (X\theta)^T y - y^T(X\theta) + y^T y$$

Note that $X\theta$ is a vector, and so is y. So when we multiply one by another, it doesn't matter what the order is (as long as the dimensions work out). So we can further simplify:

$$J(\theta) = \theta^T X^T X\theta - 2(X\theta)^T y + y^T y$$

Recall that $\theta$ here is our unknown. To find where the above function has a minimum, we differentiate it with respect to $\theta$ and set the result to 0. Differentiating by a vector may feel uncomfortable, but there's nothing to worry about. Recall that here we only use matrix notation to conveniently represent a system of linear formulae. So we differentiate by each component of the vector, and then combine the resulting derivatives into a vector again. The result is:

$$\frac{\partial J}{\partial\theta} = 2X^T X\theta - 2X^T y = 0$$

Or:

$$X^T X\theta = X^T y$$

[Update 27-May-2015: I've written another post that explains in more detail how these derivatives are computed.]
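Another way to convince yourself these derivatives are right is a quick numerical check. Here's a minimal sketch with illustrative names, using central finite differences; it drops the $\frac{1}{2m}$ factor, matching the simplified cost above:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 50, 4
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])
y = rng.normal(size=m)
theta = rng.normal(size=n + 1)

def J(t):
    # Simplified cost (X t - y)^T (X t - y), with the 1/2m factor dropped
    r = X @ t - y
    return r @ r

# The derivative derived above: 2 X^T X theta - 2 X^T y
grad = 2 * (X.T @ X @ theta - X.T @ y)

# Central finite differences, one component of theta at a time
eps = 1e-6
num_grad = np.array([(J(theta + eps * e) - J(theta - eps * e)) / (2 * eps)
                     for e in np.eye(theta.size)])

assert np.allclose(grad, num_grad, rtol=1e-5)
```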

Now, assuming that the matrix $X^T X$ is invertible, we can multiply both sides by $(X^T X)^{-1}$ and get:

$$\theta = (X^T X)^{-1} X^T y$$

Which is the normal equation.
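In code, this is nearly a one-liner. Here's a minimal NumPy sketch (names illustrative) that applies the normal equation and cross-checks the result against np.linalg.lstsq; note that solving the linear system $X^T X\theta = X^T y$ is cheaper and more numerically stable than explicitly forming the inverse:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 100
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, 3))])  # x_0 = 1 column
true_theta = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ true_theta + 0.01 * rng.normal(size=m)             # slightly noisy targets

# Normal equation: solve (X^T X) theta = X^T y rather than inverting X^T X
theta = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least-squares solver
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(theta, theta_lstsq)
print(theta)  # close to true_theta
```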
