Model Representation

To establish notation for future use, we’ll use x(i) to denote the “input” variables (living area in this example), also called input features, and y(i) to denote the “output” or target variable that we are trying to predict (price). A pair (x(i),y(i)) is called a training example, and the dataset that we’ll be using to learn—a list of m training examples (x(i),y(i));i=1,...,m—is called a training set. Note that the superscript “(i)” in the notation is simply an index into the training set, and has nothing to do with exponentiation. We will also use X to denote the space of input values, and Y to denote the space of output values. In this example, X = Y = ℝ.

To describe the supervised learning problem slightly more formally, our goal is, given a training set, to learn a function h : X → Y so that h(x) is a “good” predictor for the corresponding value of y. For historical reasons, this function h is called a hypothesis. Seen pictorially, the process is therefore like this:

When the target variable that we’re trying to predict is continuous, such as in our housing example, we call the learning problem a regression problem. When y can take on only a small number of discrete values (such as if, given the living area, we wanted to predict if a dwelling is a house or an apartment, say), we call it a classification problem.

Cost Function

We can measure the accuracy of our hypothesis function by using a cost function. This takes an average difference (actually a fancier version of an average) of all the results of the hypothesis with inputs from x's and the actual output y's.

J(θ0,θ1)=12m∑i=1m(y^i−yi)2=12m∑i=1m(hθ(xi)−yi)2

To break it apart, it is 12 x¯ where x¯ is the mean of the squares of hθ(xi)−yi , or the difference between the predicted value and the actual value.

This function is otherwise called the "Squared error function", or "Mean squared error". The mean is halved (12)as a convenience for the computation of the gradient descent, as the derivative term of the square function will cancel out the 12 term. The following image summarizes what the cost function does:

Cost Function - Intuition I

If we try to think of it in visual terms, our training data set is scattered on the x-y plane. We are trying to make a straight line (defined by hθ(x)) which passes through these scattered data points.

Our objective is to get the best possible line. The best possible line will be such so that the average squared vertical distances of the scattered points from the line will be the least. Ideally, the line should pass through all the points of our training data set. In such a case, the value of J(θ0,θ1) will be 0. The following example shows the ideal situation where we have a cost function of 0.

When θ1=1, we get a slope of 1 which goes through every single data point in our model. Conversely, when θ1=0.5, we see the vertical distance from our fit to the data points increase.

This increases our cost function to 0.58. Plotting several other points yields to the following graph:

Thus as a goal, we should try to minimize the cost function. In this case, θ1=1 is our global minimum.

Cost Function - Intuition II

A contour plot is a graph that contains many contour lines. A contour line of a two variable function has a constant value at all points of the same line. An example of such a graph is the one to the right below.

Taking any color and going along the 'circle', one would expect to get the same value of the cost function. For example, the three green points found on the green line above have the same value for J(θ0,θ1) and as a result, they are found along the same line. The circled x displays the value of the cost function for the graph on the left when θ0 = 800 and θ1= -0.15. Taking another h(x) and plotting its contour plot, one gets the following graphs:

When θ0 = 360 and θ1 = 0, the value of J(θ0,θ1) in the contour plot gets closer to the center thus reducing the cost function error. Now giving our hypothesis function a slightly positive slope results in a better fit of the data.

The graph above minimizes the cost function as much as possible and consequently, the result of θ1 and θ0 tend to be around 0.12 and 250 respectively. Plotting those values on our graph to the right seems to put our point in the center of the inner most 'circle'.

Model Representation and Cost Function的更多相关文章

  1. Linear regression with one variable - Cost function

    摘要: 本文是吴恩达 (Andrew Ng)老师<机器学习>课程,第二章<单变量线性回归>中第7课时<代价函数>的视频原文字幕.为本人在视频学习过程中逐字逐句记录下 ...

  2. loss function与cost function

    实际上,代价函数(cost function)和损失函数(loss function 亦称为 error function)是同义的.它们都是事先定义一个假设函数(hypothesis),通过训练集由 ...

  3. 【caffe】loss function、cost function和error

    @tags: caffe 机器学习 在机器学习(暂时限定有监督学习)中,常见的算法大都可以划分为两个部分来理解它 一个是它的Hypothesis function,也就是你用一个函数f,来拟合任意一个 ...

  4. 逻辑回归损失函数(cost function)

    逻辑回归模型预估的是样本属于某个分类的概率,其损失函数(Cost Function)可以像线型回归那样,以均方差来表示:也可以用对数.概率等方法.损失函数本质上是衡量”模型预估值“到“实际值”的距离, ...

  5. [Machine Learning] 浅谈LR算法的Cost Function

    了解LR的同学们都知道,LR采用了最小化交叉熵或者最大化似然估计函数来作为Cost Function,那有个很有意思的问题来了,为什么我们不用更加简单熟悉的最小化平方误差函数(MSE)呢? 我个人理解 ...

  6. logistic回归具体解释(二):损失函数(cost function)具体解释

    有监督学习 机器学习分为有监督学习,无监督学习,半监督学习.强化学习.对于逻辑回归来说,就是一种典型的有监督学习. 既然是有监督学习,训练集自然能够用例如以下方式表述: {(x1,y1),(x2,y2 ...

  7. 机器学习 损失函数(Loss/Error Function)、代价函数(Cost Function)和目标函数(Objective function)

    损失函数(Loss/Error Function): 计算单个训练集的误差,例如:欧氏距离,交叉熵,对比损失,合页损失 代价函数(Cost Function): 计算整个训练集所有损失之和的平均值 至 ...

  8. 损失函数(Loss function) 和 代价函数(Cost function)

    1损失函数和代价函数的区别: 损失函数(Loss function):指单个训练样本进行预测的结果与实际结果的误差. 代价函数(Cost function):整个训练集,所有样本误差总和(所有损失函数 ...

  9. 手势跟踪论文学习:Realtime and Robust Hand Tracking from Depth(三)Cost Function

    iker原创.转载请标明出处:http://blog.csdn.net/ikerpeng/article/details/39050619 Realtime and Robust Hand Track ...

随机推荐

  1. Web 项目更改项目名

    简单的记录web开发中基本的操作. 更改项目名 直接修改 找到原项目中的.project 文件,更改中项目名称.然后在同目录下找到.mymetadata 文件 并更改name.context-root ...

  2. [AHOI2001]彩票摇奖

    [AHOI2001]彩票摇奖 题目描述 为了丰富人民群众的生活.支持某些社会公益事业,北塔市设置了一 项彩票.该彩票的规则是: (1) 每张彩票上印有 7 个各不相同的号码,且这些号码的取指范围为 1 ...

  3. C语言bitmap的使用技巧

    bitmap是一种以位的状态来表示某种特性的状态的一种操作方式,类似嵌入式中的寄存器操作,在表示某种特性enable/disable的时候很适用且占用的内存空间也很小 比如:做过交换机或者企业网管,路 ...

  4. 我的python学习笔记一

    我的python学习笔记,快速了解python,适合有C语言基础的. http://note.youdao.com/noteshare?id=93b9750a8950c6303467cf33cb1ba ...

  5. oracle 表查询(一)

    通过scott用户下的表来演示如何使用select语句,接下来对emp.dept.salgrade表结构进行解说. emp 雇员表字段名称   数据类型       是否为空   备注-------- ...

  6. mysql技能提升篇 - Sqlyog高级应用

    mysql作为绝大部分公司使用的数据库,自然是牛牛牛! 每个人都能设计数据库,都能从删库到跑路.但是,如何做到更好,更快,更准地建立你的mysql数据库,这是个值得关注的问题(尽管很多人已经去搞大数据 ...

  7. Bmob云IM实现头像更换并存入Bmob云数据库中(1.拍照替换,2.相册选择)

    看图效果如下: 1.个人资料界面 2.点击头像弹出对话框 3.点击拍照 4.切割图片,选择合适的部分 5.点击保存,头像替换完毕,下面看从相册中选择图片. 6.点击相册 7.任选一张图片 8.切割图片 ...

  8. Linux命令行与脚本编程大全第一章

    1, 2,linux内核:内存管理.进程管理.文件管理.设备管理. 其中内存管理如下图: 通过命令 cat/proc/meminfo查看系统的内存状态.通过ipcs查看共享内存.信号量.消息队列信息. ...

  9. 在Ubuntu终端彻底删除软件

    在Ubuntu终端彻底删除软件 1.删除软件 方法一.如果你知道要删除软件的具体名称,可以使用 sudo apt-get remove --purge 软件名称 sudo apt-get autore ...

  10. 编译期类型检查 in ClojureScript

    前言  话说"动态类型一时爽,代码重构火葬场",虽然有很多不同的意见(请参考),但我们看到势头强劲的TypeScript和Flow.js,也能感知到静态类型在某程度上能帮助我们写出 ...