Model Representation and Cost Function
Model Representation
To establish notation for future use, we’ll use x(i) to denote the “input” variables (living area in this example), also called input features, and y(i) to denote the “output” or target variable that we are trying to predict (price). A pair (x(i),y(i)) is called a training example, and the dataset that we’ll be using to learn—a list of m training examples (x(i),y(i));i=1,...,m—is called a training set. Note that the superscript “(i)” in the notation is simply an index into the training set, and has nothing to do with exponentiation. We will also use X to denote the space of input values, and Y to denote the space of output values. In this example, X = Y = ℝ.
To describe the supervised learning problem slightly more formally, our goal is, given a training set, to learn a function h : X → Y so that h(x) is a “good” predictor for the corresponding value of y. For historical reasons, this function h is called a hypothesis. Seen pictorially, the process is therefore like this:

When the target variable that we’re trying to predict is continuous, such as in our housing example, we call the learning problem a regression problem. When y can take on only a small number of discrete values (such as if, given the living area, we wanted to predict if a dwelling is a house or an apartment, say), we call it a classification problem.
Cost Function
We can measure the accuracy of our hypothesis function by using a cost function. This takes an average difference (actually a fancier version of an average) of all the results of the hypothesis with inputs from x's and the actual output y's.
J(θ0,θ1)=12m∑i=1m(y^i−yi)2=12m∑i=1m(hθ(xi)−yi)2
To break it apart, it is 12 x¯ where x¯ is the mean of the squares of hθ(xi)−yi , or the difference between the predicted value and the actual value.
This function is otherwise called the "Squared error function", or "Mean squared error". The mean is halved (12)as a convenience for the computation of the gradient descent, as the derivative term of the square function will cancel out the 12 term. The following image summarizes what the cost function does:

Cost Function - Intuition I
If we try to think of it in visual terms, our training data set is scattered on the x-y plane. We are trying to make a straight line (defined by hθ(x)) which passes through these scattered data points.
Our objective is to get the best possible line. The best possible line will be such so that the average squared vertical distances of the scattered points from the line will be the least. Ideally, the line should pass through all the points of our training data set. In such a case, the value of J(θ0,θ1) will be 0. The following example shows the ideal situation where we have a cost function of 0.

When θ1=1, we get a slope of 1 which goes through every single data point in our model. Conversely, when θ1=0.5, we see the vertical distance from our fit to the data points increase.

This increases our cost function to 0.58. Plotting several other points yields to the following graph:

Thus as a goal, we should try to minimize the cost function. In this case, θ1=1 is our global minimum.
Cost Function - Intuition II
A contour plot is a graph that contains many contour lines. A contour line of a two variable function has a constant value at all points of the same line. An example of such a graph is the one to the right below.

Taking any color and going along the 'circle', one would expect to get the same value of the cost function. For example, the three green points found on the green line above have the same value for J(θ0,θ1) and as a result, they are found along the same line. The circled x displays the value of the cost function for the graph on the left when θ0 = 800 and θ1= -0.15. Taking another h(x) and plotting its contour plot, one gets the following graphs:

When θ0 = 360 and θ1 = 0, the value of J(θ0,θ1) in the contour plot gets closer to the center thus reducing the cost function error. Now giving our hypothesis function a slightly positive slope results in a better fit of the data.

The graph above minimizes the cost function as much as possible and consequently, the result of θ1 and θ0 tend to be around 0.12 and 250 respectively. Plotting those values on our graph to the right seems to put our point in the center of the inner most 'circle'.
Model Representation and Cost Function的更多相关文章
- Linear regression with one variable - Cost function
摘要: 本文是吴恩达 (Andrew Ng)老师<机器学习>课程,第二章<单变量线性回归>中第7课时<代价函数>的视频原文字幕.为本人在视频学习过程中逐字逐句记录下 ...
- loss function与cost function
实际上,代价函数(cost function)和损失函数(loss function 亦称为 error function)是同义的.它们都是事先定义一个假设函数(hypothesis),通过训练集由 ...
- 【caffe】loss function、cost function和error
@tags: caffe 机器学习 在机器学习(暂时限定有监督学习)中,常见的算法大都可以划分为两个部分来理解它 一个是它的Hypothesis function,也就是你用一个函数f,来拟合任意一个 ...
- 逻辑回归损失函数(cost function)
逻辑回归模型预估的是样本属于某个分类的概率,其损失函数(Cost Function)可以像线型回归那样,以均方差来表示:也可以用对数.概率等方法.损失函数本质上是衡量”模型预估值“到“实际值”的距离, ...
- [Machine Learning] 浅谈LR算法的Cost Function
了解LR的同学们都知道,LR采用了最小化交叉熵或者最大化似然估计函数来作为Cost Function,那有个很有意思的问题来了,为什么我们不用更加简单熟悉的最小化平方误差函数(MSE)呢? 我个人理解 ...
- logistic回归具体解释(二):损失函数(cost function)具体解释
有监督学习 机器学习分为有监督学习,无监督学习,半监督学习.强化学习.对于逻辑回归来说,就是一种典型的有监督学习. 既然是有监督学习,训练集自然能够用例如以下方式表述: {(x1,y1),(x2,y2 ...
- 机器学习 损失函数(Loss/Error Function)、代价函数(Cost Function)和目标函数(Objective function)
损失函数(Loss/Error Function): 计算单个训练集的误差,例如:欧氏距离,交叉熵,对比损失,合页损失 代价函数(Cost Function): 计算整个训练集所有损失之和的平均值 至 ...
- 损失函数(Loss function) 和 代价函数(Cost function)
1损失函数和代价函数的区别: 损失函数(Loss function):指单个训练样本进行预测的结果与实际结果的误差. 代价函数(Cost function):整个训练集,所有样本误差总和(所有损失函数 ...
- 手势跟踪论文学习:Realtime and Robust Hand Tracking from Depth(三)Cost Function
iker原创.转载请标明出处:http://blog.csdn.net/ikerpeng/article/details/39050619 Realtime and Robust Hand Track ...
随机推荐
- java 数组内的最大组合数
给定一个任意长度的java数组,求数组内的数能组合出来的最大整数比如说{9,98,123,32} 最大就是 99832123 import java.util.Arrays; import java. ...
- 浅谈Linux虚拟内存
我的orangepi内存很少,所以我打算给它弄个虚拟内存 首先建立一个1G的空文件: dd if=/dev/zero of=/home/swapfile bs=64M count=16 格式化为swa ...
- 网页端HTML使用MQTTJs订阅RabbitMQ数据
最近在做一个公司的日志组件时有一个问题难住了我.今天问题终于解决了.由于在解决问题中,在网上也查了很多资料都没有一个完整的实例可以参考.所以本着无私分享的目的记录一下完整的解决过程和实例. 需求:做一 ...
- win7系统Myeclipse下切换SVN用户
Eclipse的SVN插件Subclipse做得很好,在svn操作方面提供了很强大丰富的功能.但到目前为止,该插件对svn用户的概念极为淡薄,不但不能方便地切换用户,而且一旦用户的帐号.密码保存之后 ...
- C++移动构造函数以及move语句简单介绍
C++移动构造函数以及move语句简单介绍 首先看一个小例子: #include <iostream> #include <cstring> #include <cstd ...
- hdu1760博弈SG
A New Tetris Game Time Limit: 3000/1000 MS (Java/Others) Memory Limit: 32768/32768 K (Java/Others ...
- bzoj1087 [SCOI2005][状压DP] 互不侵犯King (状压)
在N×N的棋盘里面放K个国王,使他们互不攻击,共有多少种摆放方案.国王能攻击到它上下左右,以及左上左下右上右下八个方向上附近的各一个格子,共8个格子. Input 只有一行,包含两个数N,K ( 1 ...
- C#分布式事务解决方案-TransactionScope
引用一下别人的导读:在实际开发工作中,执行一个事件,然后调用另一接口插入数据,如果处理逻辑出现异常,那么之前插入的数据将成为垃圾数据,我们所希望的是能够在整个这个方法定义为一个事务,Transacti ...
- svn服务端安装、权限修改以及客户端的使用
2017-10-1016:10:2 svn服务端安装.权限修改以及客户端的使用 svn服务端.客户端.汉化包下载 http://pan.baidu.com/s/1c1Ogj2C 1.安装服务器端程序( ...
- Cookie同域,跨域单点登录
Cookie 同域单点登录 最近在做一个单点登录的系统整合项目,之前我们使用控件实现单点登录(以后可以介绍一下).但现在为了满足客户需求,在不使用控件情况下实现单点登录,先来介绍一下单点登录. 单点登 ...