摘要: 本文是吴恩达 (Andrew Ng)老师《机器学习》课程,第二章《单变量线性回归》中第7课时《代价函数》的视频原文字幕。为本人在视频学习过程中逐字逐句记录下来以便日后查阅使用。现分享给大家。如有错误,欢迎大家批评指正,在此表示诚挚地感谢!同时希望对大家的学习能有所帮助。

In this video (article), we'll define something called the cost function. This will let us figure out how to fit the best possible straight line to our data.

In linear regression we have a training set like that shown here. Remember our notation M was the number of training examples, so maybe M=47. And the form of hypothesis, which we use to make prediction, is this linear function. To introduce a little bit more terminology, this  and , these  are what I call the parameters of the model. What we are going to do in this video (article) is talk about how to go about choosing these two parameter values,  and .

With different choices of parameters  and  we get different hypotheses, different hypothesis functions. I know some of you will probably be already familiar with what I'm going to do on this slide, but just to review here are a few examples. If  and , then the hypothesis function will look like this. Right, because your hypothesis function will be , this is flat at 1.5. If  and , then the hypothesis will look like this. And this should pass through this point (2,1), says you now have  which looks like that. And if  and , then we end up with the hypothesis that looks like this. Let's see, it should pass through the  point like so. And this is my new . All right, well you remember that this is   but as a shorthand, sometimes I just write this as .

​ ​

In linear regression we have a training set like maybe the one I've plotted here. What we want to do is come up with values for the parameters  and . So that the straight line we get out of this corresponds to a straight line that somehow fits the data well. Like maybe the line over there. So how do we come up with values ,  that corresponds to a good fit to the data? The idea is we're going to choose our parameters  and  so that , meaning the value we predict on input x, that is at least close to the values y for the examples in our training set. So, in our training set we're given a number of examples where we know x decides the house and we know the actual price of what it's sold for. So, let's try to choose values for the parameters so that at least in the training set, given the x's in the training set, we make reasonably accurate predictions for the y values. Let's formalize this. So linear regression, what we're going to do is that I'm going to want to solve a minimization problem. So, I'm going to write minimize over , . And, I want this to be small, right, I want the difference between  and y to be small. And one thing I might do is try to minimize the square difference between the output of the hypothesis and the actual price of the house. Okay? So, let's fill in some details. Remember that I was using the notation  to represent the training example. So, what I want really is to sum over my training set. Sum from i to M of the square difference between the prediction of my hypothesis when it is input the size of the house number i, minus the actual price that house number i was sold for and I want to minimize the sum of my training set sum from i equals 1 through M of the difference of this squared error, square difference between the predicted price of the house and the price that was actually sold for. And just remind you of your notation M here was the size of my training set, right, so the M there is my number of training examples, right? That hash sign is the abbreviation for "number" of training examples. Okay? And to make the math a little bit easier, I'm going to actually look at, you know,  times that. So, we're going to try to minimize my average error, which we're going to minimize . Putting the 2, the constant one half, in front it just makes some of the math a little easier. So, minimizing one half of something, right, should give you the same values of the parameters ,  as minimizing that function. And just make sure this equation is clear, right? This expression in here, , this is our usual, right? That's equal to . And, this notation, minimize over  and , this means find me the values of theta zero and theta one that causes this expression to be minimized. And this expression depends on  and . Okay? So just to recap, we're posing this problem as find me the values of  and  so that the average already one over two M times the sum of square errors between my predictions on the training set minus the actual values of the houses on the training set is minimized. So, this is going to be my overall objective function for linear regression. And just to, you know rewrite this out a little bit more cleanly, what I'm going to do by convention is we usually define a cost function. Which is going to be exactly this.  That formula that I have up here. And what I want to do is minimize over  and  my function . Just write this out, this is my cost function. So, this cost function is also called the squared error function or sometimes called the square error cost function and it turns out that why do we take the square of the errors? It turns out the squared error cost function is reasonable choice and will work well for most problems, for most regression problems. There are other cost functions that will work pretty well, but the squared error cost function is probably the most common used one for regression problems. Later in this class we'll also talk about alternative cost functions as well, but this choice that we just had should be a pretty reasonable thing to try for most linear regression problems. Okay, so, that's the cost function. So far we've just seen a mathematical definition of, you know, the cost function and in case this function seems a little bit abstract and you still don't have a good sense of what it's doing, in the next couple of videos (articles) we're actually going to go a little bit deeper into what the cost function J is doing and try to give you better intuition about what it's computing and why we want to use it.

<end>

Linear regression with one variable - Cost function的更多相关文章

  1. Linear regression with one variable - Cost function intuition I

    摘要: 本文是吴恩达 (Andrew Ng)老师<机器学习>课程,第二章<单变量线性回归>中第8课时<代价函数的直观认识 - 1>的视频原文字幕.为本人在视频学习过 ...

  2. Stanford机器学习---第二讲. 多变量线性回归 Linear Regression with multiple variable

    原文:http://blog.csdn.net/abcjennifer/article/details/7700772 本栏目(Machine learning)包括单参数的线性回归.多参数的线性回归 ...

  3. Stanford机器学习---第一讲. Linear Regression with one variable

    原文:http://blog.csdn.net/abcjennifer/article/details/7691571 本栏目(Machine learning)包括单参数的线性回归.多参数的线性回归 ...

  4. 机器学习笔记1——Linear Regression with One Variable

    Linear Regression with One Variable Model Representation Recall that in *regression problems*, we ar ...

  5. Machine Learning 学习笔记2 - linear regression with one variable(单变量线性回归)

    一.Model representation(模型表示) 1.1 训练集 由训练样例(training example)组成的集合就是训练集(training set), 如下图所示, 其中(x,y) ...

  6. MachineLearning ---- lesson 2 Linear Regression with One Variable

    Linear Regression with One Variable model Representation 以上篇博文中的房价预测为例,从图中依次来看,m表示训练集的大小,此处即房价样本数量:x ...

  7. 机器学习 (一) 单变量线性回归 Linear Regression with One Variable

    文章内容均来自斯坦福大学的Andrew Ng教授讲解的Machine Learning课程,本文是针对该课程的个人学习笔记,如有疏漏,请以原课程所讲述内容为准.感谢博主Rachel Zhang的个人笔 ...

  8. Lecture0 -- Introduction&&Linear Regression with One Variable

    Introduction What is machine learning? Tom Mitchell provides a more modern definition: "A compu ...

  9. machine learning (2)-linear regression with one variable

    machine learning- linear regression with one variable(2) Linear regression with one variable = univa ...

随机推荐

  1. python中的函数、生成器的工作原理

    1.python中函数的工作原理 def foo(): bar() def bar(): pass python的解释器,也就是python.exe(c编写)会用PyEval_EvalFramEx(c ...

  2. ModelAndView返回Json格式的数据

    第一种方式: 1.自定义类JacksonUtil.java,类中实现tojson方法(即将数据转成json类型): 2.自定义类JsonView 继承 AbstractView 3.xml中配置bea ...

  3. XML建模实列

    XML建模 建模的由来: 就是将指定的xml字符串当作对象来操作           好处在于,只需要调用指定的方法就可以完成预定的字符串获取: 建模的一个思路: 1.分析需要被建模的文件中有那几个对 ...

  4. 使用JSP/Servalet技术开发新闻发布系统------动态网页开发基础

    什么是动态网页? 动态网页是指在服务器端运行的程序或者网页,它们会随不同客户.不同时间,返回不同的网页. 动态网页的特点? (1).交互性:即网页会根据用户的要求和选择而动态改变和响应.采用动态网页技 ...

  5. Linux 内存分配失败(关于overcommit_memory)

    1.问题现象和分析:测试时发现当系统中空闲内存还有很多时,就报内存分配失败了,所有进程都报内存分配失败:sshd@localhost:/var/log>free             tota ...

  6. [CSP-S模拟测试]:跳房子(模拟)

    题目描述 跳房子,是一种世界性的儿童游戏,也是中国民间传统的体育游戏之一.跳房子是在$N$个格子上进行的,$CYJ$对游戏进行了改进,该成了跳棋盘,改进后的游戏是在一个$N$行$M$列的棋盘上进行,并 ...

  7. d3.js之树形折叠树

    1.效果 children和_children 2.技术分解 2.1折叠函数 // (1) 递归调用,有子孙的就把children(显示)给_children(不显示)暂存,便于折叠, functio ...

  8. javaSE集合---进度2

    一.集合框架 1.特点 对象封装数据,对象多了也需要存储,集合用于存储对象. 对象的个数确定可以使用数组,但是不确定的话,可以用集合,因为集合是可变长度的. 2.集合和数组的区别 数组是固定长度的,集 ...

  9. docker tcp配置

    1. Ubuntu Docker deamon监听tcp端口设置 https://www.jianshu.com/p/e278b0e44e1b 2. Centos https://www.cnblog ...

  10. Logback 输出 JPA SQL日志 到文件

    Logback 输出 JPA SQL日志 到文件 使用Spring Boot 配置 JPA 时可以指定如下配置在控制台查看执行的SQL语句 spring.jpa.show-sql=true Sprin ...