Linear regression with one variable - Cost function intuition I
Abstract: This article is the verbatim transcript of lesson 8, "Cost function intuition I", in Chapter 2, "Linear regression with one variable", of Andrew Ng's Machine Learning course. I transcribed it word by word while watching the video so that I could look it up later, and I'm sharing it here. If you spot any mistakes, corrections are welcome and sincerely appreciated. I hope it helps with your studies as well.
In the previous video (article), we gave the mathematical definition of the cost function. In this video (article), let's look at some examples to get back to intuition about what the cost function is doing, and why we want to use it.

To recap, here's what we had last time. We want to fit a straight line to our data, so we had this formed as a hypothesis with these parameters θ₀ and θ₁, and with different choices of the parameters, we end up with different straight-line fits to the data. And there's a cost function, and that was our optimization objective. For this video (article), in order to better visualize the cost function J, I'm going to work with a simplified hypothesis function, like that shown on the right. So, I'm gonna use my simplified hypothesis, which is just h_θ(x) = θ₁·x. We can, if you want, think of this as setting the parameter θ₀ = 0. So, I have only one parameter θ₁, and my cost function is similar to before, except that now h_θ(x) = θ₁·x. And I have only one parameter θ₁, and so my optimization objective is to minimize J(θ₁). In pictures, what this means is that choosing θ₀ = 0 corresponds to choosing only hypothesis functions that pass through the origin, that is, through the point (0, 0). Using this simplified definition of the hypothesis and cost function, let's try to understand the cost function concept better.
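As an aside, the simplified hypothesis and cost function are easy to write down in code. Here's a minimal Python sketch (an illustration on my part — the course itself works in Octave), following the definitions h_θ(x) = θ₁·x and J(θ₁) = (1/2m) Σ (h_θ(x^(i)) − y^(i))²:

```python
# Minimal sketch of the simplified model: theta0 is fixed to 0, so the
# hypothesis is h(x) = theta1 * x and the cost J depends on theta1 alone.

def hypothesis(theta1, x):
    """Simplified hypothesis h_theta(x) = theta1 * x (with theta0 = 0)."""
    return theta1 * x

def cost(theta1, xs, ys):
    """Squared-error cost J(theta1) = (1/2m) * sum of (h(x_i) - y_i)^2."""
    m = len(xs)
    return sum((hypothesis(theta1, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)
```

With the three-point training set used below, `cost(1.0, [1, 2, 3], [1, 2, 3])` comes out to 0, matching the lecture.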
It turns out that there are two key functions we want to understand. The first is the hypothesis function, and the second is the cost function. So, notice the hypothesis, right, h_θ(x): for a fixed value of θ₁, this is a function of x. So, the hypothesis is a function of the size of the house x. In contrast, the cost function J is a function of the parameter θ₁, which controls the slope of the straight line. Let's plot these functions and try to understand them both better. Let's start with the hypothesis. On the left, let's say here's my training set with three points at (1, 1), (2, 2) and (3, 3). Let's pick a value for θ₁, say θ₁ = 1, and if that's my choice for θ₁, then my hypothesis is going to look like this straight line over here. And I'm gonna point out that when I'm plotting my hypothesis function, my horizontal axis is labeled x, that is, the size of the house. Now, temporarily, set θ₁ = 1. What I want to do is figure out what J(θ₁) is when θ₁ = 1.
So, let's go ahead and compute what the cost function equals when θ₁ = 1. Well, as usual, my cost function is defined as follows, right? The sum over my training set of the usual squared error term:

J(θ₁) = (1/2m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²

And this is therefore equal to (1/2m) · (0² + 0² + 0²), which is, of course, just equal to 0. Because inside the cost function, it turns out, each of these terms is equal to 0: for the specific training set I have, for my 3 training examples (x^(i), y^(i)), if θ₁ = 1, then h_θ(x^(i)) = y^(i) exactly. And so h_θ(x^(i)) − y^(i) = 0 for each term, which is why I find that J(1) = 0. Let's plot that. What I'm gonna do on the right is plot my cost function J. And notice, because my cost function is a function of my parameter θ₁, when I plot my cost function, the horizontal axis is now labeled with θ₁. So, I have J(1) = 0; let's go ahead and plot that. I end up with an X over there. Now let's look at some other examples.
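The term-by-term calculation above can be replayed in a few lines. A small Python sketch (assuming, as in the plots, the training set (1, 1), (2, 2), (3, 3)):

```python
# With theta1 = 1 the hypothesis h(x) = x passes through all three
# training examples, so every squared-error term is zero and J(1) = 0.

training_set = [(1, 1), (2, 2), (3, 3)]

def cost(theta1):
    m = len(training_set)
    return sum((theta1 * x - y) ** 2 for x, y in training_set) / (2 * m)

print(cost(1.0))  # -> 0.0
```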
θ₁ can take on a range of different values, right? θ₁ can take on negative values, zero, and positive values. So, what if θ₁ = 0.5? Let's go ahead and plot that. I'm now going to set θ₁ = 0.5, and in that case, my hypothesis looks like this: a line with slope equal to 0.5. And let's compute J(0.5). So, that is going to be (1/2m) times my usual sum of squared errors. It turns out that the cost is going to be the sum of the squares of the heights of these vertical segments, right? Because each vertical distance is the difference between y^(i) and the predicted value h_θ(x^(i)). So, the first example contributes (0.5 − 1)². For my second example, I get (1 − 2)², because my hypothesis predicted one, but the actual housing price was two. And finally, plus (1.5 − 3)². And so that's equal to (1/(2·3)) · (0.5² + 1² + 1.5²) = 3.5/6 ≈ 0.58. So now we know J(0.5) is about 0.58. Let's go and plot that. So, we plot that, which is maybe about over there. Now, let's do one more. How about if θ₁ = 0, what is J(0) equal to?
It turns out that if θ₁ = 0, h_θ(x) is just equal to 0 everywhere, you know, this flat line that just goes horizontally like this. And so, measuring the errors, we have J(0) = (1/(2·3)) · (1² + 2² + 3²) = 14/6 ≈ 2.3. So, let's go ahead and plot that as well. It ends up with a value around 2.3. And of course, we can keep on doing this for other values of θ₁. It turns out that θ₁ can be negative as well. So if θ₁ is negative, say θ₁ = −0.5, then that corresponds to a hypothesis with a slope of −0.5. And you can actually keep on computing these errors. For θ₁ = −0.5, it turns out to have a really high error; it works out to be 5.25, and so on. And for different values of θ₁, you can compute these things, and if you compute a range of values, you get something like that. By computing a whole range of values, you can actually slowly trace out what this function J(θ₁) looks like. And that's what J(θ₁) is.
To recap: each value of θ₁ corresponds to a different hypothesis, that is, to a different straight-line fit on the left. And for each value of θ₁, we could then derive a different value of J(θ₁). For example, θ₁ = 1 corresponds to this straight line (in cyan) going straight through the data, whereas θ₁ = 0.5, the point shown in magenta, corresponds to maybe that line (in magenta). And θ₁ = 0, which is shown in blue, corresponds to this horizontal line (in blue). So, for each value of θ₁, we wound up with a different value of J(θ₁), and then we could use this to trace out the plot on the right. Now, you remember that the optimization objective for our learning algorithm is to choose the value of θ₁ that minimizes J(θ₁). This was our objective function for linear regression. Well, looking at this curve, the value that minimizes J(θ₁) is θ₁ = 1. And lo and behold, that is indeed the best possible straight-line fit through the data. By setting θ₁ = 1, for this particular training set, we actually end up fitting it perfectly. And that's why minimizing J(θ₁) corresponds to finding a straight line that fits the data well.

So, to wrap up: in this video (article), we looked at some plots to understand the cost function. To do so, we simplified the algorithm so that it only had one parameter θ₁, by setting the parameter θ₀ = 0. In the next video (article), we'll go back to the original problem formulation and look at some visualizations involving both θ₀ and θ₁, that is, without setting θ₀ = 0. And hopefully that will give you an even better sense of what the cost function J is doing in the original linear regression.
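The whole by-hand exercise in this lecture can be condensed into a short script. Here's a Python sketch (again assuming the three-point training set (1, 1), (2, 2), (3, 3) from the plots) that traces out J(θ₁) over a grid of values and picks out the minimizer, reproducing the numbers computed above:

```python
# Trace out J(theta1) over a grid, as done by hand in the lecture, for
# the training set (1, 1), (2, 2), (3, 3), then pick the minimizing theta1.

training_set = [(1, 1), (2, 2), (3, 3)]

def cost(theta1):
    m = len(training_set)
    return sum((theta1 * x - y) ** 2 for x, y in training_set) / (2 * m)

# Sample theta1 from -0.5 to 2.5 in steps of 0.25 and find the minimizer.
grid = [i * 0.25 for i in range(-2, 11)]
values = {t: cost(t) for t in grid}
best = min(values, key=values.get)

print(round(values[-0.5], 2))  # -> 5.25
print(round(values[0.0], 2))   # -> 2.33
print(round(values[0.5], 2))   # -> 0.58
print(best)                    # -> 1.0
```

The grid minimizer lands at θ₁ = 1 with J(1) = 0, which is exactly the straight line that fits this training set perfectly.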
题目链接: https://codeforces.com/gym/101987 题意: 有长度为$n$的只包含$B,R$的字符串 有m种关系,每个关系说出三个位置的确切字符 这三个位置的字符最多有一个 ...