Lecture 0 -- Introduction & Linear Regression with One Variable
Introduction
What is machine learning?
Tom Mitchell provides a more modern definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
Example: playing checkers.
E = the experience of playing many games of checkers.
T = the task of playing checkers.
P = the probability that the program will win the next game.
In general, any machine learning problem can be assigned to one of two broad classifications:
Supervised learning and Unsupervised learning.
Supervised Learning
In supervised learning, we are given a data set and already know what our correct output should look like, having the idea that there is a relationship between the input and the output.
Supervised learning problems are categorized into "regression" and "classification" problems.
Regression -> a continuous output
Classification -> a discrete output
Example:
Given data about the size of houses on the real estate market, try to predict their price. Price as a function of size is a continuous output, so this is a regression problem.
We could turn this example into a classification problem by instead making our output about whether the house "sells for more or less than the asking price." Here we are classifying the houses based on price into two discrete categories.
Unsupervised Learning
Unsupervised learning allows us to approach problems with little or no idea what our results should look like. We can derive structure from data where we don't necessarily know the effect of the variables.
We can derive this structure by clustering the data based on relationships among the variables in the data.
With unsupervised learning there is no feedback based on the prediction results.
Example:
Clustering: Take a collection of 1,000,000 different genes, and find a way to automatically group these genes into groups that are somehow similar or related by different variables, such as lifespan, location, roles, and so on.
Non-clustering: The "Cocktail Party Algorithm" allows you to find structure in a chaotic environment (i.e., identifying individual voices and music from a mesh of sounds at a cocktail party).
Linear Regression with One Variable
Model Representation
x^(i) -> the "input" variables, also called input features
y^(i) -> the "output" or target variable
(x^(i), y^(i)) -> a training example
the superscript "(i)" -> an index into the training set; it has nothing to do with exponentiation
X -> the space of input values
Y -> the space of output values
m -> the number of training examples
Our goal is to learn a function h : X → Y so that h(x) is a "good" predictor for the corresponding value of y; this function h is called a hypothesis. For linear regression with one variable, the hypothesis takes the form hθ(x) = θ0 + θ1·x.
When the target variable that we're trying to predict is continuous, we call the learning problem a regression problem. When y can take on only a small number of discrete values, we call it a classification problem.
Cost Function
We can measure the accuracy of our hypothesis function by using a cost function:

J(θ0, θ1) = (1/(2m)) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i))²

This function is otherwise called the "squared error function" or "mean squared error (MSE)". The mean is halved (1/(2m)) as a convenience: the 1/2 cancels when we differentiate for gradient descent.
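As a concrete illustration, here is a minimal Python sketch of this cost function, assuming the one-variable hypothesis hθ(x) = θ0 + θ1·x (the function and variable names are mine, not the lecture's):

```python
import numpy as np

def hypothesis(theta0, theta1, x):
    """Linear hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

def cost(theta0, theta1, x, y):
    """Squared error cost J(theta0, theta1) with the 1/(2m) convention."""
    m = len(y)  # number of training examples
    errors = hypothesis(theta0, theta1, x) - y
    return np.sum(errors ** 2) / (2 * m)
```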
Cost Function - Intuition I
Our objective is to get the best possible line. The best possible line is the one for which the average squared vertical distance of the scattered points from the line is smallest. Ideally, the line should pass through all the points of our training data set; in such a case, the value of J(θ0, θ1) would be 0.
When θ1 = 1, we get a line of slope 1 that passes through every single data point in our model. Conversely, when θ1 = 0.5, the vertical distance from our fit to the data points increases.
This increases our cost function to about 0.58. Plotting several other values of θ1 yields the following graph:
Thus, as a goal, we should try to minimize the cost function. In this case, θ1 = 1 is our global minimum.
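To check these numbers, assume the three training points implied by the lecture's plot are (1,1), (2,2), (3,3) and the hypothesis is hθ(x) = θ1·x with θ0 fixed at 0; a quick computation reproduces both values:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

def cost(theta1):
    """J(theta1) for h(x) = theta1 * x on the toy data set."""
    m = len(y)
    return np.sum((theta1 * x - y) ** 2) / (2 * m)

print(cost(1.0))  # 0.0    -> the line passes through every point
print(cost(0.5))  # 0.5833 -> the ~0.58 quoted above
```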
Cost Function - Intuition II
A contour plot is a graph that contains many contour lines. A contour line of a two-variable function has a constant value at all points along the same line. An example of such a graph is the one to the right below.
Taking any color and going along the 'circle', one would expect to get the same value of the cost function. For example, the three green points found on the green line above have the same value for J(θ0,θ1) and as a result, they are found along the same line. The circled x displays the value of the cost function for the graph on the left when θ0 = 800 and θ1= -0.15. Taking another h(x) and plotting its contour plot, one gets the following graphs:
When θ0 = 360 and θ1 = 0, the value of J(θ0,θ1) in the contour plot gets closer to the center thus reducing the cost function error. Now giving our hypothesis function a slightly positive slope results in a better fit of the data.
The graph above minimizes the cost function as much as possible; consequently, the resulting values of θ1 and θ0 tend to be around 0.12 and 250, respectively. Plotting those values on the graph to the right seems to put our point in the center of the innermost 'circle'.
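For readers who want to reproduce a figure like this, here is a minimal matplotlib sketch; the housing-style data is invented for illustration and the parameter ranges are rough guesses, not the lecture's values:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical training data: size in square feet vs. price in $1000s.
x = np.array([1000.0, 1500.0, 2000.0, 2500.0, 3000.0])
y = np.array([350.0, 420.0, 500.0, 560.0, 630.0])
m = len(y)

# Evaluate J(theta0, theta1) over a grid of parameter values.
theta0_vals = np.linspace(-100.0, 800.0, 200)
theta1_vals = np.linspace(-0.5, 0.5, 200)
T0, T1 = np.meshgrid(theta0_vals, theta1_vals)
J = np.zeros_like(T0)
for i in range(m):
    J += (T0 + T1 * x[i] - y[i]) ** 2
J /= 2 * m

# Each contour line is a set of (theta0, theta1) pairs with equal cost;
# the minimum sits inside the innermost ring.
plt.contour(T0, T1, J, levels=np.logspace(0, 5, 25))
plt.xlabel("theta0")
plt.ylabel("theta1")
plt.show()
```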
Gradient Descent
Now we need to estimate the parameters in the hypothesis function. That's where gradient descent comes in.
The gradient descent algorithm is:
repeat until convergence: {
    θj := θj − α · (∂/∂θj) J(θ0, θ1)
}

where j = 0, 1 represents the feature index number.
At each iteration, one should simultaneously update all of the parameters θ0, θ1, ..., θn; updating one parameter before the others have been computed would yield a wrong implementation (a correct update pattern is sketched below).
The size of each step is determined by the parameter α, which is called the learning rate.
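A minimal sketch of one correctly simultaneous update step, assuming grad0 and grad1 are placeholder functions (not from the lecture) that compute the two partial derivatives of J:

```python
def gradient_step(theta0, theta1, alpha, grad0, grad1):
    """One gradient descent step with a simultaneous update.

    Both partial derivatives are evaluated at the current parameters
    before either parameter is overwritten.
    """
    temp0 = theta0 - alpha * grad0(theta0, theta1)
    temp1 = theta1 - alpha * grad1(theta0, theta1)
    return temp0, temp1  # assign both only after both are computed
```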
Gradient Descent Intuition
The following graph shows that when the slope is negative, the value of θ1 increases and when it is positive, the value of θ1 decreases.
On a side note, we should adjust our parameter α to ensure that the gradient descent algorithm converges in a reasonable time.
How does gradient descent converge with a fixed step size α? The intuition is that the derivative (d/dθ1) J(θ1) approaches 0 as we approach the bottom of our convex function. At the minimum the derivative is exactly 0, so the update becomes:

θ1 := θ1 − α · 0

and θ1 stops changing once it reaches the minimum.
Gradient Descent For Linear Regression
When specifically applied to the case of linear regression, the gradient descent update rules become:

repeat until convergence: {
    θ0 := θ0 − α · (1/m) · Σ_{i=1}^{m} (hθ(x_i) − y_i)
    θ1 := θ1 − α · (1/m) · Σ_{i=1}^{m} ((hθ(x_i) − y_i) · x_i)
}

where m is the size of the training set, θ0 is a constant that changes simultaneously with θ1, and x_i, y_i are values of the given training set (data). This method looks at every example in the entire training set on every step, and is called batch gradient descent.
The following is a derivation of (∂/∂θj) J(θ) for a single training example (x, y):

(∂/∂θj) J(θ) = (∂/∂θj) (1/2)(hθ(x) − y)²
             = (hθ(x) − y) · (∂/∂θj)(hθ(x) − y)
             = (hθ(x) − y) · x_j

with the convention x_0 = 1, so the j = 0 case reduces to (hθ(x) − y).
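Putting the pieces together, here is a self-contained Python sketch of batch gradient descent for one-variable linear regression; the learning rate and iteration count are arbitrary choices for the toy data, not values from the lecture:

```python
import numpy as np

def batch_gradient_descent(x, y, alpha=0.1, num_iters=1500):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent.

    Every iteration uses all m training examples, per the update
    rules above, and updates theta0 and theta1 simultaneously.
    """
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(num_iters):
        errors = theta0 + theta1 * x - y   # h(x_i) - y_i for every i
        grad0 = np.sum(errors) / m         # dJ/dtheta0
        grad1 = np.sum(errors * x) / m     # dJ/dtheta1
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# On the toy data from the intuition section it converges toward (0, 1):
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
print(batch_gradient_descent(x, y))
```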