Introduction

What is machine learning? 

Arthur Samuel described machine learning informally as "the field of study that gives computers the ability to learn without being explicitly programmed." Tom Mitchell provides a more modern definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."

Example: playing checkers.

E = the experience of playing many games of checkers

T = the task of playing checkers.

P = the probability that the program will win the next game.

In general, any machine learning problem can be assigned to one of two broad classifications:

Supervised learning and Unsupervised learning.

Supervised Learning

In supervised learning, we are given a data set and already know what our correct output should look like, with the idea that there is a relationship between the input and the output.

Supervised learning problems are categorized into "regression" and "classification" problems.

Regression-> a continuous output

Classification-> a discrete output

Example:

Given data about the size of houses on the real estate market, try to predict their price. Price as a function of size is a continuous output, so this is a regression problem.

We could turn this example into a classification problem by instead making our output about whether the house "sells for more or less than the asking price." Here we are classifying the houses based on price into two discrete categories.

Unsupervised Learning

Unsupervised learning allows us to approach problems with little or no idea what our results should look like. We can derive structure from data where we don't necessarily know the effect of the variables.

We can derive this structure by clustering the data based on relationships among the variables in the data.

With unsupervised learning there is no feedback based on the prediction results.

Example:

Clustering: Take a collection of 1,000,000 different genes, and find a way to automatically group these genes into groups that are somehow similar or related by different variables, such as lifespan, location, roles, and so on.

Non-clustering: The "Cocktail Party Algorithm", allows you to find structure in a chaotic environment.

Linear Regression with One Variable

Model Representation

X(i)-> “input” variables, also called input features.

Y(i)->“output” or target variable

(X(i),Y(i)) -> a training example

the superscript “(i)” -> an index into the training set, and has nothing to do with exponentiation.

X-> the space of input values

Y-> the space of output values.

m-> the number of samples

function h : X → Y so that h(x) is a “good” predictor for the corresponding value of y.

When the target variable that we’re trying to predict is continuous, we call the learning problem a regression problem. When y can take on only a small number of discrete values, we call it a classification problem.
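As a minimal sketch (my own illustration, not part of the original notes), the one-variable linear hypothesis h_θ(x) = θ0 + θ1·x can be written in Python as:

    # One-variable linear hypothesis: h_theta(x) = theta0 + theta1 * x.
    def hypothesis(theta0, theta1, x):
        return theta0 + theta1 * x

    # With theta0 = 0 and theta1 = 1, the hypothesis is the line y = x:
    print(hypothesis(0.0, 1.0, 2.0))  # 2.0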

Cost Function

We can measure the accuracy of our hypothesis function by using a cost function. It takes an average of the squared differences between the hypothesis's predictions and the actual outputs:

J(θ0, θ1) = (1/2m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²

This function is otherwise called the "squared error function" or "mean squared error" (MSE). The mean is halved (the 1/2 factor) as a convenience: it cancels against the 2 produced when differentiating the square during gradient descent.
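A short Python sketch of this cost function (the name compute_cost is my own; it assumes the linear hypothesis above):

    def compute_cost(theta0, theta1, xs, ys):
        """Squared error cost J(theta0, theta1), halved by convention."""
        m = len(xs)
        squared_errors = sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))
        return squared_errors / (2 * m)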

Cost Function - Intuition I

Our objective is to get the best possible line: the line for which the average squared vertical distance of the scattered points from the line is smallest. Ideally, the line should pass through all the points of our training data set; in such a case, the value of J(θ0, θ1) will be 0.

When θ1 = 1, we get a slope of 1 which goes through every single data point in our model, so the cost is 0. Conversely, when θ1 = 0.5, we see the vertical distance from our fit to the data points increase, which raises the cost to about 0.58. Plotting several other values of θ1 yields the following graph:

Thus, as a goal, we should try to minimize the cost function. In this case, θ1 = 1 is our global minimum.
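To make these numbers concrete, here is a quick check with compute_cost from above on a three-point training set (1,1), (2,2), (3,3) that reproduces the values quoted, holding θ0 = 0:

    xs, ys = [1, 2, 3], [1, 2, 3]
    print(compute_cost(0.0, 1.0, xs, ys))  # 0.0    -- the line hits every point
    print(compute_cost(0.0, 0.5, xs, ys))  # 0.5833 -- the ~0.58 quoted above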

Cost Function - Intuition II

A contour plot is a graph that contains many contour lines. A contour line of a two variable function has a constant value at all points of the same line. An example of such a graph is the one to the right below.

Taking any color and going along the 'circle', one would expect to get the same value of the cost function. For example, the three green points found on the green line above have the same value for J(θ0, θ1), and as a result they are found along the same line. The circled x displays the value of the cost function for the graph on the left when θ0 = 800 and θ1 = −0.15. Taking another h(x) and plotting its contour plot, one gets the following graphs:

When θ0 = 360 and θ1 = 0, the value of J(θ0, θ1) in the contour plot gets closer to the center, thus reducing the cost function error. Now giving our hypothesis function a slightly positive slope results in a better fit of the data.

The graph above minimizes the cost function as much as possible; consequently, the best values of θ1 and θ0 tend to be around 0.12 and 250 respectively. Plotting those values on the graph to the right seems to put our point in the center of the innermost 'circle'.

Gradient Descent

Now we need to estimate the parameters in the hypothesis function. That's where gradient descent comes in.

The gradient descent algorithm is:

repeat until convergence:

    θ_j := θ_j − α · ∂/∂θ_j J(θ0, θ1)

where j = 0, 1 represents the feature index number.

At each iteration, one should simultaneously update the parameters θ1, θ2, ..., θn; updating a parameter before computing the others on the same iteration would yield a wrong implementation.

The size of each step is determined by the parameter α, which is called the learning rate.
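As a sketch of a single update step (my own illustration; grad0 and grad1 stand for the partial derivatives of J, whatever the cost function is):

    # One gradient descent step with a simultaneous update.
    # grad0 and grad1 are callables computing dJ/dtheta0 and dJ/dtheta1.
    def gradient_step(theta0, theta1, grad0, grad1, alpha):
        temp0 = theta0 - alpha * grad0(theta0, theta1)
        temp1 = theta1 - alpha * grad1(theta0, theta1)
        # Both temporaries are computed before either parameter is overwritten;
        # updating theta0 first and reusing it inside grad1 would be incorrect.
        return temp0, temp1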

Gradient Descent Intuition

The following graph shows that when the slope is negative, the value of θ1 increases and when it is positive, the value of θ1 decreases.

On a side note, we should adjust our parameter α to ensure that the gradient descent algorithm converges in a reasonable time.

How does gradient descent converge with a fixed step size α? The intuition is that the derivative d/dθ1 J(θ1) approaches 0 as we approach the bottom of our convex function. At the minimum, the derivative will always be 0, and thus we get

θ1 := θ1 − α · 0

so θ1 stops changing once it reaches the minimum.

Gradient Descent For Linear Regression

When gradient descent is applied specifically to linear regression, substituting the actual cost function and hypothesis function gives the update rules:

repeat until convergence: {

    θ0 := θ0 − α · (1/m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))

    θ1 := θ1 − α · (1/m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x^(i)

}

where m is the size of the training set, θ0 is a constant that will be changing simultaneously with θ1, and x^(i), y^(i) are values of the given training set (data).

The following is a derivation of ∂/∂θ_j J(θ) for a single example:

    ∂/∂θ_j (1/2) · (h_θ(x) − y)²
        = (h_θ(x) − y) · ∂/∂θ_j (h_θ(x) − y)
        = (h_θ(x) − y) · ∂/∂θ_j (θ0·x_0 + θ1·x_1 − y)        (with x_0 = 1, x_1 = x)
        = (h_θ(x) − y) · x_j

This method looks at every example in the entire training set on every step, and is called batch gradient descent.
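Putting the pieces together, here is a compact batch gradient descent sketch for one-variable linear regression (the learning rate, iteration count, and toy data are arbitrary choices of mine):

    def batch_gradient_descent(xs, ys, alpha=0.1, iterations=1000):
        """Fit h(x) = theta0 + theta1 * x by batch gradient descent."""
        m = len(xs)
        theta0, theta1 = 0.0, 0.0
        for _ in range(iterations):
            # Batch: the gradients sum over every training example.
            errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
            grad0 = sum(errors) / m
            grad1 = sum(e * x for e, x in zip(errors, xs)) / m
            # Simultaneous update: both gradients are computed before assigning.
            theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
        return theta0, theta1

    # On the toy data (1,1), (2,2), (3,3) the fit approaches theta0 = 0, theta1 = 1.
    print(batch_gradient_descent([1, 2, 3], [1, 2, 3]))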








Lecture0 -- Introduction&&Linear Regression with One Variable的更多相关文章

  1. Stanford机器学习---第二讲. 多变量线性回归 Linear Regression with multiple variable

    原文:http://blog.csdn.net/abcjennifer/article/details/7700772 本栏目(Machine learning)包括单参数的线性回归.多参数的线性回归 ...

  2. Stanford机器学习---第一讲. Linear Regression with one variable

    原文:http://blog.csdn.net/abcjennifer/article/details/7691571 本栏目(Machine learning)包括单参数的线性回归.多参数的线性回归 ...

  3. 机器学习笔记1——Linear Regression with One Variable

    Linear Regression with One Variable Model Representation Recall that in *regression problems*, we ar ...

  4. Machine Learning 学习笔记2 - linear regression with one variable(单变量线性回归)

    一.Model representation(模型表示) 1.1 训练集 由训练样例(training example)组成的集合就是训练集(training set), 如下图所示, 其中(x,y) ...

  5. Ng第二课:单变量线性回归(Linear Regression with One Variable)

    二.单变量线性回归(Linear Regression with One Variable) 2.1  模型表示 2.2  代价函数 2.3  代价函数的直观理解 2.4  梯度下降 2.5  梯度下 ...

  6. 【cs229-Lecture2】Linear Regression with One Variable (Week 1)(含测试数据和源码)

    从Ⅱ到Ⅳ都在讲的是线性回归,其中第Ⅱ章讲得是简单线性回归(simple linear regression, SLR)(单变量),第Ⅲ章讲的是线代基础,第Ⅳ章讲的是多元回归(大于一个自变量). 本文的 ...

  7. MachineLearning ---- lesson 2 Linear Regression with One Variable

    Linear Regression with One Variable model Representation 以上篇博文中的房价预测为例,从图中依次来看,m表示训练集的大小,此处即房价样本数量:x ...

  8. 斯坦福第二课:单变量线性回归(Linear Regression with One Variable)

    二.单变量线性回归(Linear Regression with One Variable) 2.1  模型表示 2.2  代价函数 2.3  代价函数的直观理解 I 2.4  代价函数的直观理解 I ...

  9. 机器学习 (一) 单变量线性回归 Linear Regression with One Variable

    文章内容均来自斯坦福大学的Andrew Ng教授讲解的Machine Learning课程,本文是针对该课程的个人学习笔记,如有疏漏,请以原课程所讲述内容为准.感谢博主Rachel Zhang的个人笔 ...

随机推荐

  1. python(20)- 列表生成式和生成器表达式练习Ⅱ

    题目一: 有两个列表,分别存放来老男孩报名学习linux和python课程的学生名字linux=['钢弹','小壁虎','小虎比','alex','wupeiqi','yuanhao']python= ...

  2. start-dfs.sh 和 start-all.sh的区别

    start-dfs.sh 只启动namenode 和datanode, start-all.sh还包括yarn的resourcemanager 和nodemanager 之前就所以因为只启动了star ...

  3. alexNet--deep learning--alexNet的11行代码

    % Copyright 2016 The MathWorks, Inc. clear camera = webcam(  2  ); % Connect to the camerannet = ale ...

  4. OpenCV4Android编译

    http://blog.sina.com.cn/s/blog_602f87700102vdnw.html (2015-04-02 11:10:01) 转载▼     最近的一个项目中,需要自己编译Op ...

  5. 关于erlang解析json数据

    JSON(JavaScript Object Notation)是一种轻量级的数据交换语言,以文字为基础,且易于让人阅读.json的数据格式是文本文档格式的一种.在erlang中可以参考mochiwe ...

  6. jquery 效果网址分享

     http://www.lanrentuku.com/js/ http://www.baidu.com/link?url=2nuImAliKGCKyDeJ7ln2DR_2if5uKgr-em6a3dx ...

  7. listview 下拉刷新

    http://blog.csdn.net/lancees/article/details/7776853

  8. CentOS 安装和配置 Mantis

    Mantis是一个基于PHP技术的轻量级的开源缺陷跟踪系统,以Web操作的形式提供项目管理及缺陷跟踪服务.在功能上.实用性上足以满足中小型项目的管理及跟踪.更重要的是其开源,不需要负担任何费用. 1. ...

  9. xlua学习过程遇到的问题,以后通了之后可能就不是问题了。但是还是有记录的必要。

    //2.加载lua文件,这里这种方式只能够加载Resources文件夹下面的,并且是lua.txt类型的文件,感觉没啥乱用. //文档你说的是Resources文件夹下面的才需要加txt后缀,那么就是 ...

  10. 在苹果iOS平台中获取当前程序进程的进程名等信息

    本文由EasyDarwin开源团队成员Penggy供稿: Objective-C 提供 NSProcessInfo 这个类来获取当前 APP 进程信息, 然而我们的静态库是 pure C++ 工程. ...