Introduction

What is machine learning? 

Tom Mitchell provides a more modern definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."

Example: playing checkers.

E = the experience of playing many games of checkers

T = the task of playing checkers.

P = the probability that the program will win the next game.

In general, any machine learning problem can be assigned to one of two broad classifications:

Supervised learning and Unsupervised learning.

Supervised Learning

In supervised learning, we are given a data set and already know what our correct output should look like, having the idea that there is a relationship between the input and the output.

Supervised learning problems are categorized into "regression" and "classification" problems.

Regression-> a continuous output

Classification-> a discrete output

Example:

Given data about the size of houses on the real estate market, try to predict their price. Price as a function of size is a continuous output, so this is a regression problem.

We
could turn this example into a classification problem by instead
making our output about whether the house "sells for more or
less than the asking price." Here we are classifying the houses
based on price into two discrete categories.

Unsupervised Learning

Unsupervised learning allows us to approach problems with little or no idea what our results should look like. We can derive structure from data where we don't necessarily know the effect of the variables.

We can derive this structure by clustering the data based on relationships among the variables in the data.

With unsupervised learning there is no feedback based on the prediction results.

Example:

Clustering: Take a collection of 1,000,000 different genes, and find a way to automatically group these genes into groups that are somehow similar or related by different variables, such as lifespan, location, roles, and so on.

Non-clustering: The "Cocktail Party Algorithm", allows you to find structure in a chaotic environment.

Linear Regression with One Variable

Model Representation

X(i)-> “input” variables, also called input features.

Y(i)->“output” or target variable

(X(i),Y(i)) -> a training example

the superscript “(i)” -> an index into the training set, and has nothing to do with exponentiation.

X-> the space of input values

Y-> the space of output values.

m-> the number of samples

function h : X → Y so that h(x) is a “good” predictor for the corresponding value of y.

When the target variable that we’re trying to predict is continuous,we call the learning problem a regression problem. When y can take on only a small number of discrete values, we call it a classification problem.

Cost Function

We can measure the accuracy of our hypothesis function by using a cost function.

This function is otherwise called the "Squared error function", or "Mean squared error(MSE)".

Cost Function - Intuition I

Our objective is to get the best possible line. The best possible line will be such so that the average squared vertical distances of the scattered points from the line will be the least. Ideally, the line should pass through all the points of our training data set. In such a case, the value of J(θ01) will be 0.

When θ1=1, we get a slope of 1 which goes through every single data point in our model. Conversely, when θ1=0.5, we see the vertical distance from our fit to the data points increase.

This increases our cost function to 0.58. Plotting several other points yields to the following graph:

Thus as a goal, we should try to minimize the cost function. In this case, θ1=1 is our global minimum.

Cost Function - Intuition II

A contour plot is a graph that contains many contour lines. A contour line of a two variable function has a constant value at all points of the same line. An example of such a graph is the one to the right below.

Taking any color and going along the 'circle', one would expect to get the same value of the cost function. For example, the three green points found on the green line above have the same value for J(θ01) and as a result, they are found along the same line. The circled x displays the value of the cost function for the graph on the left when θ0 = 800 and θ1= -0.15. Taking another h(x) and plotting its contour plot, one gets the following graphs:

When θ0 = 360 and θ1 = 0, the value of J(θ01) in the contour plot gets closer to the center thus reducing the cost function error. Now giving our hypothesis function a slightly positive slope results in a better fit of the data.

The graph above minimizes the cost function as much as possible and consequently, the result of θ1 and θ0 tend to be around 0.12 and 250 respectively. Plotting those values on our graph to the right seems to put our point in the center of the inner most 'circle'.

Gradient Descent

Now we need to estimate the parameters in the hypothesis function. That's where gradient descent comes in.

The gradient descent algorithm is:

repeat until convergence:

where

j=0,1 represents the feature index number.

At each iteration j, one should simultaneously update the parameters θ12,...,θn.

The size of each step is determined by the parameter α, which is called the learning rate.

Gradient Descent Intuition

The following graph shows that when the slope is negative, the value of θ1 increases and when it is positive, the value of θ1 decreases.

On a side note, we should adjust our parameter α to ensure that the gradient descent algorithm converges in a reasonable time.

How does gradient descent converge with a fixed step size α? approaches 0 as we approach the bottom of our convex function. At the minimum, the derivative will always be 0 and thus we get:

Gradient Descent For Linear Regression

This method looks at every example in the entire training set on every step, and is called batch gradient descent.

where m is the size of the training set, θ0 a constant that will be changing simultaneously with θ1 and xi, yi are values of the given training set (data).

The following is a derivation of for a single example :








Lecture0 -- Introduction&&Linear Regression with One Variable的更多相关文章

  1. Stanford机器学习---第二讲. 多变量线性回归 Linear Regression with multiple variable

    原文:http://blog.csdn.net/abcjennifer/article/details/7700772 本栏目(Machine learning)包括单参数的线性回归.多参数的线性回归 ...

  2. Stanford机器学习---第一讲. Linear Regression with one variable

    原文:http://blog.csdn.net/abcjennifer/article/details/7691571 本栏目(Machine learning)包括单参数的线性回归.多参数的线性回归 ...

  3. 机器学习笔记1——Linear Regression with One Variable

    Linear Regression with One Variable Model Representation Recall that in *regression problems*, we ar ...

  4. Machine Learning 学习笔记2 - linear regression with one variable(单变量线性回归)

    一.Model representation(模型表示) 1.1 训练集 由训练样例(training example)组成的集合就是训练集(training set), 如下图所示, 其中(x,y) ...

  5. Ng第二课:单变量线性回归(Linear Regression with One Variable)

    二.单变量线性回归(Linear Regression with One Variable) 2.1  模型表示 2.2  代价函数 2.3  代价函数的直观理解 2.4  梯度下降 2.5  梯度下 ...

  6. 【cs229-Lecture2】Linear Regression with One Variable (Week 1)(含测试数据和源码)

    从Ⅱ到Ⅳ都在讲的是线性回归,其中第Ⅱ章讲得是简单线性回归(simple linear regression, SLR)(单变量),第Ⅲ章讲的是线代基础,第Ⅳ章讲的是多元回归(大于一个自变量). 本文的 ...

  7. MachineLearning ---- lesson 2 Linear Regression with One Variable

    Linear Regression with One Variable model Representation 以上篇博文中的房价预测为例,从图中依次来看,m表示训练集的大小,此处即房价样本数量:x ...

  8. 斯坦福第二课:单变量线性回归(Linear Regression with One Variable)

    二.单变量线性回归(Linear Regression with One Variable) 2.1  模型表示 2.2  代价函数 2.3  代价函数的直观理解 I 2.4  代价函数的直观理解 I ...

  9. 机器学习 (一) 单变量线性回归 Linear Regression with One Variable

    文章内容均来自斯坦福大学的Andrew Ng教授讲解的Machine Learning课程,本文是针对该课程的个人学习笔记,如有疏漏,请以原课程所讲述内容为准.感谢博主Rachel Zhang的个人笔 ...

随机推荐

  1. 学会读懂 MySql 的慢查询日志

    在前边的博客<何时.怎样开启 MySql 日志?>中,我们了解到了怎样启用 MySql 的慢查询日志. 今天我们来看一下怎样去读懂这些慢查询日志.在跟踪慢查询日志之前.首先你得保证最少发生 ...

  2. Spring学习资料

    1.马士兵视频 2.SPRING技术内幕__深入解析SPRING架构与设计原理 3.jinnianshilongnian博客 4.Spring实战 (Spring IN Action) 5.官方文档

  3. MyEclipse 设置智能提示

    choice 1: -->window→Preferences→Java→Editor→Content Assist, --->将Auto activation delay 的数值改为一个 ...

  4. 我的vim插件列表

    一.正在使用的插件 1. NERD tree   文件浏览 2. bufexplorer  buffer 浏览 3. mru.vim   最近使用的文件浏览 4. ctrlp.vim  文件模糊搜索, ...

  5. 监控系统-nagios

    Nagios简介 Nagios是一款开源的电脑系统和网络监视工具,能有效监控Windows.Linux和Unix的主机状态,交换机路由器等网络设备,打印机等.在系统或服务状态异常发出邮件或短信报警第一 ...

  6. Windows下利用CMake和VS2013编译OpenCV

    转载自:http://www.chengxulvtu.com/2014/03/19/windows_build-opencv-with-cmake-and-vs2013.html   获取OpenCV ...

  7. Ubuntu Server 12.04 乱码

    sudo vim /etc/default/locale 将 下面的内容修改 LANG="zh_CN.UTF-8" LANGUAGE="zh_CN:zh" 修改 ...

  8. React - S1

    资料: 1. https://developer.mozilla.org/zh-CN/docs/Web/JavaScript 进度: 教程 - 高级内容remaining; 参考remaining j ...

  9. java中的File文件读写操作

    之前有好几次碰到文件操作方面的问题,大都由于时间太赶而没有好好花时间去细致的研究研究.每次都是在百度或者博客或者论坛里面參照着大牛们写的步骤照搬过来,之后再次碰到又忘记了.刚好今天比較清闲.于是就在网 ...

  10. UniversalImageLoader 学习

    http://www.tuicool.com/articles/zIRNN3z http://www.cnblogs.com/avenwu/archive/2013/05/03/3058468.htm ...