Introduction

What is machine learning? 

Arthur Samuel described it as "the field of study that gives computers the ability to learn without being explicitly programmed." This is an older, informal definition.

Tom Mitchell provides a more modern definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."

Example: playing checkers.

E = the experience of playing many games of checkers.

T = the task of playing checkers.

P = the probability that the program will win the next game.

In general, any machine learning problem can be assigned to one of two broad classifications:

Supervised learning and Unsupervised learning.

Supervised Learning

In supervised learning, we are given a data set and already know what our correct output should look like, having the idea that there is a relationship between the input and the output.

Supervised learning problems are categorized into "regression" and "classification" problems.

Regression -> predict a continuous output.

Classification -> predict a discrete output.

Example:

Given data about the size of houses on the real estate market, try to predict their price. Price as a function of size is a continuous output, so this is a regression problem.

We could turn this example into a classification problem by instead making our output about whether the house "sells for more or less than the asking price." Here we are classifying the houses based on price into two discrete categories.
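To make the two flavors concrete, here is a small sketch; the house data is made up for illustration, and np.polyfit is just one convenient way to fit a line:

```python
import numpy as np

# Hypothetical house data: sizes in square feet, prices in $1000s.
sizes  = np.array([1000.0, 1500.0, 2000.0, 2500.0, 3000.0])
prices = np.array([ 200.0,  280.0,  370.0,  450.0,  540.0])

# Regression: predict a continuous price from size with a straight-line fit.
slope, intercept = np.polyfit(sizes, prices, 1)
predicted = slope * 1800.0 + intercept
print(f"predicted price for 1800 sq ft: ${predicted:.0f}k")

# Classification: predict a discrete label instead -- does the house sell
# for more or less than a given asking price?
asking = 350.0
label = "more" if predicted > asking else "less"
print(f"sells for {label} than the ${asking:.0f}k asking price")
```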

Unsupervised Learning

Unsupervised learning allows us to approach problems with little or no idea what our results should look like. We can derive structure from data where we don't necessarily know the effect of the variables.

We can derive this structure by clustering the data based on relationships among the variables in the data.

With unsupervised learning there is no feedback based on the prediction results.

Example:

Clustering: Take a collection of 1,000,000 different genes, and find a way to automatically group these genes into groups that are somehow similar or related by different variables, such as lifespan, location, roles, and so on.

Non-clustering: the "Cocktail Party Algorithm" finds structure in a chaotic environment, such as identifying individual voices and music from a mesh of sounds at a cocktail party.
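A minimal clustering sketch of the idea above, with toy 2-D points standing in for gene features; scikit-learn's KMeans is one common choice here, not something the lecture prescribes:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy unlabeled data: rows are items, columns are measured variables.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
group_b = rng.normal(loc=[3.0, 3.0], scale=0.3, size=(50, 2))
X = np.vstack([group_a, group_b])

# No labels are given; k-means derives structure from the data alone.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels[:5], labels[-5:])   # points near each center share a cluster id
```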

Linear Regression with One Variable

Model Representation

x^(i) -> the "input" variables, also called input features.

y^(i) -> the "output" or target variable.

(x^(i), y^(i)) -> a training example.

The superscript "(i)" -> an index into the training set; it has nothing to do with exponentiation.

X -> the space of input values.

Y -> the space of output values.

m -> the number of training examples.

h : X → Y -> a function, called the hypothesis, such that h(x) is a "good" predictor for the corresponding value of y.

When the target variable that we're trying to predict is continuous, we call the learning problem a regression problem. When y can take on only a small number of discrete values, we call it a classification problem.
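For linear regression with one variable, the hypothesis takes the linear form used throughout these notes:

h_θ(x) = θ0 + θ1 · x

so learning h amounts to choosing the intercept θ0 and the slope θ1.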

Cost Function

We can measure the accuracy of our hypothesis function by using a cost function. It takes an average (halved for convenience) of the squared differences between the hypothesis's predictions and the actual outputs:

J(θ0, θ1) = (1/2m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²

This function is otherwise called the "squared error function" or "mean squared error (MSE)". The mean is halved as a convenience: the 1/2 cancels against the derivative of the squared term during gradient descent.
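A direct translation of J into code (a sketch; the arrays and θ values below are made up):

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Squared error cost J(theta0, theta1) for one-variable linear regression."""
    m = len(x)
    h = theta0 + theta1 * x              # hypothesis evaluated on all m examples
    return ((h - y) ** 2).sum() / (2 * m)

# Toy check: on data lying exactly on y = x, a perfect fit has zero cost.
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
print(cost(0.0, 1.0, x, y))   # 0.0
print(cost(0.0, 0.5, x, y))   # ~0.58, matching the intuition below
```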

Cost Function - Intuition I

Our objective is to get the best possible line: the one for which the average squared vertical distance of the scattered points from the line is smallest. Ideally, the line should pass through all the points of our training data set; in such a case, the value of J(θ0, θ1) would be 0.

To build intuition, fix θ0 = 0 and vary θ1 alone. When θ1 = 1, we get a line of slope 1 that passes through every single data point in our model, so the cost is 0. Conversely, when θ1 = 0.5, we see the vertical distance from our fit to the data points increase.

This increases our cost function to J(0.5) ≈ 0.58. Plotting J(θ1) for several other values of θ1 traces out a convex, bowl-shaped curve.
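As a check, assuming the lecture's three-point training set (1,1), (2,2), (3,3) (an assumption; the figure is not reproduced here), with θ0 = 0:

J(0.5) = (1/(2·3)) · [(0.5 − 1)² + (1 − 2)² + (1.5 − 3)²] = (1/6)(0.25 + 1 + 2.25) = 3.5/6 ≈ 0.58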

Thus as a goal, we should try to minimize the cost function. In this case, θ1=1 is our global minimum.

Cost Function - Intuition II

A contour plot is a graph that contains many contour lines. A contour line of a two-variable function has a constant value at all points along the same line.

Following any one contour 'circle', one gets the same value of the cost function everywhere along it. For example, three points found on the same green contour line have the same value for J(θ0, θ1), which is why they lie on the same line. In the lecture's figure, the circled x marks the value of the cost function when θ0 = 800 and θ1 = −0.15. Taking another h(x) and plotting its contour plot gives a different picture:

When θ0 = 360 and θ1 = 0, the value of J(θ0, θ1) in the contour plot gets closer to the center, reducing the cost function error. Giving our hypothesis function a slightly positive slope results in a better fit of the data.

The best fit minimizes the cost function as much as possible; the resulting values of θ1 and θ0 tend to be around 0.12 and 250 respectively. Plotting those values on the contour plot puts our point in the center of the innermost 'circle'.
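A sketch of how such a contour plot can be produced; the housing-style data is made up to land the minimum roughly in the lecture's ballpark (θ0 ≈ 250, θ1 ≈ 0.12):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical (size, price) data for illustration only.
x = np.array([1000.0, 1500.0, 2000.0, 2500.0, 3000.0])
y = np.array([ 360.0,  440.0,  485.0,  560.0,  605.0])
m = len(x)

# Evaluate J(theta0, theta1) on a grid of parameter values.
t0 = np.linspace(-100, 600, 200)
t1 = np.linspace(-0.2, 0.5, 200)
T0, T1 = np.meshgrid(t0, t1)
J = np.zeros_like(T0)
for i in range(m):
    J += (T0 + T1 * x[i] - y[i]) ** 2
J /= 2 * m

plt.contour(T0, T1, J, levels=30)   # each contour line is a constant J value
plt.xlabel("theta0")
plt.ylabel("theta1")
plt.show()
```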

Gradient Descent

Now we need to estimate the parameters in the hypothesis function. That's where gradient descent comes in.

The gradient descent algorithm is:

repeat until convergence: {
    θ_j := θ_j − α · (∂/∂θ_j) J(θ0, θ1)
}

where j = 0, 1 represents the feature index number.

At each iteration, one should simultaneously update the parameters θ1, θ2, ..., θn: compute all the updates from the old parameter values before assigning any of them.

The size of each step is determined by the parameter α, which is called the learning rate.
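A minimal sketch of this simultaneous update (not the course's code; the toy objective J(θ0, θ1) = θ0² + θ1², with partial derivatives 2θ0 and 2θ1, is made up for illustration):

```python
# Toy gradient descent with a simultaneous update on J(t0, t1) = t0^2 + t1^2.
alpha = 0.1                       # learning rate
theta0, theta1 = 4.0, -3.0        # arbitrary starting point

for _ in range(100):
    # Evaluate both gradients at the OLD parameter values first...
    temp0 = theta0 - alpha * (2 * theta0)
    temp1 = theta1 - alpha * (2 * theta1)
    # ...then assign together: the "simultaneous update".
    theta0, theta1 = temp0, temp1

print(theta0, theta1)             # both approach the minimum at (0, 0)
```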

Gradient Descent Intuition

When the slope of J(θ1) is negative, the update increases the value of θ1; when the slope is positive, the update decreases θ1. In both cases θ1 moves toward the minimum.

On a side note, we should adjust our parameter α to ensure that the gradient descent algorithm converges in a reasonable time: too small an α makes convergence slow, while too large an α can overshoot the minimum and fail to converge, or even diverge.

How does gradient descent converge with a fixed step size α? The derivative term (d/dθ1) J(θ1) approaches 0 as we approach the bottom of our convex function. At the minimum the derivative is exactly 0, so the update becomes θ1 := θ1 − α · 0, and θ1 stops changing. As we approach a local minimum, gradient descent automatically takes smaller steps because the derivative shrinks, so there is no need to decrease α over time.

Gradient Descent For Linear Regression

When specifically applied to the case of linear regression, we can substitute our actual cost function and hypothesis function and obtain the update rules:

repeat until convergence: {
    θ0 := θ0 − α · (1/m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))
    θ1 := θ1 − α · (1/m) · Σ_{i=1}^{m} ((h_θ(x^(i)) − y^(i)) · x^(i))
}

where m is the size of the training set, θ0 is a constant that changes simultaneously with θ1, and x^(i), y^(i) are values of the given training set (data).

This method looks at every example in the entire training set on every step, and is called batch gradient descent.

The following is a derivation of (∂/∂θ_j) J(θ) for a single training example:

(∂/∂θ_j) (1/2)(h_θ(x) − y)²
  = 2 · (1/2) · (h_θ(x) − y) · (∂/∂θ_j)(h_θ(x) − y)
  = (h_θ(x) − y) · (∂/∂θ_j)(θ0·x0 + θ1·x1 − y)        [with x0 = 1]
  = (h_θ(x) − y) · x_j
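Putting the pieces together, a minimal batch gradient descent for one-variable linear regression; the data and the learning rate below are made up, and only the update rules come from the notes above:

```python
import numpy as np

# Hypothetical training data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.5, 2.0, 3.5, 4.0])
m = len(x)

theta0, theta1 = 0.0, 0.0
alpha = 0.05                           # hand-picked learning rate

for _ in range(2000):
    h = theta0 + theta1 * x            # hypothesis on ALL m examples (batch)
    error = h - y
    # Both gradients use the old parameters: simultaneous update.
    grad0 = error.mean()               # (1/m) * sum(h - y)
    grad1 = (error * x).mean()         # (1/m) * sum((h - y) * x)
    theta0 -= alpha * grad0
    theta1 -= alpha * grad1

print(theta0, theta1)                  # learned intercept and slope
```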