Introduction

What is machine learning? 

Tom Mitchell provides a modern definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."

Example: playing checkers.

E = the experience of playing many games of checkers

T = the task of playing checkers.

P = the probability that the program will win the next game.

In general, any machine learning problem can be assigned to one of two broad classifications: supervised learning and unsupervised learning.

Supervised Learning

In supervised learning, we are given a data set and already know what the correct output should look like, on the assumption that there is a relationship between the input and the output.

Supervised learning problems are categorized into "regression" and "classification" problems.

Regression-> a continuous output

Classification-> a discrete output

Example:

Given data about the size of houses on the real estate market, try to predict their price. Price as a function of size is a continuous output, so this is a regression problem.

We could turn this example into a classification problem by instead making our output about whether the house "sells for more or less than the asking price." Here we are classifying the houses based on price into two discrete categories.
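
As a small illustration of the difference, the sketch below builds both kinds of target from the same records. The sale records and their layout are made up for this example and are not part of the original notes.

```python
# Hypothetical sale records: (size in square feet, asking price, sale price).
sales = [
    (1000, 200_000, 195_000),
    (1500, 300_000, 310_000),
    (2000, 400_000, 405_000),
]

# Regression target: the sale price itself, a continuous value.
regression_targets = [float(sold) for _, _, sold in sales]

# Classification target: 1 if the house sold for more than the asking price, else 0.
classification_targets = [1 if sold > asking else 0 for _, asking, sold in sales]

print(regression_targets)
print(classification_targets)
```

The inputs are identical in both cases; only the kind of output we ask the learner to predict changes.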

Unsupervised Learning

Unsupervised learning allows us to approach problems with little or no idea what our results should look like. We can derive structure from data where we don't necessarily know the effect of the variables.

We can derive this structure by clustering the data based on relationships among the variables in the data.

With unsupervised learning there is no feedback based on the prediction results.

Example:

Clustering: Take a collection of 1,000,000 different genes, and find a way to automatically group these genes into groups that are somehow similar or related by different variables, such as lifespan, location, roles, and so on.

Non-clustering: The "Cocktail Party Algorithm" allows you to find structure in a chaotic environment.

Linear Regression with One Variable

Model Representation

x(i) -> the “input” variables, also called input features.

y(i) -> the “output” or target variable.

(x(i), y(i)) -> a training example.

The superscript “(i)” -> an index into the training set; it has nothing to do with exponentiation.

X -> the space of input values.

Y -> the space of output values.

m -> the number of training examples.

The goal is to learn a function h : X → Y so that h(x) is a “good” predictor for the corresponding value of y. This function h is called the hypothesis.

When the target variable that we’re trying to predict is continuous, we call the learning problem a regression problem. When y can take on only a small number of discrete values, we call it a classification problem.
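
For linear regression with one variable, the hypothesis is a straight line. The sketch below assumes the usual form h(x) = θ0 + θ1·x; the parameter names theta0 and theta1 and the three training examples are chosen for this illustration only.

```python
def h(x, theta0, theta1):
    """Linear hypothesis for one input variable: h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


# Hypothetical training set of (x(i), y(i)) pairs.
training_set = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
m = len(training_set)  # number of training examples

# Predictions of the line with intercept 0 and slope 1 on every input.
predictions = [h(x, theta0=0.0, theta1=1.0) for x, _ in training_set]
print(m, predictions)
```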

Cost Function

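We can measure the accuracy of our hypothesis function by using a cost function. It takes an average (strictly, half the average) of the squared differences between the hypothesis's predictions on the inputs x(i) and the actual outputs y(i):

J(θ0, θ1) = (1 / (2m)) · Σ_{i=1..m} (hθ(x(i)) − y(i))²

This function is otherwise called the "squared error function" or "mean squared error" (MSE). The mean is halved as a convenience for gradient descent: the factor of 2 that appears when differentiating the square cancels the 1/2.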

Cost Function - Intuition I

Our objective is to get the best possible line. The best possible line is the one for which the average squared vertical distance of the scattered points from the line is smallest. Ideally, the line should pass through all the points of our training data set. In such a case, the value of J(θ0, θ1) will be 0.

When θ1 = 1, we get a line of slope 1 that passes through every single data point in our training set. Conversely, when θ1 = 0.5, we see the vertical distance from our fit to the data points increase.

This increases our cost function to 0.58. Computing the cost for several other values of θ1 and plotting the results gives a bowl-shaped curve whose lowest point is at θ1 = 1.

Thus as a goal, we should try to minimize the cost function. In this case, θ1=1 is our global minimum.
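
The numbers in this example can be checked with a short script. The sketch below assumes the squared error cost with the 1/(2m) factor, θ0 fixed at 0, and the three training points (1, 1), (2, 2), (3, 3); the data set is an assumption made here because it reproduces the 0.58 figure, and it is not stated in these notes.

```python
def cost(theta1, data):
    """J(theta1) = 1/(2m) * sum of squared errors for h(x) = theta1 * x."""
    m = len(data)
    return sum((theta1 * x - y) ** 2 for x, y in data) / (2 * m)


# Assumed training set (not given in the notes).
data = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]

# Evaluate the cost for a handful of candidate slopes.
costs = {t: cost(t, data) for t in (0.0, 0.5, 1.0, 1.5, 2.0)}
best_theta1 = min(costs, key=costs.get)

for t, j in costs.items():
    print(f"theta1 = {t:.1f}  J = {j:.2f}")
print("lowest cost at theta1 =", best_theta1)
```

The cost is symmetric around θ1 = 1 for this data, which is why the plotted curve looks like a bowl.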

Cost Function - Intuition II

A contour plot is a graph that contains many contour lines. A contour line of a two-variable function is a curve along which the function has the same value at every point. An example of such a graph is the one to the right below.

Taking any color and going along the 'circle', one would expect to get the same value of the cost function. For example, the three green points found on the green line above have the same value for J(θ0, θ1) and, as a result, they are found along the same line. The circled x displays the value of the cost function for the graph on the left when θ0 = 800 and θ1 = -0.15. Taking another h(x) and plotting its contour plot, one gets the following graphs:

When θ0 = 360 and θ1 = 0, the value of J(θ0, θ1) in the contour plot gets closer to the center, thus reducing the cost function error. Now giving our hypothesis function a slightly positive slope results in a better fit of the data.

The graph above minimizes the cost function as much as possible; consequently, θ1 and θ0 come out at around 0.12 and 250 respectively. Plotting those values on our graph to the right seems to put our point in the center of the innermost 'circle'.

Gradient Descent

Now we need to estimate the parameters in the hypothesis function. That's where gradient descent comes in.

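The gradient descent algorithm is:

repeat until convergence:

θj := θj − α · ∂/∂θj J(θ0, θ1)

where

j = 0, 1 represents the feature index number.

At each iteration, one should simultaneously update all the parameters θ0, θ1, ..., θn (in this one-variable case, just θ0 and θ1): compute every new value from the current parameters first, then assign them all together.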

The size of each step is determined by the parameter α, which is called the learning rate.
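
The simultaneous update is easy to get wrong in code, so a minimal sketch may help. It uses a made-up cost J(θ0, θ1) = θ0² + θ0·θ1 + θ1², chosen only because each partial derivative depends on both parameters; the function names grad and step are hypothetical.

```python
def grad(theta0, theta1):
    """Partial derivatives of the illustrative cost J = theta0**2 + theta0*theta1 + theta1**2."""
    return 2 * theta0 + theta1, theta0 + 2 * theta1


def step(theta0, theta1, alpha):
    """One gradient descent step with a simultaneous update."""
    d0, d1 = grad(theta0, theta1)   # both derivatives use the old values
    temp0 = theta0 - alpha * d0
    temp1 = theta1 - alpha * d1
    return temp0, temp1             # both parameters are replaced together


theta0, theta1 = 1.0, 2.0
for _ in range(200):
    theta0, theta1 = step(theta0, theta1, alpha=0.1)
print(theta0, theta1)
```

Overwriting theta0 before computing the derivative for theta1 would mix old and new values, which is a different procedure from the one described above.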

Gradient Descent Intuition

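With a single parameter, the update is θ1 := θ1 − α · d/dθ1 J(θ1). The following graph shows that when the slope is negative, the value of θ1 increases, and when it is positive, the value of θ1 decreases. Either way, θ1 moves toward the minimum.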

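On a side note, we should adjust our parameter α to ensure that the gradient descent algorithm converges in a reasonable time. If α is too small, each step is tiny and convergence is slow; if it is too large, a step can overshoot the minimum and the algorithm may fail to converge.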

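How does gradient descent converge with a fixed step size α? The intuition is that the derivative d/dθ1 J(θ1) approaches 0 as we approach the bottom of our convex function, so the updates shrink on their own even though α stays fixed. At the minimum, the derivative will always be 0 and thus we get:

θ1 := θ1 − α · 0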
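
A short numerical sketch shows the shrinking steps. It assumes the illustrative convex cost J(θ) = θ², whose derivative is 2θ; the cost, the starting point, and the value of α are all chosen for this example.

```python
def dJ(theta):
    """Derivative of the illustrative convex cost J(theta) = theta**2."""
    return 2 * theta


alpha = 0.1    # fixed learning rate, never changed inside the loop
theta = 4.0    # arbitrary starting point
step_sizes = []

for _ in range(50):
    update = alpha * dJ(theta)
    step_sizes.append(abs(update))
    theta -= update

print(step_sizes[:3], theta)
```

Each update is smaller than the one before it because the derivative shrinks as θ approaches the minimum at 0.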

Gradient Descent For Linear Regression

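When applied specifically to linear regression, where hθ(x) = θ0 + θ1·x, substituting the squared error cost function into the gradient descent update gives:

repeat until convergence:

θ0 := θ0 − α · (1/m) · Σ_{i=1..m} (hθ(x(i)) − y(i))

θ1 := θ1 − α · (1/m) · Σ_{i=1..m} ((hθ(x(i)) − y(i)) · x(i))

where m is the size of the training set, θ0 is a constant that changes simultaneously with θ1, and x(i), y(i) are values of the given training set (data).

This method looks at every example in the entire training set on every step, and is called batch gradient descent.

The following is a derivation of ∂/∂θj J(θ) for a single example, taking J(θ) = (1/2) · (hθ(x) − y)²:

∂/∂θj J(θ) = ∂/∂θj (1/2) · (hθ(x) − y)²

= 2 · (1/2) · (hθ(x) − y) · ∂/∂θj (hθ(x) − y)

= (hθ(x) − y) · ∂/∂θj (θ0·x0 + θ1·x1 − y), with x0 = 1 and x1 = x

= (hθ(x) − y) · xj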
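
A minimal sketch of batch gradient descent for h(x) = θ0 + θ1·x follows, assuming the squared error cost with the 1/(2m) factor, so that the gradients are the mean error and the mean of error times x. The function name, the default values of alpha and iterations, and the toy data are assumptions for this illustration.

```python
def batch_gradient_descent(xs, ys, alpha=0.1, iterations=2000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Every step uses the errors on all m training examples.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update of both parameters.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Toy data generated from y = 1 + 2x, so the fit should approach (1, 2).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
theta0, theta1 = batch_gradient_descent(xs, ys)
print(theta0, theta1)
```

A fixed iteration count stands in for "repeat until convergence" here; a practical version would more likely stop once the change in cost falls below a tolerance.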







