Implementing Linear Regression and Logistic Regression in Matlab

Original: http://blog.csdn.net/abcjennifer/article/details/7732417

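This post is a supplement to the Machine Learning column. It summarizes the single-variable linear regression, multivariate linear regression, and logistic regression covered in the previous chapters. To help readers understand regression better, I implemented each of them in Matlab, and this post walks through them one by one, from easy to hard.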

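Contents of this lecture:

Implementing various regression functions in Matlab

=========================

Basic models

Y=θ₀+θ₁X₁ type --- linear regression (straight-line fitting)

Solving the overfitting problem --- Regularization

Y=1/(1+e^(-X)) type --- logistic regression (sigmoid fitting)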

=========================

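Part 1: Basic models

Before tackling the fitting problems, let us first recall the basic models of linear regression and logistic regression.

Let the parameters to be fitted be θ (n×1), and the input data be x (m×n) and y (m×1).

For every kind of fit, following the gradient descent approach, we need two parts:

① cost function (measures the distance between the true value y and the fitted value h, the hypothesis): it gives the expression of the cost function, whose value should decrease with every iteration, and the gradient, i.e. the derivative of the cost function with respect to each parameter θ.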

function [ jVal,gradient ] = costFunction ( theta )

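② Gradient_descent (the main function): runs the gradient descent algorithm, calling the cost function above repeatedly until the maximum number of iterations is reached or the value returned by the cost function stops decreasing.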

function [optTheta,functionVal,exitFlag]=Gradient_descent( )
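Note that the Gradient_descent functions in this post hand the iterations to fminunc (Optimization Toolbox), which takes quasi-Newton or trust-region steps rather than fixed-size gradient steps. For comparison, here is a minimal sketch of the plain batch gradient descent loop described in ②, run on the four data points from Part 2. The learning rate alpha, the cap maxIter and the tolerance tol are assumed values, and the cost is scaled by 1/(2m), which changes its value but not its minimizer.

```matlab
% Plain batch gradient descent for h(x) = theta(1) + theta(2)*x
% (illustrative sketch: alpha, maxIter and tol are assumed values)
x = [1; 2; 3; 4];
y = [1.1; 2.2; 2.7; 3.8];
m = numel(y);
X = [ones(m, 1), x];            % design matrix with the x0 = 1 column

alpha   = 0.1;                  % learning rate (assumed)
maxIter = 10000;                % iteration cap (assumed)
tol     = 1e-12;                % stop once the cost decreases by less than this

theta = zeros(2, 1);
J = sum((X*theta - y).^2) / (2*m);
for iter = 1:maxIter
    grad  = X' * (X*theta - y) / m;     % gradient of J = 1/(2m)*sum(delta.^2)
    theta = theta - alpha * grad;       % theta_j := theta_j - alpha * dJ/dtheta_j
    Jnew  = sum((X*theta - y).^2) / (2*m);
    if J - Jnew < tol                   % cost has stopped decreasing
        break;
    end
    J = Jnew;
end
disp(theta')                            % should be close to the fminunc result in Part 2
```

With alpha = 0.1 the cost decreases monotonically on this data; a much larger step (above roughly 0.24 here) makes the iteration diverge, which is the main practical difference from letting fminunc choose the step.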

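Linear regression: the hypothesis is h_θ(x)=θ₀x₀+θ₁x₁+…+θ_nx_n (with x₀=1). Powers of x_n can of course also serve as regression terms (for example x₁²); "linear" then differs from its everyday meaning and is closer to the idea of a polynomial, since the model stays linear in θ.

Its cost function is:

$J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$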
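To illustrate the polynomial terms mentioned above, here is a minimal sketch assuming a single input x and a quadratic model (an illustrative choice): the powers of x simply become extra columns of the design matrix, so the least-squares machinery is unchanged. The data are synthetic, generated without noise from y = 1 + 2x + 3x², and the backslash operator solves the least-squares problem directly.

```matlab
% Polynomial terms as extra "features": the model stays linear in theta
x = (-2:0.5:2)';                  % synthetic inputs (illustrative)
y = 1 + 2*x + 3*x.^2;             % noise-free quadratic data
X = [ones(size(x)), x, x.^2];     % columns: x0 = 1, x1 = x, x2 = x^2
theta = X \ y;                    % least-squares solution of X*theta = y
disp(theta')                      % recovers approximately 1 2 3
```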

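Logistic regression: the hypothesis is h_θ(x)=1/(1+e^(-θ^T x)), and its cost function is:

$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\ln h_\theta(x^{(i)})+\left(1-y^{(i)}\right)\ln\left(1-h_\theta(x^{(i)})\right)\right]$

Derive the partial derivatives of the cost function with respect to each θj yourself; see the last figure of Chapter 3, or the code later in this post.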
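For reference, both averaged costs lead to a gradient of the same form (with x₀⁽ⁱ⁾ = 1); only the hypothesis h_θ differs. The logistic case is derived step by step at the end of this post. Multiplying a cost by a positive constant (for example dropping the 1/(2m) factor, as the Part 2 code does) scales the gradient by the same constant and leaves the minimizer unchanged.

$\frac{\partial J}{\partial \theta_j}=\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)},\quad j=0,1,\dots,n$

$\theta_j := \theta_j-\alpha\,\frac{\partial J}{\partial \theta_j}$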

In the following parts, we fit each of these model equations in turn, give the code, and verify the results with MATLAB's fit function.

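Part 2: Y=θ₀+θ₁X₁ type --- linear regression (straight-line fitting)

In the earlier post "Matlab Linear Fitting & Nonlinear Fitting" we covered how to fit straight lines and curves with MATLAB's built-in fit function, which is very practical. Here, since we are following the ML course, we look at how to do the fit with the gradient descent method introduced earlier.

cost function: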

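 function [ jVal,gradient ] = costFunction2( theta )
 %COSTFUNCTION2 Cost and gradient for linear regression
 %   linear regression -> y = theta0 + theta1*x
 %   parameters: x: m*n, theta: (n+1)*1, y: m*1   (m=4, n=1)
 %Data
 x=[1;2;3;4];
 y=[1.1;2.2;2.7;3.8];
 m=size(x,1);
 hypothesis = h_func(x,theta);
 delta = hypothesis - y;
 jVal=sum(delta.^2);              % sum of squared errors
 % gradient of jVal = sum(delta.^2), so it is consistent with the cost fminunc sees
 gradient = zeros(2,1);
 gradient(1)=2*sum(delta);        % d jVal / d theta0
 gradient(2)=2*sum(delta.*x);     % d jVal / d theta1
 end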

Here h_func computes the hypothesis:

 function [res] = h_func(inputx,theta)
 %H_FUNC Hypothesis for linear regression
 %   res = theta0 + theta1*x, evaluated element-wise for every input
 res= theta(1)+theta(2)*inputx;
 end

Gradient_descent：

 function [optTheta,functionVal,exitFlag]=Gradient_descent( )
 %GRADIENT_DESCENT Minimize costFunction2 with fminunc
 %   'GradObj','on' tells fminunc that costFunction2 also returns the gradient
   options = optimset('GradObj','on','MaxIter',100);
   initialTheta = zeros(2,1);
   [optTheta,functionVal,exitFlag] = fminunc(@costFunction2,initialTheta,options);
 end

Result:

 >> [optTheta,functionVal,exitFlag] = Gradient_descent()

 Local minimum found.

 Optimization completed because the size of the gradient is less than

 the default value of the function tolerance.

 <stopping criteria details>

 optTheta =

     0.3000

     0.8600

 functionVal =

     0.0720

 exitFlag =

     1

That is, y = 0.3 + 0.86x.
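As an independent, toolbox-free cross-check (a sketch, not part of the original post), the same line can be obtained in closed form: for an overdetermined system, MATLAB's backslash operator returns the least-squares solution, i.e. it solves the normal equations.

```matlab
% Closed-form least squares for y = theta0 + theta1*x
x = [1; 2; 3; 4];
y = [1.1; 2.2; 2.7; 3.8];
X = [ones(size(x)), x];            % design matrix
theta = X \ y;                     % minimizes sum((X*theta - y).^2)
sse = sum((X*theta - y).^2);       % same quantity as jVal in costFunction2
fprintf('theta0 = %.4f, theta1 = %.4f, cost = %.4f\n', theta(1), theta(2), sse);
```

This prints `theta0 = 0.3000, theta1 = 0.8600, cost = 0.0720`, matching optTheta and functionVal from fminunc.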

Verification:

 function [ parameter ] = checkcostfunc(  )
 %CHECKCOSTFUNC Check that the cost function works well,
 %   using MATLAB's fit function (Curve Fitting Toolbox) as the reference
 x=[1;2;3;4];
 y=[1.1;2.2;2.7;3.8];
 EXPR= {'x','1'};      % linear model with terms x and 1: a*x + b
 p=fittype(EXPR);
 parameter=fit(x,y,p);
 end

Output:

 >> checkcostfunc()

 ans = 

      Linear model:

      ans(x) = a*x + b

      Coefficients (with 95% confidence bounds):

        a =        0.86  (0.4949, 1.225)

        b =         0.3  (-0.6998, 1.3)

This matches our result. Now plot it:

 function PlotFunc( xstart,xend )
 %PLOTFUNC Draw the original data and the fitted line
 %   xstart, xend: range of x over which the fitted line is drawn
 %===================cost function ====linear regression
 %original data
 x1=[1;2;3;4];
 y1=[1.1;2.2;2.7;3.8];
 %plot(x1,y1,'ro-','MarkerSize',10);
 plot(x1,y1,'rx','MarkerSize',10);
 hold on;
 %fitted line
 x_co=xstart:0.1:xend;
 y_co=0.3+0.86*x_co;
 %plot(x_co,y_co,'g');
 plot(x_co,y_co);
 hold off;
 end

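Part 3: Solving the overfitting problem --- Regularization

We covered how to handle overfitting in Chapter 3. Regularization adds a term in θ to the cost function that keeps some of the θ values small, which reduces overfitting and gives a better fit.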

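For example, define the cost function J(θ): jVal=(theta(1)-5)^2+(theta(2)-5)^2;

In each iteration, update the parameters θ by gradient descent: θ(i) := θ(i) - α·gradient(i), where α is the learning rate and gradient(i) is the partial derivative of J(θ) with respect to θi. In this example, gradient(1)=2*(theta(1)-5) and gradient(2)=2*(theta(2)-5).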
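The learning rate matters here: with a unit step (α = 1), the update θ ← θ - 2(θ - 5) = 10 - θ just bounces between 0 and 10 when started from 0. A minimal sketch of the loop with an assumed α = 0.1, for which each step shrinks the distance to 5 by a factor of 0.8:

```matlab
% Gradient descent on J(theta) = (theta1-5)^2 + (theta2-5)^2
alpha = 0.1;                      % learning rate (assumed)
theta = zeros(2, 1);
for iter = 1:200
    grad  = 2 * (theta - 5);      % [dJ/dtheta1; dJ/dtheta2]
    theta = theta - alpha * grad;
end
jVal = sum((theta - 5).^2);
disp(theta')                      % both entries approach 5
```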

The function costFunction defines jVal=J(θ) and the gradient with respect to both θ:

 function [ jVal,gradient ] = costFunction( theta )
 %COSTFUNCTION J(theta) = (theta1-5)^2 + (theta2-5)^2 and its gradient
 jVal= (theta(1)-5)^2+(theta(2)-5)^2;
 gradient = zeros(2,1);
 %code to compute derivative to theta
 gradient(1) = 2 * (theta(1)-5);
 gradient(2) = 2 * (theta(2)-5);
 end

Gradient_descent, which optimizes the parameters:

 function [optTheta,functionVal,exitFlag]=Gradient_descent( )
 %GRADIENT_DESCENT Minimize costFunction with fminunc
  options = optimset('GradObj','on','MaxIter',100);
  initialTheta = zeros(2,1)     % no semicolon, so the initial value is printed
  [optTheta,functionVal,exitFlag] = fminunc(@costFunction,initialTheta,options);
 end

Calling it from the MATLAB command window gives the optimized parameters (θ1,θ2)=(5,5):

  [optTheta,functionVal,exitFlag] = Gradient_descent()

 initialTheta =

     0
     0

 Local minimum found.

 Optimization completed because the size of the gradient is less than

 the default value of the function tolerance.

 <stopping criteria details>

 optTheta =

 functionVal =

 exitFlag =

     1
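The quadratic cost above exercises the optimizer but has no regularization term in it. As a minimal sketch of regularization itself, assuming the common L2 penalty J(θ) = 1/(2m)[Σ(h-y)² + λΣ_{j≥1}θ_j²] with the intercept θ₀ left unpenalized and an illustrative λ = 1, ridge regression on the Part 2 data can be solved in closed form: setting the gradient to zero gives (XᵀX + λL)θ = Xᵀy, where L is the identity matrix with its first diagonal entry set to 0.

```matlab
% L2-regularized (ridge) linear regression, closed form
x = [1; 2; 3; 4];
y = [1.1; 2.2; 2.7; 3.8];
X = [ones(size(x)), x];
L = diag([0, 1]);                 % do not penalize the intercept theta0
lambda = 1;                       % regularization strength (illustrative)

thetaPlain = X \ y;                          % lambda = 0: ordinary least squares
thetaReg   = (X'*X + lambda*L) \ (X'*y);     % lambda = 1
disp([thetaPlain, thetaReg])      % columns: unregularized, regularized
```

With λ = 1 the slope shrinks from 0.86 to about 0.717 (and the intercept rises to about 0.658); larger λ pushes the slope further toward 0.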

Part 4: Y=1/(1+e^(-X)) type --- logistic regression (sigmoid fitting)

hypothesis function:

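 function [res] = h_func(inputx,theta)
 %H_FUNC Sigmoid hypothesis for logistic regression
 %   (keep this file in a different folder from the Part 2 h_func: both are h_func.m)
 tmp=theta(1)+theta(2)*inputx;   %m*1
 res=1./(1+exp(-tmp));           %m*1
 end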

cost function:

 function [ jVal,gradient ] = costFunction3( theta )
 %COSTFUNCTION3 Cost and gradient for logistic regression
 x=[-3;      -2;     -1;     0;      1;      2;     3];
 y=[0.01;    0.05;   0.3;    0.45;   0.8;    1.1;    0.99];
 m=size(x,1);
 %hypothesis data
 hypothesis = h_func(x,theta);
 %jVal (cost function); the +0.01 inside each log guards against log(0)
 jVal=-sum(log(hypothesis+0.01).*y + (1-y).*log(1-hypothesis+0.01))/m;
 gradient = zeros(2,1);
 gradient(1)=sum(hypothesis-y)/m;        %w.r.t. theta0
 gradient(2)=sum((hypothesis-y).*x)/m;   %w.r.t. theta1
 end

Gradient_descent:

 function [optTheta,functionVal,exitFlag]=Gradient_descent( )
 %GRADIENT_DESCENT Minimize costFunction3 with fminunc
  options = optimset('GradObj','on','MaxIter',100);
  initialTheta = [0;0];
  [optTheta,functionVal,exitFlag] = fminunc(@costFunction3,initialTheta,options);
 end

Output:

  [optTheta,functionVal,exitFlag] = Gradient_descent()

 Local minimum found.

 Optimization completed because the size of the gradient is less than

 the default value of the function tolerance.

 <stopping criteria details>

 optTheta =

     0.3526

     1.7573

 functionVal =

     0.2498

 exitFlag =

     1
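As a cross-check that does not need the Optimization Toolbox, here is a minimal sketch of plain batch gradient descent on the same seven points, using the exact gradient from costFunction3 (no +0.01 smoothing is involved in the gradient). The step α = 1 and the 5000 iterations are assumed values. One label (y = 1.1) lies outside [0, 1]; the formulas still work algebraically, but that value is not a probability.

```matlab
% Batch gradient descent for logistic regression, h = 1./(1+exp(-(theta0 + theta1*x)))
x = [-3; -2; -1; 0; 1; 2; 3];
y = [0.01; 0.05; 0.3; 0.45; 0.8; 1.1; 0.99];
m = numel(y);
X = [ones(m, 1), x];

sigmoid = @(z) 1 ./ (1 + exp(-z));
alpha   = 1;                      % learning rate (assumed)
theta   = zeros(2, 1);
for iter = 1:5000
    h     = sigmoid(X * theta);
    grad  = X' * (h - y) / m;     % same formula as gradient(1), gradient(2) in costFunction3
    theta = theta - alpha * grad;
end
h = sigmoid(X * theta);
J = -sum(y .* log(h) + (1 - y) .* log(1 - h)) / m;   % unsmoothed cross-entropy
disp(theta')                      % should be close to optTheta above
```

The final theta should agree with optTheta = (0.3526, 1.7573) to about three decimals; J differs from functionVal because the author's jVal includes the +0.01 terms.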

Plotting the result to verify:
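A minimal sketch of such a plot, assuming the optTheta values printed above, overlays the fitted sigmoid on the seven data points (styling mirrors PlotFunc from Part 2):

```matlab
% Overlay the fitted sigmoid on the logistic-regression data
x = [-3; -2; -1; 0; 1; 2; 3];
y = [0.01; 0.05; 0.3; 0.45; 0.8; 1.1; 0.99];
optTheta = [0.3526; 1.7573];            % values printed by Gradient_descent above

xFit = linspace(-4, 4, 200)';
yFit = 1 ./ (1 + exp(-(optTheta(1) + optTheta(2) * xFit)));

figure;
plot(x, y, 'rx', 'MarkerSize', 10);     % original data
hold on;
plot(xFit, yFit, 'b-');                 % fitted sigmoid
hold off;
xlabel('x'); ylabel('y');
legend('data', 'fitted sigmoid', 'Location', 'southeast');
```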

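A reader asked about this, so here is the derivation of the gradient in logistic regression:

Let

$z = \frac{1}{1+e^{-\theta x}}$

Then

$z'_{\theta}=\frac{e^{-\theta x}}{(1+e^{-\theta x})^2} \cdot x = z(1-z)x = z(z-1)(-x)$

For a single sample, the cost function is the negative of

$J = y\ln z+(1-y)\ln(1-z)$

Differentiating, we get

$J'_{\theta} = y\frac{1}{z}z'_{\theta}+(1-y)\frac{-z'_\theta}{1-z}$

$J'_{\theta}=z'_\theta\left(\frac{y}{z}-\frac{1-y}{1-z}\right) = z(z-1)(-x)\frac{y-yz-z+yz}{z(1-z)} = (y-z)x$

So the gradient of the cost is -J'(θ) = (z-y)x (with x = 1 for θ₀); averaged over the m samples, this is exactly what costFunction3 computes.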
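The derivation can be checked numerically. A minimal sketch (the test point θ = (0.5, -0.2) and the step 1e-5 are arbitrary choices) compares the analytic gradient (1/m)Σ(z-y)x of the averaged cost -(1/m)Σ[y ln z + (1-y) ln(1-z)] with central finite differences, using the Part 4 data and z = 1/(1+e^{-(θ₀+θ₁x)}):

```matlab
% Finite-difference check of the logistic-regression gradient
x = [-3; -2; -1; 0; 1; 2; 3];
y = [0.01; 0.05; 0.3; 0.45; 0.8; 1.1; 0.99];
m = numel(y);
X = [ones(m, 1), x];

sigmoid = @(z) 1 ./ (1 + exp(-z));
cost    = @(t) -sum(y .* log(sigmoid(X*t)) + (1 - y) .* log(1 - sigmoid(X*t))) / m;

theta    = [0.5; -0.2];                        % arbitrary test point
analytic = X' * (sigmoid(X*theta) - y) / m;    % (z - y) x, averaged over m

step    = 1e-5;
numeric = zeros(2, 1);
for j = 1:2
    dtheta = zeros(2, 1);
    dtheta(j) = step;
    numeric(j) = (cost(theta + dtheta) - cost(theta - dtheta)) / (2 * step);  % central difference
end
disp([analytic, numeric])                      % the two columns should agree
maxErr = max(abs(analytic - numeric));
```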
