Part 1: Logistic Regression

  Background: Suppose you are an admissions officer at a university and you decide whether to admit each applicant based on their exam scores. You are given a historical data set, ex2data1.txt, in which the first column is the score on the first exam, the second column is the score on the second exam, and in the third column 1 means the applicant was admitted and 0 means they were not. Using this data set, build a model that can serve as the admission criterion from now on.

  

  Plotting the data shows that the two classes are roughly separated by a straight line, and since the outcome takes only two values, we fit the data with logistic regression.

  

  1. The main script, ex2.m:

%% Machine Learning Online Class - Exercise 2: Logistic Regression
%
% Instructions
% ------------
%
% This file contains code that helps you get started on the logistic
% regression exercise. You will need to complete the following functions
% in this exercise:
%
%     sigmoid.m
%     costFunction.m
%     predict.m
%     costFunctionReg.m
%
% For this exercise, you will not need to change any code in this file,
% or any other files other than those mentioned above.
%

%% Initialization
clear ; close all; clc

%% Load Data
% The first two columns contain the exam scores and the third column
% contains the label.

data = load('ex2data1.txt');
X = data(:, [1, 2]); y = data(:, 3);

%% ==================== Part 1: Plotting ====================
% We start the exercise by first plotting the data to understand the
% problem we are working with.

fprintf(['Plotting data with + indicating (y = 1) examples and o ' ...
         'indicating (y = 0) examples.\n']);

plotData(X, y);

% Put some labels
hold on;
% Labels and Legend
xlabel('Exam 1 score')
ylabel('Exam 2 score')

% Specified in plot order
legend('Admitted', 'Not admitted')
hold off;

fprintf('\nProgram paused. Press enter to continue.\n');
pause;

%% ============ Part 2: Compute Cost and Gradient ============
% In this part of the exercise, you will implement the cost and gradient
% for logistic regression. You need to complete the code in
% costFunction.m

% Setup the data matrix appropriately, and add ones for the intercept term
[m, n] = size(X);

% Add intercept term to x and X_test
X = [ones(m, 1) X];

% Initialize fitting parameters
initial_theta = zeros(n + 1, 1);

% Compute and display initial cost and gradient
[cost, grad] = costFunction(initial_theta, X, y);

fprintf('Cost at initial theta (zeros): %f\n', cost);
fprintf('Expected cost (approx): 0.693\n');
fprintf('Gradient at initial theta (zeros): \n');
fprintf(' %f \n', grad);
fprintf('Expected gradients (approx):\n -0.1000\n -12.0092\n -11.2628\n');

% Compute and display cost and gradient with non-zero theta
test_theta = [-24; 0.2; 0.2];
[cost, grad] = costFunction(test_theta, X, y);

fprintf('\nCost at test theta: %f\n', cost);
fprintf('Expected cost (approx): 0.218\n');
fprintf('Gradient at test theta: \n');
fprintf(' %f \n', grad);
fprintf('Expected gradients (approx):\n 0.043\n 2.566\n 2.647\n');

fprintf('\nProgram paused. Press enter to continue.\n');
pause;

%% ============= Part 3: Optimizing using fminunc =============
% In this exercise, you will use a built-in function (fminunc) to find the
% optimal parameters theta.

% Set options for fminunc
options = optimset('GradObj', 'on', 'MaxIter', 400);

% Run fminunc to obtain the optimal theta
% This function will return theta and the cost
[theta, cost] = ...
    fminunc(@(t)(costFunction(t, X, y)), initial_theta, options);

% Print theta to screen
fprintf('Cost at theta found by fminunc: %f\n', cost);
fprintf('Expected cost (approx): 0.203\n');
fprintf('theta: \n');
fprintf(' %f \n', theta);
fprintf('Expected theta (approx):\n');
fprintf(' -25.161\n 0.206\n 0.201\n');

% Plot Boundary
plotDecisionBoundary(theta, X, y);

% Put some labels
hold on;
% Labels and Legend
xlabel('Exam 1 score')
ylabel('Exam 2 score')

% Specified in plot order
legend('Admitted', 'Not admitted')
hold off;

fprintf('\nProgram paused. Press enter to continue.\n');
pause;

%% ============== Part 4: Predict and Accuracies ==============
% After learning the parameters, you'll want to use it to predict the outcomes
% on unseen data. In this part, you will use the logistic regression model
% to predict the probability that a student with score 45 on exam 1 and
% score 85 on exam 2 will be admitted.
%
% Furthermore, you will compute the training and test set accuracies of
% our model.
%
% Your task is to complete the code in predict.m

% Predict probability for a student with score 45 on exam 1
% and score 85 on exam 2
prob = sigmoid([1 45 85] * theta);
fprintf(['For a student with scores 45 and 85, we predict an admission ' ...
         'probability of %f\n'], prob);
fprintf('Expected value: 0.775 +/- 0.002\n\n');

% Compute accuracy on our training set
p = predict(theta, X);

fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);
fprintf('Expected accuracy (approx): 89.0\n');
fprintf('\n');

ex2.m

  

  2. Visualizing the data, plotData.m:

function plotData(X, y)
%PLOTDATA Plots the data points X and y into a new figure
%   PLOTDATA(x,y) plots the data points with + for the positive examples
%   and o for the negative examples. X is assumed to be a Mx2 matrix.

% Create New Figure
figure; hold on;

% ====================== YOUR CODE HERE ======================
% Instructions: Plot the positive and negative examples on a
%               2D plot, using the option 'k+' for the positive
%               examples and 'ko' for the negative examples.
%

pos = find(y == 1);   % indices of positive (admitted) examples
neg = find(y == 0);   % indices of negative (not admitted) examples
plot(X(pos, 1), X(pos, 2), 'k+', 'LineWidth', 2, 'MarkerSize', 7);
plot(X(neg, 1), X(neg, 2), 'ko', 'MarkerFaceColor', 'y', 'MarkerSize', 7);

% =========================================================================

hold off;

end

plotData.m

  

  3. The logistic (sigmoid) function used by logistic regression:

  $h_{\theta}(x)=g(\theta^{T}x)$: the predicted probability that $y=1$ given input $x$

  $g(z)=\frac{1}{1+e^{-z}}$  

function g = sigmoid(z)
%SIGMOID Compute sigmoid function
%   g = SIGMOID(z) computes the sigmoid of z.

% You need to return the following variables correctly
g = zeros(size(z));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the sigmoid of each value of z (z can be a matrix,
%               vector or scalar).

g = 1 ./ (1 + exp(-z));

% =============================================================

end

sigmoid.m
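  A quick sanity check of the implementation (a sketch, not part of the original script): the sigmoid should return 0.5 at 0 and saturate towards 0 and 1 for large negative and positive inputs, element-wise on vectors.

sigmoid(0)          % 0.5
sigmoid(100)        % very close to 1
sigmoid([-5 0 5])   % approximately [0.0067 0.5000 0.9933]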

  4. The logistic regression cost function:

  $J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}[y^{(i)}log(h_\theta(x^{(i)}))+(1-y^{(i)})log(1-h_{\theta}(x^{(i)}))]$

function [J, grad] = costFunction(theta, X, y)
%COSTFUNCTION Compute cost and gradient for logistic regression
%   J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
%   parameter for logistic regression and the gradient of the cost
%   w.r.t. to the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
%
% Note: grad should have the same dimensions as theta
%

h = sigmoid(X * theta);                               % h_theta(x) for all examples
J = -sum(y .* log(h) + (1 - y) .* log(1 - h)) / m;    % cost function
grad = (X') * (h - y) ./ m;                           % gradient without the learning rate alpha; fminunc will use it later

% An equivalent vectorized version:
## h = sigmoid(X * theta);
## J = sum(-y' * log(h) - (1 - y)' * log(1 - h)) / m;
## grad = ((h - y)' * X) / m;

% =============================================================

end

costFunction.m
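  The script's expected initial cost of 0.693 follows directly from this formula: with $\theta=0$ we have $h_\theta(x^{(i)})=g(0)=0.5$ for every example, so $J(0)=-\frac{1}{m}\sum_{i=1}^{m}[y^{(i)}log(0.5)+(1-y^{(i)})log(0.5)]=-log(0.5)=log(2)\approx 0.693$, regardless of the data.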

  5. Gradient descent with learning rate $\alpha$:

  $\theta_j:=\theta_j-\frac{\alpha}{m }\sum_{i=1}^{m}[(h_\theta(x^{(i)})-y^{(i)})x^{(i)}_j]$
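  The exercise never actually runs this update rule (it hands the gradient to fminunc instead), but a minimal sketch of it in Octave, assuming X already includes the intercept column and that alpha and num_iters are hypothetical values you would have to tune, looks like this:

alpha = 0.001;                  % hypothetical learning rate
num_iters = 400;                % hypothetical number of iterations
m = length(y);                  % number of training examples
theta = zeros(size(X, 2), 1);   % start from zeros
for iter = 1:num_iters
    h = sigmoid(X * theta);                       % h_theta(x) for all examples
    theta = theta - (alpha / m) * (X' * (h - y)); % simultaneous update of all theta_j
end

  On this dataset the raw exam scores are large and unscaled, so a fixed $\alpha$ converges slowly unless the features are normalized first, which is one reason the exercise switches to fminunc.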

  

  Gradient without the learning rate $\alpha$ (this is what we hand to fminunc):

  $\frac{\partial J(\theta)}{\partial \theta_j}=\frac{1}{m}\sum_{i=1}^{m}[(h_\theta(x^{(i)})-y^{(i)})x^{(i)}_j]$

  We use the built-in fminunc function to fit the parameters $\theta$. Previously we fit $\theta$ with our own gradient descent, which would also work here, but fminunc chooses the step size for us: we only supply a maximum number of iterations, a cost function that returns the cost and the gradient, and an initial $\theta$, and it returns the optimal $\theta$. You can think of it as a souped-up version of gradient descent.

options = optimset('GradObj', 'on', 'MaxIter', 400);
[theta, cost] = ...
    fminunc(@(t)(costFunction(t, X, y)), initial_theta, options);   % costFunction is the function we wrote above

  

  6. Using the fitted parameters $\theta$ to make predictions. For example, the probability that a student with a score of 45 on the first exam and 85 on the second exam is admitted:

prob = sigmoid([1 45 85] * theta); % probability of admission

  Predicting on the training samples X, we see that the training accuracy is about 89%.

function p = predict(theta, X)
%PREDICT Predict whether the label is 0 or 1 using learned logistic
%regression parameters theta
%   p = PREDICT(theta, X) computes the predictions for X using a
%   threshold at 0.5 (i.e., if sigmoid(theta'*x) >= 0.5, predict 1)

m = size(X, 1); % Number of training examples

% You need to return the following variables correctly
p = zeros(m, 1);

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned logistic regression parameters.
%               You should set p to a vector of 0's and 1's
%

% First approach
for i = 1:m
    p(i, 1) = sigmoid(X(i, :) * theta) >= 0.5;   % predict 1 when the probability is at least 0.5
end;

% Second approach
## ans = sigmoid(X * theta);
## for i = 1:m
##     if (ans(i, 1) >= 0.5)
##         p(i, 1) = 1;
##     else
##         p(i, 1) = 0;
##     end
## end

% =========================================================================

end

predict.m
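  Both versions above loop over the examples; since sigmoid and the comparison against 0.5 work element-wise, the same prediction can also be written in a single vectorized line (a sketch equivalent to the loop):

p = double(sigmoid(X * theta) >= 0.5);   % 1 where the predicted probability is at least 0.5, else 0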

Part 2: Regularized Logistic Regression

  Background: Suppose you manage a factory that produces microchips. Each chip goes through two tests and must pass both to be accepted. You are given a historical data set, ex2data2.txt, where the first column is the result of the first test, the second column is the result of the second test, and in the third column 1 means the chip is acceptable and 0 means it is not. Using this data, fit a model that will serve as the acceptance criterion for future chips.

  

  Plotting the data shows that the two classes are separated by a fairly complex curve. With only the two original features $x_1$ and $x_2$ we cannot fit such a curve, so we create additional features from the original two by mapping them into all polynomial terms up to the 6th power, which gives a 28-dimensional feature vector. With this many features the model can easily overfit: it fits the training set very well but generalizes poorly to new samples, which is not what we want.
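  The count of 28 can be checked directly: for each total degree $i$ from 0 to 6 there are $i+1$ monomials $x_1^{i-j}x_2^{j}$ (with $j=0,\dots,i$), so the mapped feature vector has $\sum_{i=0}^{6}(i+1)=1+2+\dots+7=28$ entries, including the constant term.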

Constructing more features:

function out = mapFeature(X1, X2)
% MAPFEATURE Feature mapping function to polynomial features
%
%   MAPFEATURE(X1, X2) maps the two input features
%   to quadratic features used in the regularization exercise.
%
%   Returns a new feature array with more features, comprising of
%   X1, X2, X1.^2, X2.^2, X1*X2, X1*X2.^2, etc..
%
%   Inputs X1, X2 must be the same size
%

degree = 6;
out = ones(size(X1(:,1)));
for i = 1:degree
    for j = 0:i
        out(:, end+1) = (X1.^(i-j)).*(X2.^j);
    end
end

end

mapFeature.m

To deal with this overfitting we use regularization.

  1. The regularized regression script, ex2_reg.m:

%% Machine Learning Online Class - Exercise 2: Logistic Regression
%
% Instructions
% ------------
%
% This file contains code that helps you get started on the second part
% of the exercise which covers regularization with logistic regression.
%
% You will need to complete the following functions in this exercise:
%
%     sigmoid.m
%     costFunction.m
%     predict.m
%     costFunctionReg.m
%
% For this exercise, you will not need to change any code in this file,
% or any other files other than those mentioned above.
%

%% Initialization
clear ; close all; clc

%% Load Data
% The first two columns contain the X values and the third column
% contains the label (y).

data = load('ex2data2.txt');
X = data(:, [1, 2]); y = data(:, 3);

plotData(X, y);

% Put some labels
hold on;

% Labels and Legend
xlabel('Microchip Test 1')
ylabel('Microchip Test 2')

% Specified in plot order
legend('y = 1', 'y = 0')
hold off;

%% =========== Part 1: Regularized Logistic Regression ============
% In this part, you are given a dataset with data points that are not
% linearly separable. However, you would still like to use logistic
% regression to classify the data points.
%
% To do so, you introduce more features to use -- in particular, you add
% polynomial features to our data matrix (similar to polynomial
% regression).
%

% Add Polynomial Features

% Note that mapFeature also adds a column of ones for us, so the intercept
% term is handled
X = mapFeature(X(:,1), X(:,2));   % maps the original 2 features to 28 (including the intercept term), so X is m x 28

% Initialize fitting parameters
initial_theta = zeros(size(X, 2), 1);

% Set regularization parameter lambda to 1
lambda = 1;

% Compute and display initial cost and gradient for regularized logistic
% regression
[cost, grad] = costFunctionReg(initial_theta, X, y, lambda);

fprintf('Cost at initial theta (zeros): %f\n', cost);
fprintf('Expected cost (approx): 0.693\n');
fprintf('Gradient at initial theta (zeros) - first five values only:\n');
fprintf(' %f \n', grad(1:5));
fprintf('Expected gradients (approx) - first five values only:\n');
fprintf(' 0.0085\n 0.0188\n 0.0001\n 0.0503\n 0.0115\n');

fprintf('\nProgram paused. Press enter to continue.\n');
pause;

% Compute and display cost and gradient
% with all-ones theta and lambda = 10
test_theta = ones(size(X, 2), 1);
[cost, grad] = costFunctionReg(test_theta, X, y, 10);

fprintf('\nCost at test theta (with lambda = 10): %f\n', cost);
fprintf('Expected cost (approx): 3.16\n');
fprintf('Gradient at test theta - first five values only:\n');
fprintf(' %f \n', grad(1:5));
fprintf('Expected gradients (approx) - first five values only:\n');
fprintf(' 0.3460\n 0.1614\n 0.1948\n 0.2269\n 0.0922\n');

fprintf('\nProgram paused. Press enter to continue.\n');
pause;

%% ============= Part 2: Regularization and Accuracies =============
% Optional Exercise:
% In this part, you will get to try different values of lambda and
% see how regularization affects the decision boundary
%
% Try the following values of lambda (0, 1, 10, 100).
%
% How does the decision boundary change when you vary lambda? How does
% the training set accuracy vary?
%

% Initialize fitting parameters
initial_theta = zeros(size(X, 2), 1);

% Set regularization parameter lambda to 1 (you should vary this)
lambda = 1;

% Set Options
options = optimset('GradObj', 'on', 'MaxIter', 400);

% Optimize
[theta, J, exit_flag] = ...
    fminunc(@(t)(costFunctionReg(t, X, y, lambda)), initial_theta, options);

% Plot Boundary
plotDecisionBoundary(theta, X, y);
hold on;
title(sprintf('lambda = %g', lambda))

% Labels and Legend
xlabel('Microchip Test 1')
ylabel('Microchip Test 2')

legend('y = 1', 'y = 0', 'Decision boundary')
hold off;

% Compute accuracy on our training set
p = predict(theta, X);

fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);
fprintf('Expected accuracy (with lambda = 1): 83.1 (approx)\n');

ex2_reg.m

  2. The regularized logistic regression cost function (the bias term $\theta_0$ is not regularized):

  $J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}[y^{(i)}log(h_\theta(x^{(i)}))+(1-y^{(i)})log(1-h_{\theta}(x^{(i)}))]+\frac{\lambda }{2m}\sum_{j=1}^{n}\theta_j^{2}$

  

  3. Gradient descent:

  With the learning rate:

    $\theta_0:=\theta_0-\alpha \frac{1}{m }\sum_{i=1}^{m}[(h_\theta(x^{(i)})-y^{(i)})x^{(i)}_0]$   for $j=0$

    $\theta_j:=\theta_j-\alpha (\frac{1}{m }\sum_{i=1}^{m}[(h_\theta(x^{(i)})-y^{(i)})x^{(i)}_j]+\frac{\lambda }{m}\theta_j)$  for $j\geq 1$

  Without the learning rate (this is what we hand to fminunc):

    $\frac{\partial J(\theta)}{\partial \theta_0}=\frac{1}{m}\sum_{i=1}^{m}[(h_\theta(x^{(i)})-y^{(i)})x^{(i)}_0]$  for $j=0$

    $\frac{\partial J(\theta)}{\partial \theta_j}=(\frac{1}{m}\sum_{i=1}^{m}[(h_\theta(x^{(i)})-y^{(i)})x^{(i)}_j])+\frac{\lambda }{m}\theta_j $ for $j\geq 1$

  

function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
%   J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. to the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta

h = sigmoid(X * theta);
n = size(X, 2);
J = (-(y') * log(h) - (1 - y)' * log(1 - h)) / m ...
    + (lambda / (2 * m)) * sum(theta(2:n, :) .^ 2);   % the bias term theta(1) is not regularized
grad(1, :) = ((X(:, 1)') * (h - y)) / m;              % gradient for theta(1)
grad(2:n, :) = (X(:, 2:n)') * (h - y) ./ m + (theta(2:n, :)) .* (lambda / m);   % gradient for the remaining parameters

% An equivalent version that zeroes out theta(1) before adding the penalty:
## h = sigmoid(X * theta);
## theta(1, 1) = 0;
## J = sum(-y' * log(h) - (1 - y)' * log(1 - h)) / m + lambda / 2 / m * sum(power(theta, 2));
## grad = ((h - y)' * X) / m + lambda / m * theta';

% =============================================================

end

costFunctionReg.m
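  Note that at the initial $\theta=0$ the penalty term $\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^{2}$ is zero, so the expected initial cost of 0.693 is the same as in the unregularized case; the regularization only starts to matter once $\theta$ moves away from zero. A quick check (a sketch, assuming X is the feature-mapped m x 28 matrix from ex2_reg.m):

[cost0, grad0] = costFunctionReg(zeros(size(X, 2), 1), X, y, 1);
fprintf('Cost at zeros with lambda = 1: %f\n', cost0);   % approximately 0.693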

  We can fit the data with different values of $\lambda$, visualize the resulting decision boundaries, and pick a good $\lambda$.
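  The script only fits a single value of $\lambda$; a minimal sketch of that comparison (assuming X and y are the feature-mapped data from ex2_reg.m, and using the values suggested in the script) might look like:

for lambda = [0 1 10 100]
    initial_theta = zeros(size(X, 2), 1);
    options = optimset('GradObj', 'on', 'MaxIter', 400);
    theta = fminunc(@(t)(costFunctionReg(t, X, y, lambda)), initial_theta, options);
    p = predict(theta, X);
    fprintf('lambda = %3d, train accuracy = %.1f\n', lambda, mean(double(p == y)) * 100);
end

  With $\lambda=0$ the boundary typically overfits (very high training accuracy, a very wiggly curve), while a very large $\lambda$ underfits and the training accuracy drops.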

  4. Prediction works much as in plain logistic regression, except that to predict a new example (say a first test score of 45 and a second of 80) you must first pass the two features through mapFeature to build the same 28-dimensional feature vector.
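  Concretely, the features must go through the same mapping as the training data before being multiplied by $\theta$ (a sketch using the example values from the text):

prob = sigmoid(mapFeature(45, 80) * theta);   % mapFeature returns the 1 x 28 mapped feature row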

