Classification and logistic regression
Logistic regression
1. Problem:
In the regression problems discussed above, the predicted values are continuous. What if we need to do classification instead, i.e. the predicted values are discrete?
2. Solution:
Hypothesis: hθ(x) = g(θTx),
where g(z) = 1/(1 + e−z) is the sigmoid function.
The graph of g(z) is an S-shaped curve rising from 0 to 1, with g(0) = 0.5 (figure omitted).
So when hθ(x) < 0.5 we can treat the prediction as 0, and otherwise as 1, which turns the output into discrete values. Deriving the update rule:
- Use probability theory: find the distribution that the samples follow, then solve for the corresponding θ by maximum likelihood.
- Therefore the log-likelihood is l(θ) = Σi [ y(i) log hθ(x(i)) + (1 − y(i)) log(1 − hθ(x(i))) ].
Result: maximizing l(θ) by gradient ascent gives the update θj := θj + α (y(i) − hθ(x(i))) xj(i).
Note: this is the incremental (stochastic) form of the iteration; a vectorized batch version is sketched below.
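A minimal vectorized sketch of one batch update (an illustration, not the author's code; it assumes a matrix x with a leading column of ones and labels y in {0,1}, as in the scripts below):
% one vectorized gradient-ascent step on the log-likelihood l(theta)
h = 1 ./ (1 + exp(-x * theta));           % h_theta(x) for every example
theta = theta + (alpha/m) * x' * (y - h); % theta_j := theta_j + alpha/m * sum((y - h).*x_j)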
Newton's method:
1. Problem:
The iteration above converges very slowly. When solving by maximum likelihood we can instead use Newton's method: θ := θ − f(θ)/f′(θ).
2. Solution:
Derivation:
- Newton's method finds a θ with f(θ) = 0, and here we want exactly l′(θ) = 0.
- So Newton's method can be rewritten as: θ := θ − H−1∇θl(θ)
Definitions:
- ∇θl(θ) is the gradient of l(θ), i.e. the vector of partial derivatives ∂l(θ)/∂θj.
- H is the Hessian matrix of l(θ), with entries Hij = ∂²l(θ)/(∂θi∂θj). In one dimension H is just l′′(θ), i.e. H−1 = 1/l′′(θ), and the update reduces to θ := θ − l′(θ)/l′′(θ).
Application:
- Use it when the number of features is fairly small; otherwise computing H−1 is very expensive. A one-step sketch follows.
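A minimal sketch of one Newton step for the logistic-regression log-likelihood (an illustration under the same assumptions on x, y and theta as above; not part of the original code):
% one Newton step: theta := theta - H^{-1} * grad
h = 1 ./ (1 + exp(-x * theta));       % predictions h_theta(x)
grad = x' * (y - h);                  % gradient of l(theta)
H = -x' * diag(h .* (1 - h)) * x;     % Hessian of l(theta), negative definite
theta = theta - H \ grad;             % solve H*delta = grad instead of inverting H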
Logistic 0/1 classification:
1. Setting the number of iterations yourself
Write the loop yourself, choose the number of iterations and the learning rate alpha, and run incremental gradient descent.
Main functions and their roles:
- Logistic_Regression: acts as the main script
- gradientDecent: updates θ by gradient descent
- computeCost: computes the cost J
Logistic_Regression
%% part0: preparation
data = load('ex2data1.txt');
x = data(:,[1,2]);
y = data(:,3);
pos = find(y==1);
neg = find(y==0);
x1 = x(:,1);
x2 = x(:,2);
plot(x(pos,1),x(pos,2),'r*',x(neg,1),x(neg,2),'co');
pause;
%% part1: GradientDecent and compute cost of J
[m,n] = size(x);
x = [ones(m,1),x];
theta = zeros(3,1);
J = computeCost(x,y,theta);
theta = gradientDecent(x, y, theta);
X = 25:100;
% decision boundary: theta(1) + theta(2)*x1 + theta(3)*x2 = 0
Y = (-theta(1,1) - theta(2,1)*X)/theta(3,1);
plot(x(pos,2),x(pos,3),'r*',x(neg,2),x(neg,3),'co', X, Y, 'b');
pause;
gradientDecent
function theta = gradientDecent(x, y, theta)
%% gradientDecent: update theta using incremental gradient descent
m = size(x,1);
alph = 0.001;
for iter = 1:150000
for j = 1:3
dec = 0;
for i = 1:m
dec = dec + (y(i) - sigmoid(x(i,:)*theta))*x(i,j);
end
theta(j,1) = theta(j,1) + dec*alph/m;
end
end
end
sigmoid
function g = sigmoid(z)
%% SIGMOID Compute sigmoid function
g = 1/(1+exp(-z));
end
computeCost
function J = computeCost(x, y, theta)
%% compute cost: J
m = size(x,1);
J = 0;
for i = 1:m
J = J + y(i)*log(sigmoid(x(i,:)*theta)) + (1 - y(i))*log(1 - sigmoid(x(i,:)*theta));
end
J = (-1/m)*J;
end
The result is as follows (figure omitted):
2. Using the fminunc function:
Provide the computation of the cost J and of the gradient used to update θ, then call fminunc to find the optimal solution.
Main functions and their roles:
- Logistics_Regression: acts as the main script
- computeCost: computes J and the gradient for θ
- sigmoid: the sigmoid function
Logistics_Regression
%% part0: preparation
data = load('ex2data1.txt');
x = data(:,[1,2]);
y = data(:,3);
pos = find(y==1);
neg = find(y==0);
x1 = x(:,1);
x2 = x(:,2);
plot(x(pos,1),x(pos,2),'r*',x(neg,1),x(neg,2),'co');
pause;
%% part1: GradientDecent and compute cost of J
[m,n] = size(x);
x = [ones(m,1),x];
theta = zeros(3,1);
options = optimset('GradObj', 'on', 'MaxIter', 400);
% Run fminunc to obtain the optimal theta
% This function will return theta and the cost
[theta, cost] = ...
fminunc(@(t)(computeCost(x,y,t)), theta, options);
X = 25:100;
% decision boundary: theta(1) + theta(2)*x1 + theta(3)*x2 = 0
Y = (-theta(1,1) - theta(2,1)*X)/theta(3,1);
plot(x(pos,2),x(pos,3),'r*',x(neg,2),x(neg,3),'co', X, Y, 'b');
pause;
sigmoid
function g = sigmoid(z)
%% SIGMOID Compute sigmoid function
g = zeros(size(z));
g = 1.0 ./ (1.0 + exp(-z));
end
computeCost
function [J,grad] = computeCost(x, y, theta)
%% compute cost: J
m = size(x,1);
grad = zeros(size(theta));
hx = sigmoid(x * theta);
J = (1.0/m) * sum(-y .* log(hx) - (1.0 - y) .* log(1.0 - hx));
grad = (1.0/m) .* x' * (hx - y);
end
Result (figure omitted):
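To check the fit numerically, one can also compute the training-set accuracy (a small sketch using the variables from the script above; not part of the original code):
% predict 1 where h_theta(x) >= 0.5, else 0, then compare with y
p = sigmoid(x * theta) >= 0.5;
fprintf('Training accuracy: %f\n', mean(double(p == y)) * 100);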
Logistic multi_class
1. Setup
- Hand-made data (feature 1, feature 2, class label):
1,5,1
1,6,1
1.5,3.5,1
2.5,3.5,1
2,6,1
3,7,1
4,6,1
3.5,4.5,1
2,4,1
2,5,1
4,4,1
5,5,1
6,4,1
5,3,1
4,2,1
4,3,2
5,3,2
5,2,2
5,1.5,2
7,1.5,2
5,2.5,2
6,2.5,2
5.5,2.5,2
5,1,2
6,2,2
6,3,2
5,4,2
7,5,2
7,2,2
8,1,2
8,3,2
7,4,3
7,5,3
8.5,5.5,3
9,4,3
8,5.5,3
8,4.5,3
9.5,5.5,3
8,4.5,3
8.5,4.5,3
7,6,3
6,5,3
9,5,3
9,6,3
8,6,3
8,7,3
10,6,3
10,4,3
Scatter plot of the data (figure omitted).
2. Algorithm derivation
Cost J: J(θ) = −(1/m) Σi [ y(i) log hθ(x(i)) + (1 − y(i)) log(1 − hθ(x(i))) ]
Update of θ: θj := θj − α(1/m) Σi (hθ(x(i)) − y(i)) xj(i)
Algorithm idea (this algorithm is also called one_vs_all):
Suppose the samples fall into K classes. We train K sets of θ: consider each class in turn, treat all remaining samples as a single class, and thereby separate that class from the rest. For the class under consideration we set its y values to 1 and all others to 0. This yields K sets of θ values (see the relabeling sketch below).
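A small sketch of the relabeling step (illustration only; the same idea appears in one_vs_all below): for class k, the labels are mapped to a 0/1 vector before training a binary classifier.
% e.g. y = [1;2;3;2;1] and k = 2  ->  yk = [0;1;0;1;0]
k = 2;
yk = double(y == k);   % 1 for class k, 0 for every other class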
3. Code:
fminunc is used for the implementation here.
1. Brief description of the functions:
- Logistic_Regression: acts as the main script
- one_vs_all: a loop that computes the K sets of θ in turn, calling fminunc on the cost function
- computeCost: provides J and the gradient used to update θ
2. Code:
- Logistic_Regression:
%% part0: preparation
data = load('data.txt');
x = data(:,[1,2]);
y = data(:,3);
y1 = find(y==1);
y2 = find(y==2);
y3 = find(y==3);
plot(x(y1,1),x(y1,2),'r*',x(y2,1),x(y2,2),'c+',x(y3,1),x(y3,2),'bo');
pause;
%% part1: GradientDecent and compute cost of J
[m,n] = size(x);
x = [ones(m,1),x];
theta = zeros(3,3);
% Run fminunc to obtain the optimal theta
% This function will return theta and the cost
[thetas,cost]= one_vs_all(x,y,theta);
X = 1:10;
Y1 = -(thetas(1,1) + thetas(2,1)*X)/thetas(3,1);
Y2 = -(thetas(1,2) + thetas(2,2)*X)/thetas(3,2);
Y3 = -(thetas(1,3) + thetas(2,3)*X)/thetas(3,3);
plot(x(y1,2),x(y1,3),'r*',x(y2,2),x(y2,3),'c+',x(y3,2),x(y3,3),'bo');
hold on
plot(X,Y1,'r',X,Y2,'g',X,Y3,'c');
- one_vs_all:
function [theta,cost] = one_vs_all(x, y, theta)
%% compute cost: J
options = optimset('GradObj', 'on', 'MaxIter', 400);
n = size(x,2);
cost = zeros(n,1);
num_labels = 3;
for i = 1:num_labels
L = logical(y==i);
[theta(:,i), cost(i,1)] = ...
fminunc(@(t)(computeCost(x,L,t)), theta(:,i), options);
end
- computeCost:
function [J,grad] = computeCost(x, y, thetas)
%% compute cost: J
m = size(x,1);
grad = zeros(size(thetas));
hx = sigmoid(x * thetas);
J = (1.0/m) * sum(-y .* log(hx) - (1 - y) .* log(1 - hx));
grad = (1.0/m) .* x' * (hx - y);
end
3. Results:
- θ and the final costs J:
thetas =
6.3988 5.1407 -24.4266
-2.0773 0.2173 2.1641
0.9857 -1.9490 2.2038
>> cost
cost =
0.1715
0.2876
0.1031
Plot (figure omitted):
Note the triangle formed by the three boundary lines: points inside it are not claimed by any class (one way to resolve this is sketched below).
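One common way to assign those ambiguous points is to score each example with all K classifiers and pick the class with the largest hθ(x) (a sketch using the thetas matrix from the script above; not part of the original code):
% columns of thetas hold the K one-vs-all parameter vectors
scores = sigmoid(x * thetas);      % m-by-K matrix of h_theta values
[~, labels] = max(scores, [], 2);  % predicted class = column with the largest score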
Additional notes:
1. Regularized logistic regression
- Regularized logistic regression differs little from the plain version; a regularization (penalty) term is simply added to the cost J and to the θ update, as sketched below.
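A minimal sketch of the regularized cost and gradient (the function name and the parameter lambda are illustrative, not from the original code; the bias term theta(1) is not penalized):
function [J, grad] = computeCostReg(x, y, theta, lambda)
%% regularized logistic-regression cost and gradient (sketch)
m = size(x, 1);
hx = sigmoid(x * theta);
temp = theta;
temp(1) = 0;                                   % do not penalize the bias term
J = (1.0/m) * sum(-y .* log(hx) - (1 - y) .* log(1 - hx)) ...
    + (lambda/(2*m)) * sum(temp .^ 2);
grad = (1.0/m) .* (x' * (hx - y)) + (lambda/m) .* temp;
end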
2. one_vs_all:
1. Brief introduction:
In fact one_vs_all can also be viewed as another algorithm: treat θ as a single-hidden-layer feedforward neural network. For example, with K classes the first class can be encoded as [1,0,0,...,0] (K numbers in total), and so on; a 1 in position i means the sample belongs to class i. The computation is the same as in the multi_class case above (a small encoding sketch follows).
The feedforward network model is as follows (figure omitted):
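A small illustration of this encoding (not part of the original code): labels y in 1..K can be expanded into an m-by-K one-hot matrix.
% row i gets a 1 in column y(i) and 0 elsewhere
m = numel(y);
Y = zeros(m, K);
Y(sub2ind(size(Y), (1:m)', y(:))) = 1;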
2. Code:
Function descriptions:
- oneVsAll: acts as the main function
- lrCostFunction: computes the cost J and the gradient for the θ update
- myPredict: computes the training-set accuracy
Data and the trained θ:
The training results can be downloaded here:
Local minimum found.
Optimization completed because the size of the gradient is less than
the default value of the function tolerance.
<stopping criteria details>
Local minimum found.
Optimization completed because the size of the gradient is less than
the default value of the function tolerance.
<stopping criteria details>
Training Set Accuracy: 100.000000
- oneVsAll:
function [all_theta,cost] = oneVsAll(X, y, num_labels)
%ONEVSALL trains multiple logistic regression classifiers and returns all
%the classifiers in a matrix all_theta, where the i-th row of all_theta
%corresponds to the classifier for label i
% [all_theta] = ONEVSALL(X, y, num_labels, lambda) trains num_labels
% logisitc regression classifiers and returns each of these classifiers
% in a matrix all_theta, where the i-th row of all_theta corresponds
% to the classifier for label i
% Some useful variables
m = size(X, 1);
n = size(X, 2);
% You need to return the following variables correctly
all_theta = zeros(n+1,num_labels);
% Add ones to the X data matrix
X = [ones(m, 1),X];
% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the following code to train num_labels
% logistic regression classifiers with regularization
% parameter lambda.
%
% Hint: theta(:) will return a column vector.
%
% Hint: You can use y == c to obtain a vector of 1's and 0's that tell use
% whether the ground truth is true/false for this class.
%
% Note: For this assignment, we recommend using fmincg to optimize the cost
% function. It is okay to use a for-loop (for c = 1:num_labels) to
% loop over the different classes.
%
% fmincg works similarly to fminunc, but is more efficient when we
% are dealing with large number of parameters.
%
% Example Code for fmincg:
%
% % Set Initial theta
% initial_theta = zeros(n + 1, 1);
%
% % Set options for fminunc
% options = optimset('GradObj', 'on', 'MaxIter', 50);
%
% % Run fmincg to obtain the optimal theta
% % This function will return theta and the cost
% [theta] = ...
% fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
% initial_theta, options);
%
cost = zeros(num_labels,1);
options = optimset('GradObj', 'on', 'MaxIter', 50);
for i =1:num_labels
L = logical(y==i);
[all_theta(:,i),cost(i,1)] = ...
fminunc (@(t)(lrCostFunction(t, X, L)),all_theta(:,i), options);
end
myPredict(all_theta,X,y);
% =========================================================================
end
- lrCostFunction:
function [J,grad] = lrCostFunction(thetas,x, y)
%LRCOSTFUNCTION Compute cost and gradient for logistic regression with
%regularization
% J = LRCOSTFUNCTION(theta, X, y, lambda) computes the cost of using
% theta as the parameter for regularized logistic regression and the
% gradient of the cost w.r.t. to the parameters.
% Initialize some useful values
m = length(y); % number of training examples
% code used only when debugging this function on its own
%x = [ones(m,1),x];
%theta = zeros(size(x,2),1);
%y = logical(y==1);
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
% You should set J to the cost.
% Compute the partial derivatives and set grad to the partial
% derivatives of the cost w.r.t. each parameter in theta
%
% Hint: The computation of the cost function and gradients can be
% efficiently vectorized. For example, consider the computation
%
% sigmoid(X * theta)
%
% Each row of the resulting matrix will contain the value of the
% prediction for that example. You can make use of this to vectorize
% the cost function and gradient computations.
%
% Hint: When computing the gradient of the regularized cost function,
% there're many possible vectorized solutions, but one solution
% looks like:
% grad = (unregularized gradient for logistic regression)
% temp = theta;
% temp(1) = 0; % because we don't add anything for j = 0
% grad = grad + YOUR_CODE_HERE (using the temp variable)
%
grad = zeros(size(thetas));
hx = sigmoid(x * thetas);
J = (1.0/m) * sum(-y .* log(hx) - (1 - y) .* log(1 - hx));
grad = (1.0/m) .* x' * (hx - y);
% =========================================================================
end
- myPredict:
function p = myPredict(Theta1,X,y)
%PREDICT Predict the label of an input given a trained neural network
% p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
% trained weights of a neural network (Theta1, Theta2)
% Useful values
m = size(X, 1);
num_labels = 10;
% You need to return the following variables correctly
p = zeros(size(X, 1), 1);
% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
% your learned neural network. You should set p to a
% vector containing labels between 1 to num_labels.
%
% Hint: The max function might come in useful. In particular, the max
% function can also return the index of the max element, for more
% information see 'help max'. If your examples are in rows, then, you
% can use max(A, [], 2) to obtain the max for each row.
%
z_2 = X*Theta1;
a_2 = sigmoid(z_2);
for i = 1:m
for j = 1:num_labels
if a_2(i,j) >= 0.5
p(i,1) = j;
break;
end
end
end
fprintf('\nTraining Set Accuracy: %f\n', mean(double(p == y)) * 100);
% =========================================================================
end
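As the hint in the comments suggests, the prediction loop can also be replaced by max (a sketch using the same a_2 matrix of sigmoid activations; not part of the original code):
% pick, for each example, the class with the largest activation
[~, p] = max(a_2, [], 2);
fprintf('\nTraining Set Accuracy: %f\n', mean(double(p == y)) * 100);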