Some 3D Graphics (rgl) for Classification with Splines and Logistic Regression (from The Elements of Statistical Learning) (repost)
This semester I'm teaching from Hastie, Tibshirani, and Friedman's book, The Elements of Statistical Learning, 2nd Edition. The authors provide a Mixture Simulation data set that has two continuous predictors and a binary outcome. These data are used to demonstrate classification procedures by plotting classification boundaries in the two predictors. For example, the figure below is a reproduction of Figure 2.5 in the book:

The solid line represents the Bayes decision boundary (i.e., {x: Pr("orange"|x) = 0.5}), which is computed from the model used to simulate these data. The Bayes decision boundary and other boundaries are determined by one or more surfaces (e.g., Pr("orange"|x)), which are generally omitted from the graphics. In class, we decided to use the R package rgl to create a 3D representation of this surface. Below is the code and graphic (well, a 2D projection) associated with the Bayes decision boundary:
library(rgl)
load(url("http://statweb.stanford.edu/~tibs/ElemStatLearn/datasets/ESL.mixture.rda"))
dat <- ESL.mixture

## create 3D graphic, rotate to view 2D x1/x2 projection
par3d(FOV=1, userMatrix=diag(4))
plot3d(dat$xnew[,1], dat$xnew[,2], dat$prob, type="n",
       xlab="x1", ylab="x2", zlab="",
       axes=FALSE, box=TRUE, aspect=1)

## plot points and bounding box
x1r <- range(dat$px1)
x2r <- range(dat$px2)
pts <- plot3d(dat$x[,1], dat$x[,2], 1,
              type="p", radius=0.5, add=TRUE,
              col=ifelse(dat$y, "orange", "blue"))
lns <- lines3d(x1r[c(1,2,2,1,1)], x2r[c(1,1,2,2,1)], 1)

## draw Bayes (true) decision boundary; provided by authors
dat$probm <- with(dat, matrix(prob, length(px1), length(px2)))
dat$cls <- with(dat, contourLines(px1, px2, probm, levels=0.5))
pls <- lapply(dat$cls, function(p) lines3d(p$x, p$y, z=1))

## plot marginal (w.r.t. mixture) probability surface and decision plane
sfc <- surface3d(dat$px1, dat$px2, dat$prob, alpha=1.0,
                 color="gray", specular="gray")
qds <- quads3d(x1r[c(1,2,2,1)], x2r[c(1,1,2,2)], 0.5, alpha=0.4,
               color="gray", lit=FALSE)
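As a quick aside (my addition, not in the original post): the object provided by the authors contains everything used above, namely the training points x and labels y, the prediction grid xnew with its marginal coordinates px1 and px2, and the true conditional probabilities prob. A one-line check of its structure:

## a minimal check of the components of the simulation object
str(dat, max.level = 1)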

In the above graphic, the probability surface is represented in gray, and the Bayes decision boundary occurs where the plane f(x) = 0.5 (in light gray) intersects with the probability surface.
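To make the surface/plane picture more concrete, the same contourLines() mechanism can extract level sets at other cutoffs. The sketch below (my addition, not part of the original post) draws the 0.2 and 0.8 iso-probability curves and then removes them again, so the scene is left unchanged for the code that follows:

## illustrative only: other level sets of the same probability surface
cls.lo <- with(dat, contourLines(px1, px2, probm, levels=0.2))
cls.hi <- with(dat, contourLines(px1, px2, probm, levels=0.8))
tmp <- lapply(c(cls.lo, cls.hi), function(p) lines3d(p$x, p$y, z=1, color="gray50"))
for(id in tmp) pop3d(id=id)  ## clean up so later code finds the scene as expected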
Of course, the classification task is to estimate a decision boundary given the data. Chapter 5 presents two multidimensional spline approaches, in conjunction with binary logistic regression, for estimating a decision boundary. The upper panel of Figure 5.11 in the book shows the decision boundary associated with additive natural cubic splines in x1 and x2 (4 df in each direction; 1 + (4-1) + (4-1) = 7 parameters), and the lower panel shows the corresponding tensor product splines (4 x 4 = 16 parameters), which are, of course, much more flexible. The code and graphics below reproduce the decision boundaries shown in Figure 5.11 and additionally illustrate the estimated probability surface (note: the code below should only be executed after the code above, since the 3D graphic is modified rather than created anew):
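Before reproducing the figures, a quick aside on where those parameter counts come from (my addition; the vector x below is hypothetical, used only to evaluate the basis dimensions):

## dimensions of the ns() bases behind the parameter counts above
library(splines)
x <- seq(0, 1, length.out = 50)        ## hypothetical inputs, for illustration only
ncol(ns(x, df=3))                      ## 3 columns per predictor; with the intercept, 1 + 3 + 3 = 7
ncol(ns(x, df=4, intercept=TRUE))      ## 4 columns when the basis includes an intercept
4 * 4                                  ## tensor product of two such bases: 16 parameters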
Reproducing Figure 5.11 (top):
## clear the surface, decision plane, and decision boundary
par3d(userMatrix=diag(4)); pop3d(id=sfc); pop3d(id=qds)
for(pl in pls) pop3d(id=pl)

## fit additive natural cubic spline model
library(splines)
ddat <- data.frame(y=dat$y, x1=dat$x[,1], x2=dat$x[,2])
form.add <- y ~ ns(x1, df=3) +
                ns(x2, df=3)
fit.add <- glm(form.add, data=ddat, family=binomial(link="logit"))

## compute probabilities, plot classification boundary
probs.add <- predict(fit.add, type="response",
                     newdata=data.frame(x1=dat$xnew[,1], x2=dat$xnew[,2]))
dat$probm.add <- with(dat, matrix(probs.add, length(px1), length(px2)))
dat$cls.add <- with(dat, contourLines(px1, px2, probm.add, levels=0.5))
pls <- lapply(dat$cls.add, function(p) lines3d(p$x, p$y, z=1))

## plot probability surface and decision plane
sfc <- surface3d(dat$px1, dat$px2, probs.add, alpha=1.0,
                 color="gray", specular="gray")
qds <- quads3d(x1r[c(1,2,2,1)], x2r[c(1,1,2,2)], 0.5, alpha=0.4,
               color="gray", lit=FALSE)
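As a quick check (my addition), the additive fit indeed has the seven parameters mentioned above, and its apparent (training) error rate can be computed directly:

## sanity checks on the additive fit, not part of the original post
length(coef(fit.add))                      ## 7 = 1 + 3 + 3
mean((fitted(fit.add) > 0.5) != ddat$y)    ## apparent (training) misclassification rate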

Reproducing Figure 5.11 (bottom):
## clear the surface, decision plane, and decision boundary
par3d(userMatrix=diag(4)); pop3d(id=sfc); pop3d(id=qds)
for(pl in pls) pop3d(id=pl)

## fit tensor product natural cubic spline model
form.tpr <- y ~ 0 + ns(x1, df=4, intercept=TRUE):
                    ns(x2, df=4, intercept=TRUE)
fit.tpr <- glm(form.tpr, data=ddat, family=binomial(link="logit"))

## compute probabilities, plot classification boundary
probs.tpr <- predict(fit.tpr, type="response",
                     newdata=data.frame(x1=dat$xnew[,1], x2=dat$xnew[,2]))
dat$probm.tpr <- with(dat, matrix(probs.tpr, length(px1), length(px2)))
dat$cls.tpr <- with(dat, contourLines(px1, px2, probm.tpr, levels=0.5))
pls <- lapply(dat$cls.tpr, function(p) lines3d(p$x, p$y, z=1))

## plot probability surface and decision plane
sfc <- surface3d(dat$px1, dat$px2, probs.tpr, alpha=1.0,
                 color="gray", specular="gray")
qds <- quads3d(x1r[c(1,2,2,1)], x2r[c(1,1,2,2)], 0.5, alpha=0.4,
               color="gray", lit=FALSE)
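And the analogous check for the tensor product fit (again my addition):

## sanity checks on the tensor product fit
length(coef(fit.tpr))                      ## 16 = 4 * 4
mean((fitted(fit.tpr) > 0.5) != ddat$y)    ## apparent (training) misclassification rate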

Although the graphics above are static, it is possible to embed an interactive 3D version within a web page (e.g., see the rgl vignette; best with Google Chrome), using the rgl function writeWebGL. I gave up on trying to embed such a graphic into this WordPress blog post, but I have created a separate page for the interactive 3D version of Figure 5.11b. Duncan Murdoch's work on this package is really nice!
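For anyone who wants to try this, the call is roughly as sketched below; this is a minimal example assuming a current rgl scene and a writable output directory (the directory name is arbitrary). Note that recent versions of rgl deprecate writeWebGL in favor of rglwidget().

## a minimal sketch: export the current rgl scene as a web page
writeWebGL(dir="webGL", width=700)   ## writes webGL/index.html; open in a WebGL-capable browser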
This entry was posted in Technical and tagged data, graphics, programming, R, statistics on February 1, 2015.
Reposted from: http://biostatmatt.com/archives/2659