predict.glm -> which class does it predict?
Jul 10, 2009; 10:46pm
Hi,
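I have a question about logistic regression in R.

Suppose I have a small list of proteins P1, P2, P3 that predict a two-class target T, say cancer/noncancer. Let's further say I know that I can build a simple logistic regression model in R:

    model <- glm(T ~ ., data=d.f(Y), family=binomial)

(Y is the dataset of the proteins.)

This works fine. T is a factor with levels cancer, noncancer. The proteins are numeric.

Now I want to use predict.glm to predict new data:

    predict(model, newdata=testsamples, type="response")

(testsamples is a small set of new samples.)

The result is a vector of probabilities, one for each sample in testsamples. But the probability of WHAT? Of belonging to the first level of T, or to the second level of T?

Is the following expression

    factor(predict(model, newdata=testsamples, type="response") >= 0.5)

TRUE when the new sample is classified as cancer, or when it is classified as noncancer? And why not the other way around?

Thank you,

Peter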
Re: predict.glm -> which class does it predict?
On Jul 10, 2009, at 9:46 AM, Peter Schüffler wrote:
> Hi,
>
> I have a question about logistic regression in R.
>
> Suppose I have a small list of proteins P1, P2, P3 that predict a
> two-class target T, say cancer/noncancer. Let's further say I know
> that I can build a simple logistic regression model in R
>
>    model <- glm(T ~ ., data=d.f(Y), family=binomial)   (Y is the
> dataset of the proteins).
>
> This works fine. T is a factor with levels cancer, noncancer.
> The proteins are numeric.
>
> Now I want to use predict.glm to predict new data.
>
>    predict(model, newdata=testsamples, type="response")   (testsamples
> is a small set of new samples).
>
> The result is a vector of probabilities, one for each sample in
> testsamples. But the probability of WHAT? Of belonging to the first
> level of T? Or to the second level of T?
>
> Is the following expression
>    factor(predict(model, newdata=testsamples, type="response") >= 0.5)
> TRUE when the new sample is classified as cancer, or when it is
> classified as noncancer? And why not the other way around?
>
> Thank you,
>
> Peter
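As per the Details section of ?glm:

    A typical predictor has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response. For binomial and quasibinomial families the response can also be specified as a factor (when the first level denotes failure and all others success) or as a two-column matrix with the columns giving the numbers of successes and failures.

So, given your description above, you are predicting "noncancer".

If you want to predict "cancer", alter the factor levels thusly:

    T <- factor(T, levels = c("noncancer", "cancer"))

By default, R will alpha sort the factor levels, so "cancer" would be the first level.

Think of it in terms of using a 0,1 integer code for absence,presence, [...]

BTW, using 'T' as the name of the response vector is not a good habit:

    > T
    [1] TRUE

'T' is shorthand for the built-in R constant TRUE. R is generally [...]

HTH,

Marc Schwartz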
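The effect of reordering the levels can be checked on toy data. The sketch below is only an illustration: the data frame `d`, the response name `status` (used here instead of `T`), and the simulated predictor `P1` are assumptions, not the poster's data.

```r
# Hypothetical toy data; `status` is used rather than `T`,
# which is also R's shorthand for TRUE.
set.seed(1)
d <- data.frame(
  status = factor(rep(c("cancer", "noncancer"), each = 20)),
  P1     = c(rnorm(20, mean = 0.5), rnorm(20, mean = -0.5))
)

levels(d$status)   # alphabetical by default: "cancer" "noncancer"

# Default ordering: "noncancer" is the second level, so the fitted
# probabilities are P(status == "noncancer").
fit_default <- glm(status ~ P1, data = d, family = binomial)

# Put "cancer" second so that it becomes the modelled level.
d$status2  <- factor(d$status, levels = c("noncancer", "cancer"))
fit_cancer <- glm(status2 ~ P1, data = d, family = binomial)

p_noncancer <- predict(fit_default, type = "response")
p_cancer    <- predict(fit_cancer,  type = "response")

# The two sets of probabilities should be complementary.
all.equal(unname(p_noncancer + p_cancer), rep(1, nrow(d)))
```

Reordering the levels does not change the fit itself, only which of the two classes the reported probability refers to.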
Re: predict.glm -> which class does it predict?
In reply to this post by Peter Schüffler-2
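It's the probability of the 2nd level of a factor response (termed "success" in the documentation, even when you're modeling the probability of disease or death...), just like when interpreting the logistic regression itself.

I find it easiest to sort out this kind of issue by experimentation in simplified situations. E.g.

    > x <- sample(c("A","B"),10,replace=TRUE)
    > x
     [1] "B" "A" "B" "B" "A" "B" "B" "A" "B" "A"
    > table(x)
    x
    A B
    4 6

(notice that the relative frequency of B is 0.6)

    > glm(x~1,binomial)
    Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1
    In addition: Warning message:
    In model.matrix.default(mt, mf, contrasts) :
      variable 'x' converted to a factor

(OK, so it won't go without conversion to factor. This is a good thing.)

    > glm(factor(x)~1,binomial)

    Call:  glm(formula = factor(x) ~ 1, family = binomial)

    Coefficients:
    (Intercept)
         0.4055

    Degrees of Freedom: 9 Total (i.e. Null);  9 Residual
    Null Deviance:      13.46
    Residual Deviance: 13.46        AIC: 15.46

(The intercept is positive, corresponding to log odds for a probability > 0.5; i.e., it must be that of "B": 0.4055==log(6/4))

    > predict(glm(factor(x)~1,binomial))
            1         2         3         4         5         6         7         8
    0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651
            9        10
    0.4054651 0.4054651
    > predict(glm(factor(x)~1,binomial),type="response")
      1   2   3   4   5   6   7   8   9  10
    0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6

As for why it's not the other way around, well, if it had been, then you could have asked the same question....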
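Since the probability refers to the second level, the thresholded prediction from the original question can be mapped back to class labels explicitly. The following is a minimal sketch under stated assumptions: the training data, the response name `status`, the predictor `P1`, and the 0.5 cutoff are illustrative choices, not part of the thread's data.

```r
# Hypothetical training data with a two-level factor response.
set.seed(42)
train <- data.frame(P1 = rnorm(40))
train$status <- factor(ifelse(train$P1 + rnorm(40) > 0, "noncancer", "cancer"))

model <- glm(status ~ P1, data = train, family = binomial)

# Hypothetical new samples to classify.
testsamples <- data.frame(P1 = c(-2, 0, 2))
prob <- predict(model, newdata = testsamples, type = "response")

# `prob` is the probability of the SECOND factor level, so
# `prob >= 0.5` being TRUE means "classified as lev[2]".
lev <- levels(train$status)   # "cancer" "noncancer"
pred_class <- factor(ifelse(prob >= 0.5, lev[2], lev[1]), levels = lev)

data.frame(testsamples, prob = prob, pred_class = pred_class)
```

Looking the labels up through `levels()` rather than hard-coding them keeps the mapping correct if the level order is changed later.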
Re: predict.glm -> which class does it predict?
2009/7/10 Peter Dalgaard <[hidden email]>:
> It's the probability of the 2nd level of a factor response (termed "success"
> in the documentation, even when you're modeling the probability of disease or
> death...), just like when interpreting the logistic regression itself.
>
> I find it easiest to sort out this kind of issue by experimentation in
> simplified situations. E.g.
>
>> x <- sample(c("A","B"),10,replace=TRUE)
>> x
>  [1] "B" "A" "B" "B" "A" "B" "B" "A" "B" "A"
>> table(x)
> x
> A B
> 4 6
>
> (notice that the relative frequency of B is 0.6)
>
>> glm(x~1,binomial)
> Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1
> In addition: Warning message:
> In model.matrix.default(mt, mf, contrasts) :
>  variable 'x' converted to a factor
>
> (OK, so it won't go without conversion to factor. This is a good thing.)
>
>> glm(factor(x)~1,binomial)
>
> Call:  glm(formula = factor(x) ~ 1, family = binomial)
>
> Coefficients:
> (Intercept)
>     0.4055
>
> Degrees of Freedom: 9 Total (i.e. Null);  9 Residual
> Null Deviance:      13.46
> Residual Deviance: 13.46        AIC: 15.46
>
> (The intercept is positive, corresponding to log odds for a probability > 0.5;
> i.e., it must be that of "B": 0.4055==log(6/4))
>
>> predict(glm(factor(x)~1,binomial))
>        1         2         3         4         5         6         7         8
> 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651
>        9        10
> 0.4054651 0.4054651
>> predict(glm(factor(x)~1,binomial),type="response")
>  1   2   3   4   5   6   7   8   9  10
> 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6
>
> As for why it's not the other way around, well, if it had been, then you
> could have asked the same question....
Or more specifically:

    > resp <- factor(c("cancer", "noncancer", "noncancer", "noncancer"))
    [...]

and since noncancer occurs 75% of the time in the sample, clearly it is predicting noncancer.
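One way to run this four-observation check is sketched below, assuming an intercept-only fit; the names `fit` and `p` are illustrative.

```r
# Four observations: one "cancer", three "noncancer".
resp <- factor(c("cancer", "noncancer", "noncancer", "noncancer"))

# Intercept-only logistic regression: the fitted probability is
# simply the observed share of the modelled level.
fit <- glm(resp ~ 1, family = binomial)
p   <- predict(fit, type = "response")

levels(resp)   # "cancer" "noncancer"
unname(p)      # every value is 0.75, the share of "noncancer"
```

Because 0.75 matches the frequency of "noncancer" (the second level) and not that of "cancer" (0.25), the probability returned by `predict()` must refer to the second level.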
Re: predict.glm -> which class does it predict?
In reply to this post by Peter Dalgaard
> As for why it's not the other way around, well, if it had been, then you
> could have asked the same question....

...and come to think about it, it is rather convenient that it meshes [...]