predict.glm -> which class does it predict?
Jul 10, 2009; 10:46pm
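Hi,

I have a question about logistic regression in R.

Suppose I have a small list of proteins P1, P2, P3 that predict a two-class target T, say cancer/noncancer. Let's further say I know that I can build a simple logistic regression model in R:

model <- glm(T ~ ., data=d.f(Y), family=binomial)   (Y is the dataset of the proteins)

This works fine. T is a factor with levels cancer, noncancer. The proteins are numeric.

Now I want to use predict.glm to predict new data:

predict(model, newdata=testsamples, type="response")   (testsamples is a small set of new samples)

The result is a vector of probabilities, one for each sample in testsamples. But the probability of WHAT? Of belonging to the first level in T? Or to the second level in T?

Is the following expression

factor(predict(model, newdata=testsamples, type="response") >= 0.5)

TRUE when the new sample is classified as cancer, or when it is classified as noncancer? And why not the other way around?

Thank you,

Peter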
Re: predict.glm -> which class does it predict?
On Jul 10, 2009, at 9:46 AM, Peter Schüffler wrote:
> Hi,
>
> I have a question about logistic regression in R.
>
> Suppose I have a small list of proteins P1, P2, P3 that predict a
> two-class target T, say cancer/noncancer. Let's further say I know
> that I can build a simple logistic regression model in R
>
> model <- glm(T ~ ., data=d.f(Y), family=binomial)   (Y is the
> dataset of the proteins).
>
> This works fine. T is a factor with levels cancer, noncancer.
> Proteins are numeric.
>
> Now, I want to use predict.glm to predict new data.
>
> predict(model, newdata=testsamples, type="response")   (testsamples
> is a small set of new samples).
>
> The result is a vector of the probabilities for each sample in
> testsamples. But the probability of WHAT? To belong to the first level
> in T? To belong to the second level in T?
>
> Is the following expression
> factor(predict(model, newdata=testsamples, type="response") >= 0.5)
> TRUE when the new sample is classified as cancer, or when it's
> classified as noncancer? And why not the other way around?
>
> Thank you,
>
> Peter
As per the Details section of ?glm:

  A typical predictor has the form response ~ terms where response is [...]

So, given your description above, you are predicting [...]

If you want to predict "cancer", alter the factor levels thusly:

  T <- factor(T, levels = c("noncancer", "cancer"))

By default, R will alpha sort the factor levels, so "cancer" would be [...]

Think of it in terms of using a 0,1 integer code for absence,presence, [...]

BTW, using 'T' as the name of the response vector is not a good habit:

  > T

'T' is shorthand for the built-in R constant TRUE. R is generally [...]

HTH,

Marc Schwartz
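To illustrate the point about factor level order, here is a minimal sketch on simulated data. The variable names (`status`, `P1`, `fit_default`, `fit_relevel`) and the data themselves are invented for illustration and are not from the poster's dataset; `status` is used instead of `T` for the reason given above.

```r
# Simulated stand-in data (an assumption, not the poster's proteins)
set.seed(1)
n <- 200
P1 <- rnorm(n)
# In this toy setup, higher P1 makes "cancer" more likely
status <- factor(ifelse(runif(n) < plogis(1.5 * P1), "cancer", "noncancer"))
levels(status)  # default alphabetical order: "cancer" "noncancer"

# Default level order: glm models the probability of the 2nd level, "noncancer"
fit_default <- glm(status ~ P1, family = binomial)
p_default <- predict(fit_default, type = "response")

# Reordered levels: "cancer" is now the 2nd level, so it becomes the modeled event
status_relevel <- factor(status, levels = c("noncancer", "cancer"))
fit_relevel <- glm(status_relevel ~ P1, family = binomial)
p_relevel <- predict(fit_relevel, type = "response")

# Same model, opposite labelling: the two probability vectors are complements
same_model <- isTRUE(all.equal(unname(p_default), unname(1 - p_relevel),
                               tolerance = 1e-6))
same_model
```

Reordering the levels only changes which class is reported; the coefficients flip sign and the fit is otherwise unchanged.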
Re: predict.glm -> which class does it predict?
In reply to this post by Peter Schüffler-2
Peter Schüffler wrote:
> Hi,
>
> I have a question about logistic regression in R.
>
> Suppose I have a small list of proteins P1, P2, P3 that predict a
> two-class target T, say cancer/noncancer. Let's further say I know that I
> can build a simple logistic regression model in R
>
> model <- glm(T ~ ., data=d.f(Y), family=binomial)   (Y is the dataset of
> the proteins).
>
> This works fine. T is a factor with levels cancer, noncancer.
> Proteins are numeric.
>
> Now, I want to use predict.glm to predict new data.
>
> predict(model, newdata=testsamples, type="response")   (testsamples is
> a small set of new samples).
>
> The result is a vector of the probabilities for each sample in
> testsamples. But the probability of WHAT? To belong to the first level in T?
> To belong to the second level in T?
>
> Is the following expression
> factor(predict(model, newdata=testsamples, type="response") >= 0.5)
> TRUE when the new sample is classified as cancer, or when it's
> classified as noncancer? And why not the other way around?
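It's the probability of the 2nd level of a factor response (termed "success" in the documentation, even when you're modeling the probability of disease or death...), just like when interpreting the logistic regression itself.

I find it easiest to sort out this kind of issue by experimentation in simplified situations. E.g.

> x <- sample(c("A","B"),10,replace=TRUE)
> x
 [1] "B" "A" "B" "B" "A" "B" "B" "A" "B" "A"
> table(x)
x
A B
4 6

(notice that the relative frequency of B is 0.6)

> glm(x~1,binomial)
Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1
In addition: Warning message:
In model.matrix.default(mt, mf, contrasts) :
  variable 'x' converted to a factor

(OK, so it won't go without conversion to factor. This is a good thing.)

> glm(factor(x)~1,binomial)

Call:  glm(formula = factor(x) ~ 1, family = binomial)

Coefficients:
(Intercept)
     0.4055

Degrees of Freedom: 9 Total (i.e. Null);  9 Residual
Null Deviance:     13.46
Residual Deviance: 13.46        AIC: 15.46

(The intercept is positive, corresponding to log odds for a probability > 0.5; i.e., it must be that of "B": 0.4055==log(6/4))

> predict(glm(factor(x)~1,binomial))
        1         2         3         4         5         6         7         8
0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651
        9        10
0.4054651 0.4054651
> predict(glm(factor(x)~1,binomial),type="response")
  1   2   3   4   5   6   7   8   9  10
0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6

As for why it's not the other way around, well, if it had been, then you could have asked the same question....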
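Given that the response-scale prediction is the probability of the second factor level, the thresholding step from the original question can be written so that it returns class labels rather than TRUE/FALSE. The following is only a sketch: `predict_class` is a hypothetical helper (not part of R), and the data and column names are simulated for illustration.

```r
# Hypothetical helper: map response-scale probabilities from a binomial glm
# with a factor response back to that factor's levels.
predict_class <- function(model, newdata, threshold = 0.5) {
  # Levels of the response as used in the fit; the 2nd level is the modeled event
  lev <- levels(model.response(model.frame(model)))
  p <- predict(model, newdata = newdata, type = "response")
  factor(ifelse(p >= threshold, lev[2], lev[1]), levels = lev)
}

# Simulated example data (an assumption for illustration only)
set.seed(42)
d <- data.frame(P1 = rnorm(100))
d$status <- factor(ifelse(d$P1 + rnorm(100) > 0, "noncancer", "cancer"))

fit <- glm(status ~ P1, data = d, family = binomial)
new_samples <- data.frame(P1 = c(-3, 3))
predicted <- predict_class(fit, new_samples)
predicted
```

Reading the levels from the fitted model, instead of hard-coding "cancer" or "noncancer", keeps the labels correct even if the factor is later releveled.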
Re: predict.glm -> which class does it predict?
2009/7/10 Peter Dalgaard:
> Peter Schüffler wrote:
>>
>> Hi,
>>
>> I have a question about logistic regression in R.
>>
>> Suppose I have a small list of proteins P1, P2, P3 that predict a
>> two-class target T, say cancer/noncancer. Let's further say I know that I can
>> build a simple logistic regression model in R
>>
>> model <- glm(T ~ ., data=d.f(Y), family=binomial)   (Y is the dataset of
>> the proteins).
>>
>> This works fine. T is a factor with levels cancer, noncancer.
>> Proteins are numeric.
>>
>> Now, I want to use predict.glm to predict new data.
>>
>> predict(model, newdata=testsamples, type="response")   (testsamples is a
>> small set of new samples).
>>
>> The result is a vector of the probabilities for each sample in testsamples.
>> But the probability of WHAT? To belong to the first level in T? To belong to
>> the second level in T?
>>
>> Is the following expression
>> factor(predict(model, newdata=testsamples, type="response") >= 0.5)
>> TRUE when the new sample is classified as cancer, or when it's classified
>> as noncancer? And why not the other way around?
>
> It's the probability of the 2nd level of a factor response (termed "success"
> in the documentation, even when you're modeling the probability of disease or
> death...), just like when interpreting the logistic regression itself.
>
> I find it easiest to sort out this kind of issue by experimentation in
> simplified situations. E.g.
>
> > x <- sample(c("A","B"),10,replace=TRUE)
> > x
>  [1] "B" "A" "B" "B" "A" "B" "B" "A" "B" "A"
> > table(x)
> x
> A B
> 4 6
>
> (notice that the relative frequency of B is 0.6)
>
> > glm(x~1,binomial)
> Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1
> In addition: Warning message:
> In model.matrix.default(mt, mf, contrasts) :
>   variable 'x' converted to a factor
>
> (OK, so it won't go without conversion to factor. This is a good thing.)
>
> > glm(factor(x)~1,binomial)
>
> Call:  glm(formula = factor(x) ~ 1, family = binomial)
>
> Coefficients:
> (Intercept)
>      0.4055
>
> Degrees of Freedom: 9 Total (i.e. Null);  9 Residual
> Null Deviance:     13.46
> Residual Deviance: 13.46        AIC: 15.46
>
> (The intercept is positive, corresponding to log odds for a probability
> > 0.5; i.e., it must be that of "B": 0.4055==log(6/4))
>
> > predict(glm(factor(x)~1,binomial))
>         1         2         3         4         5         6         7         8
> 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651
>         9        10
> 0.4054651 0.4054651
> > predict(glm(factor(x)~1,binomial),type="response")
>   1   2   3   4   5   6   7   8   9  10
> 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6
>
> As for why it's not the other way around, well, if it had been, then you
> could have asked the same question....
Or more specifically:

> resp <- factor(c("cancer", "noncancer", "noncancer", "noncancer"))
[...]

and since noncancer occurs 75% of the time in the sample clearly [...]
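One way the check sketched above might be carried out is shown below. Only `resp` comes from the post; the names `mod`, `p` and `prop_noncancer` are assumptions added here, and the intercept-only model is just one possible choice.

```r
resp <- factor(c("cancer", "noncancer", "noncancer", "noncancer"))

# Intercept-only logistic regression: every fitted probability equals the
# observed proportion of whichever level glm treats as the event
mod <- glm(resp ~ 1, family = binomial)
p <- predict(mod, type = "response")

# Observed proportion of the 2nd level ("noncancer")
prop_noncancer <- mean(resp == levels(resp)[2])

# TRUE if the fitted probabilities match the proportion of the 2nd level
matches_second_level <- isTRUE(all.equal(unname(p),
                                         rep(prop_noncancer, length(resp)),
                                         tolerance = 1e-6))
matches_second_level
```

Because "cancer" makes up only a quarter of this sample, a fitted probability matching the "noncancer" share identifies the second level as the one being predicted.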
Re: predict.glm -> which class does it predict?
In reply to this post by Peter Dalgaard
> As for why it's not the other way around, well, if it had been, then you
> could have asked the same question....

...and come to think about it, it is rather convenient that it meshes [...]