predict.glm -> which class does it predict?
Jul 10, 2009; 10:46pm
predict.glm -> which class does it predict?
|
2 posts
|
Hi,
I have a question about logistic regression in R. Suppose I have a small list of proteins P1, P2, P3 that predict a model <- glm(T ~ ., data=d.f(Y), family=binomial) (Y is the dataset of This works fine. T is a factored vector with levels cancer, noncancer. Now, I want to use predict.glm to predict a new data. predict(model, newdata=testsamples, type="response") (testsamples is The result is a vector of the probabilites for each sample in Is this fallowing expression Thank you, Peter ______________________________________________ |
Re: predict.glm -> which class does it predict?
|
1330 posts
|
On Jul 10, 2009, at 9:46 AM, Peter Schüffler wrote:
> Hi,
> > I have a question about logistic regression in R. > > Suppose I have a small list of proteins P1, P2, P3 that predict a > two-class target T, say cancer/noncancer. Lets further say I know > that I can build a simple logistic regression model in R > > model <- glm(T ~ ., data=d.f(Y), family=binomial) (Y is the > dataset of the Proteins). > > This works fine. T is a factored vector with levels cancer, > noncancer. Proteins are numeric. > > Now, I want to use predict.glm to predict a new data. > > predict(model, newdata=testsamples, type="response") (testsamples > is a small set of new samples). > > The result is a vector of the probabilites for each sample in > testsamples. But probabilty WHAT for? To belong to the first level > in T? To belong to second level in T? > > Is this fallowing expression > factor(predict(model, newdata=testsamples, type="response") >= 0.5) > TRUE, when the new sample is classified to Cancer or when it's > classified to Noncancer? And why not the other way around? > > Thank you, > > Peter ... [show rest of quote]
As per the Details section of ?glm: A typical predictor has the form response ~ terms where response is So, given your description above, you are predicting If you want to predict "cancer", alter the factor levels thusly: T <- factor(T, levels = c("noncancer", "cancer")) By default, R will alpha sort the factor levels, so "cancer" would be Think of it in terms of using a 0,1 integer code for absence,presence, BTW, using 'T' as the name of the response vector is not a good habit: > T 'T' is shorthand for the built in R constant TRUE. R is generally HTH, Marc Schwartz ______________________________________________ |
Re: predict.glm -> which class does it predict?
|
2360 posts
|
In reply to this post by Peter Schüffler-2
Peter Schüffler wrote:
> Hi,
> > I have a question about logistic regression in R. > > Suppose I have a small list of proteins P1, P2, P3 that predict a > two-class target T, say cancer/noncancer. Lets further say I know that I > can build a simple logistic regression model in R > > model <- glm(T ~ ., data=d.f(Y), family=binomial) (Y is the dataset of > the Proteins). > > This works fine. T is a factored vector with levels cancer, noncancer. > Proteins are numeric. > > Now, I want to use predict.glm to predict a new data. > > predict(model, newdata=testsamples, type="response") (testsamples is > a small set of new samples). > > The result is a vector of the probabilites for each sample in > testsamples. But probabilty WHAT for? To belong to the first level in T? > To belong to second level in T? > > Is this fallowing expression > factor(predict(model, newdata=testsamples, type="response") >= 0.5) > TRUE, when the new sample is classified to Cancer or when it's > classified to Noncancer? And why not the other way around? ... [show rest of quote]
It's the probability of the 2nd level of a factor response (termed I find it easiest to sort ut this kind of issue by experimentation in > x <- sample(c("A","B"),10,replace=TRUE) (notice that the relative frequency of B is 0.6) > glm(x~1,binomial) (OK, so it won't go without conversion to factor. This is a good thing.) > glm(factor(x)~1,binomial) Call: glm(formula = factor(x) ~ 1, family = binomial) Coefficients: Degrees of Freedom: 9 Total (i.e. Null); 9 Residual (The intercept is positive, corresponding to log odds for a probability > predict(glm(factor(x)~1,binomial)) As for why it's not the other way around, well, if it had been, then you -- ______________________________________________ |
Re: predict.glm -> which class does it predict?
|
7686 posts
|
2009/7/10 Peter Dalgaard <[hidden email]>:
> Peter Schüffler wrote:
>> >> Hi, >> >> I have a question about logistic regression in R. >> >> Suppose I have a small list of proteins P1, P2, P3 that predict a >> two-class target T, say cancer/noncancer. Lets further say I know that I can >> build a simple logistic regression model in R >> >> model <- glm(T ~ ., data=d.f(Y), family=binomial) (Y is the dataset of >> the Proteins). >> >> This works fine. T is a factored vector with levels cancer, noncancer. >> Proteins are numeric. >> >> Now, I want to use predict.glm to predict a new data. >> >> predict(model, newdata=testsamples, type="response") (testsamples is a >> small set of new samples). >> >> The result is a vector of the probabilites for each sample in testsamples. >> But probabilty WHAT for? To belong to the first level in T? To belong to >> second level in T? >> >> Is this fallowing expression >> factor(predict(model, newdata=testsamples, type="response") >= 0.5) >> TRUE, when the new sample is classified to Cancer or when it's classified >> to Noncancer? And why not the other way around? > > It's the probability of the 2nd level of a factor response (termed "success" > in the documentation, even when your modeling the probability of disease or > death...), just like when interpreting the logistic regression itself. > > I find it easiest to sort ut this kind of issue by experimentation in > simplified situations. E.g. > >> x <- sample(c("A","B"),10,replace=TRUE) >> x > [1] "B" "A" "B" "B" "A" "B" "B" "A" "B" "A" >> table(x) > x > A B > 4 6 > > (notice that the relative frequency of B is 0.6) > >> glm(x~1,binomial) > Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1 > In addition: Warning message: > In model.matrix.default(mt, mf, contrasts) : > variable 'x' converted to a factor > > (OK, so it won't go without conversion to factor. This is a good thing.) > >> glm(factor(x)~1,binomial) > > Call: glm(formula = factor(x) ~ 1, family = binomial) > > Coefficients: > (Intercept) > 0.4055 > > Degrees of Freedom: 9 Total (i.e. Null); 9 Residual > Null Deviance: 13.46 > Residual Deviance: 13.46 AIC: 15.46 > > (The intercept is positive, corresponding to log odds for a probability > > 0.5 ; i.e., must be that "B": 0.4055==log(6/4)) > >> predict(glm(factor(x)~1,binomial)) > 1 2 3 4 5 6 7 8 > 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 > 0.4054651 > 9 10 > 0.4054651 0.4054651 >> predict(glm(factor(x)~1,binomial),type="response") > 1 2 3 4 5 6 7 8 9 10 > 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 > > As for why it's not the other way around, well, if it had been, then you > could have asked the same question.... > ... [show rest of quote]
Or more specifically: > resp <- factor(c("cancer", "noncancer", "noncancer", "noncancer")) and since noncancer occurs 75% of the time in the sample clearly ______________________________________________ |
Re: predict.glm -> which class does it predict?
|
2360 posts
|
In reply to this post by Peter Dalgaard
> As for why it's not the other way around, well, if it had been, then you
> could have asked the same question.... ...and come to think about it, it is rather convenient that it meshes -- ______________________________________________ |
predict.glm -> which class does it predict?的更多相关文章
- CF451C Predict Outcome of the Game 水题
Codeforces Round #258 (Div. 2) Predict Outcome of the Game C. Predict Outcome of the Game time limit ...
- tflearn tensorflow LSTM predict sin function
from __future__ import division, print_function, absolute_import import tflearn import numpy as np i ...
- 如何在R语言中使用Logistic回归模型
在日常学习或工作中经常会使用线性回归模型对某一事物进行预测,例如预测房价.身高.GDP.学生成绩等,发现这些被预测的变量都属于连续型变量.然而有些情况下,被预测变量可能是二元变量,即成功或失败.流失或 ...
- 简单介绍一下R中的几种统计分布及常用模型
统计学上分布有很多,在R中基本都有描述.因能力有限,我们就挑选几个常用的.比较重要的简单介绍一下每种分布的定义,公式,以及在R中的展示. 统计分布每一种分布有四个函数:d――density(密度函数) ...
- Machine Learning for hackers读书笔记(六)正则化:文本回归
data<-'F:\\learning\\ML_for_Hackers\\ML_for_Hackers-master\\06-Regularization\\data\\' ranks < ...
- 统计学习导论:基于R应用——第五章习题
第五章习题 1. 我们主要用到下面三个公式: 根据上述公式,我们将式子化简为 对求导即可得到得到公式5-6. 2. (a) 1 - 1/n (b) 自助法是有有放回的,所以第二个的概率还是1 - 1/ ...
- 统计学习导论:基于R应用——第四章习题
第四章习题,部分题目未给出答案 1. 这个题比较简单,有高中生推导水平的应该不难. 2~3证明题,略 4. (a) 这个问题问我略困惑,答案怎么直接写出来了,难道不是10%么 (b) 这个答案是(0. ...
- R与数据分析旧笔记(⑨)广义线性回归模型
广义线性回归模型 广义线性回归模型 例题1 R.Norell实验 为研究高压电线对牲畜的影响,R.Norell研究小的电流对农场动物的影响.他在实验中,选择了7头,6种电击强度, 0,1,2,3,4, ...
- logistic回归和probit回归预测公司被ST的概率(应用)
1.适合阅读人群: 知道以下知识点:盒状图.假设检验.逻辑回归的理论.probit的理论.看过回归分析,了解AIC和BIC判别准则.能自己跑R语言程序 2.本文目的:用R语言演示一个相对完整的逻辑回归 ...
随机推荐
- Aho-Corasick算法
2018-03-15 10:25:02 在计算机科学中,Aho–Corasick算法是由Alfred V. Aho和Margaret J.Corasick 发明的字符串搜索算法,用于在输入的一串字符串 ...
- linux下升级npm以及node
npm升级 废话不多说,直接讲步骤.先从容易的开始,升级npm. npm这款包管理工具虽然一直被人们诟病,很多人都推荐使用yarn,但其使用人数还是不见减少,况且npm都是随node同时安装好的,一时 ...
- 牛客网——F小牛再战(博弈,不懂)
链接:https://www.nowcoder.net/acm/contest/75/F来源:牛客网 题目描述 共有N堆石子,已知每堆中石子的数量,两个人轮流取石子,每次只能选择N堆石子中的一堆取一定 ...
- 记录vue中一些有意思的坑
记录vue中一些有意思的坑 'message' handler took 401ms 在出现这个之前,我一直纠结于 是如何使用vue-router或者不使用它,通过类似的v-if来实现.结果却出现这个 ...
- Beta阶段第1周/共2周 Scrum立会报告+燃尽图 07
作业要求[https://edu.cnblogs.com/campus/nenu/2018fall/homework/2389] 版本控制:https://git.coding.net/liuyy08 ...
- Ethereum以太网搭建本地开放环境简明教程
引言: 区块链技术的风起云涌预示着一个去中心化时代的来临,ethereum技术栈是目前业界最为应用广泛的基于区块链技术的技术方案,本文将记录如何基于本地环境来搭建私有区块链的开发环境. 部署私有区块链 ...
- Codeforces 990B :Micro-World
B. Micro-World time limit per test 2 seconds memory limit per test 256 megabytes input standard inpu ...
- POI使用 (4.0) 常用改动
POI 升级到高版本后,原有的EXCLE导入导出工具类部分代码已不适用,目前只是对我自己写的工具类的过期代码进行更新,以后继续更新 若有问题请指出,再修改 1.数据类型 Cell.CELL_TYPE_ ...
- python requests 设置headers 和 post请求体x-www-form-urlencoded
1.application/json:是JSON格式提交的一种识别方式.在请求头里标示.2.application/x-www-form-urlencoded : 这是form表单提交的时候的表示方式 ...
- java编程之常见的排序算法
java常见的排序算法 第一种:插入排序 直接插入排序 1, 直接插入排序 (1)基本思想:在要排序的一组数中,假设前面(n-1)[n>=2] 个数已经是排 好顺序的,现在要把第n个数插到前面的 ...
