predict.glm -> which class does it predict?
Jul 10, 2009; 10:46pm
predict.glm -> which class does it predict?
|
2 posts
|
Hi,
I have a question about logistic regression in R. Suppose I have a small list of proteins P1, P2, P3 that predict a model <- glm(T ~ ., data=d.f(Y), family=binomial) (Y is the dataset of This works fine. T is a factored vector with levels cancer, noncancer. Now, I want to use predict.glm to predict a new data. predict(model, newdata=testsamples, type="response") (testsamples is The result is a vector of the probabilites for each sample in Is this fallowing expression Thank you, Peter ______________________________________________ |
Re: predict.glm -> which class does it predict?
|
1330 posts
|
On Jul 10, 2009, at 9:46 AM, Peter Schüffler wrote:
> Hi,
> > I have a question about logistic regression in R. > > Suppose I have a small list of proteins P1, P2, P3 that predict a > two-class target T, say cancer/noncancer. Lets further say I know > that I can build a simple logistic regression model in R > > model <- glm(T ~ ., data=d.f(Y), family=binomial) (Y is the > dataset of the Proteins). > > This works fine. T is a factored vector with levels cancer, > noncancer. Proteins are numeric. > > Now, I want to use predict.glm to predict a new data. > > predict(model, newdata=testsamples, type="response") (testsamples > is a small set of new samples). > > The result is a vector of the probabilites for each sample in > testsamples. But probabilty WHAT for? To belong to the first level > in T? To belong to second level in T? > > Is this fallowing expression > factor(predict(model, newdata=testsamples, type="response") >= 0.5) > TRUE, when the new sample is classified to Cancer or when it's > classified to Noncancer? And why not the other way around? > > Thank you, > > Peter ... [show rest of quote]
As per the Details section of ?glm: A typical predictor has the form response ~ terms where response is So, given your description above, you are predicting If you want to predict "cancer", alter the factor levels thusly: T <- factor(T, levels = c("noncancer", "cancer")) By default, R will alpha sort the factor levels, so "cancer" would be Think of it in terms of using a 0,1 integer code for absence,presence, BTW, using 'T' as the name of the response vector is not a good habit: > T 'T' is shorthand for the built in R constant TRUE. R is generally HTH, Marc Schwartz ______________________________________________ |
Re: predict.glm -> which class does it predict?
|
2360 posts
|
In reply to this post by Peter Schüffler-2
Peter Schüffler wrote:
> Hi,
> > I have a question about logistic regression in R. > > Suppose I have a small list of proteins P1, P2, P3 that predict a > two-class target T, say cancer/noncancer. Lets further say I know that I > can build a simple logistic regression model in R > > model <- glm(T ~ ., data=d.f(Y), family=binomial) (Y is the dataset of > the Proteins). > > This works fine. T is a factored vector with levels cancer, noncancer. > Proteins are numeric. > > Now, I want to use predict.glm to predict a new data. > > predict(model, newdata=testsamples, type="response") (testsamples is > a small set of new samples). > > The result is a vector of the probabilites for each sample in > testsamples. But probabilty WHAT for? To belong to the first level in T? > To belong to second level in T? > > Is this fallowing expression > factor(predict(model, newdata=testsamples, type="response") >= 0.5) > TRUE, when the new sample is classified to Cancer or when it's > classified to Noncancer? And why not the other way around? ... [show rest of quote]
It's the probability of the 2nd level of a factor response (termed I find it easiest to sort ut this kind of issue by experimentation in > x <- sample(c("A","B"),10,replace=TRUE) (notice that the relative frequency of B is 0.6) > glm(x~1,binomial) (OK, so it won't go without conversion to factor. This is a good thing.) > glm(factor(x)~1,binomial) Call: glm(formula = factor(x) ~ 1, family = binomial) Coefficients: Degrees of Freedom: 9 Total (i.e. Null); 9 Residual (The intercept is positive, corresponding to log odds for a probability > predict(glm(factor(x)~1,binomial)) As for why it's not the other way around, well, if it had been, then you -- ______________________________________________ |
Re: predict.glm -> which class does it predict?
|
7686 posts
|
2009/7/10 Peter Dalgaard <[hidden email]>:
> Peter Schüffler wrote:
>> >> Hi, >> >> I have a question about logistic regression in R. >> >> Suppose I have a small list of proteins P1, P2, P3 that predict a >> two-class target T, say cancer/noncancer. Lets further say I know that I can >> build a simple logistic regression model in R >> >> model <- glm(T ~ ., data=d.f(Y), family=binomial) (Y is the dataset of >> the Proteins). >> >> This works fine. T is a factored vector with levels cancer, noncancer. >> Proteins are numeric. >> >> Now, I want to use predict.glm to predict a new data. >> >> predict(model, newdata=testsamples, type="response") (testsamples is a >> small set of new samples). >> >> The result is a vector of the probabilites for each sample in testsamples. >> But probabilty WHAT for? To belong to the first level in T? To belong to >> second level in T? >> >> Is this fallowing expression >> factor(predict(model, newdata=testsamples, type="response") >= 0.5) >> TRUE, when the new sample is classified to Cancer or when it's classified >> to Noncancer? And why not the other way around? > > It's the probability of the 2nd level of a factor response (termed "success" > in the documentation, even when your modeling the probability of disease or > death...), just like when interpreting the logistic regression itself. > > I find it easiest to sort ut this kind of issue by experimentation in > simplified situations. E.g. > >> x <- sample(c("A","B"),10,replace=TRUE) >> x > [1] "B" "A" "B" "B" "A" "B" "B" "A" "B" "A" >> table(x) > x > A B > 4 6 > > (notice that the relative frequency of B is 0.6) > >> glm(x~1,binomial) > Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1 > In addition: Warning message: > In model.matrix.default(mt, mf, contrasts) : > variable 'x' converted to a factor > > (OK, so it won't go without conversion to factor. This is a good thing.) > >> glm(factor(x)~1,binomial) > > Call: glm(formula = factor(x) ~ 1, family = binomial) > > Coefficients: > (Intercept) > 0.4055 > > Degrees of Freedom: 9 Total (i.e. Null); 9 Residual > Null Deviance: 13.46 > Residual Deviance: 13.46 AIC: 15.46 > > (The intercept is positive, corresponding to log odds for a probability > > 0.5 ; i.e., must be that "B": 0.4055==log(6/4)) > >> predict(glm(factor(x)~1,binomial)) > 1 2 3 4 5 6 7 8 > 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 0.4054651 > 0.4054651 > 9 10 > 0.4054651 0.4054651 >> predict(glm(factor(x)~1,binomial),type="response") > 1 2 3 4 5 6 7 8 9 10 > 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 > > As for why it's not the other way around, well, if it had been, then you > could have asked the same question.... > ... [show rest of quote]
Or more specifically: > resp <- factor(c("cancer", "noncancer", "noncancer", "noncancer")) and since noncancer occurs 75% of the time in the sample clearly ______________________________________________ |
Re: predict.glm -> which class does it predict?
|
2360 posts
|
In reply to this post by Peter Dalgaard
> As for why it's not the other way around, well, if it had been, then you
> could have asked the same question.... ...and come to think about it, it is rather convenient that it meshes -- ______________________________________________ |
predict.glm -> which class does it predict?的更多相关文章
- CF451C Predict Outcome of the Game 水题
Codeforces Round #258 (Div. 2) Predict Outcome of the Game C. Predict Outcome of the Game time limit ...
- tflearn tensorflow LSTM predict sin function
from __future__ import division, print_function, absolute_import import tflearn import numpy as np i ...
- 如何在R语言中使用Logistic回归模型
在日常学习或工作中经常会使用线性回归模型对某一事物进行预测,例如预测房价.身高.GDP.学生成绩等,发现这些被预测的变量都属于连续型变量.然而有些情况下,被预测变量可能是二元变量,即成功或失败.流失或 ...
- 简单介绍一下R中的几种统计分布及常用模型
统计学上分布有很多,在R中基本都有描述.因能力有限,我们就挑选几个常用的.比较重要的简单介绍一下每种分布的定义,公式,以及在R中的展示. 统计分布每一种分布有四个函数:d――density(密度函数) ...
- Machine Learning for hackers读书笔记(六)正则化:文本回归
data<-'F:\\learning\\ML_for_Hackers\\ML_for_Hackers-master\\06-Regularization\\data\\' ranks < ...
- 统计学习导论:基于R应用——第五章习题
第五章习题 1. 我们主要用到下面三个公式: 根据上述公式,我们将式子化简为 对求导即可得到得到公式5-6. 2. (a) 1 - 1/n (b) 自助法是有有放回的,所以第二个的概率还是1 - 1/ ...
- 统计学习导论:基于R应用——第四章习题
第四章习题,部分题目未给出答案 1. 这个题比较简单,有高中生推导水平的应该不难. 2~3证明题,略 4. (a) 这个问题问我略困惑,答案怎么直接写出来了,难道不是10%么 (b) 这个答案是(0. ...
- R与数据分析旧笔记(⑨)广义线性回归模型
广义线性回归模型 广义线性回归模型 例题1 R.Norell实验 为研究高压电线对牲畜的影响,R.Norell研究小的电流对农场动物的影响.他在实验中,选择了7头,6种电击强度, 0,1,2,3,4, ...
- logistic回归和probit回归预测公司被ST的概率(应用)
1.适合阅读人群: 知道以下知识点:盒状图.假设检验.逻辑回归的理论.probit的理论.看过回归分析,了解AIC和BIC判别准则.能自己跑R语言程序 2.本文目的:用R语言演示一个相对完整的逻辑回归 ...
随机推荐
- windows java 环境变量配置
第一步 找到系统设置环境变量的位置(windows 10): 控制面板\系统和安全\系统 点击 ‘高级系统设置’ 就可以看到 “环境变量” 了 第二步 设置3个路径 1.path (配置JD ...
- RabbitMQ 消息传递的可靠性
生产者保证消息可靠投递 消费者保证消息可靠消费 RabbitMQ持久化 参考:https://blog.csdn.net/RobertoHuang/article/details/79605185
- 2243: [SDOI2011]染色 树链剖分+线段树染色
给定一棵有n个节点的无根树和m个操作,操作有2类: 1.将节点a到节点b路径上所有点都染成颜色c: 2.询问节点a到节点b路径上的颜色段数量(连续相同颜色被认为是同一段), 如“112221”由3段组 ...
- VGG16提取图像特征 (torch7)
VGG16提取图像特征 (torch7) VGG16 loadcaffe torch7 下载pretrained model,保存到当前目录下 th> caffemodel_url = 'htt ...
- 003PHP文件处理——目录操作:rename scandir
<?php //目录操作:rename scandir /** * 修改目录名字: * rename('旧名字','新名字') 改变文件夹或者文件的名称 */ //var_dump(rename ...
- vue.js利用vue.router创建前端路由
node.js方式: 利用node.js安装vue-router模块 cnpm install vue-router 安装完成后我们引入这个模板! 下载vue-router利用script引入方式: ...
- Git分支管理及合并
Git分支管理 建立分支 git branch [name] 切换到分支 git checkout [name] 查看有哪些分支 git branch 比较分支 git diff [b ...
- 使用java.net.URLConnection发送http请求
首先,这个需要一点HTTP基础,可以先看个书了解下,我看的<http权威指南>的前4章,后面道行不够看不下去. 然后我们的是java.net的接口: 几个类的API: package co ...
- jQuery插件开发——全屏切换插件
这个插件包含三个部分:HTML结构.CSS代码和JS代码. HTML结构是固定的,结构如下: <!--全屏滚动--> <div class="fullpage-contai ...
- nodejs通过buffer传递数据到前端,前端通过arraybuffer接收数据
以后端传送threejs中的点阵数组为例: 后端: let buffer = Buffer.alloc((points.length + 4) * 4) //points.length + 4:预留前 ...
