How to Read and Interpret a Regression Table

BY ZACH BOBBITT | POSTED ON MARCH 20, 2019

https://www.statology.org/read-interpret-regression-table/

In statistics, regression is a technique that can be used to analyze the relationship between predictor variables and a response variable.

When you use software (like R, SAS, SPSS, etc.) to perform a regression analysis, you will receive a regression table as output that summarizes the results of the regression. It's important to know how to read this table so that you can understand the results of the regression analysis.

This tutorial walks through an example of a regression analysis and provides an in-depth explanation of how to read and interpret the output of a regression table.

A Regression Example

Suppose we have the following dataset that shows the total number of hours studied, total prep exams taken, and final exam score received for 12 different students:

[Image: regression analysis data example]

To analyze the relationship between hours studied and prep exams taken with the final exam score that a student receives, we run a multiple linear regression using hours studied and prep exams taken as the $\large predictor\ variables$ and final exam score as the $\large response\ variable$.

We receive the following output:

Examining the Fit of the Model

The first section shows several different numbers that measure the fit of the regression model, i.e. how well the regression model is able to "fit" the dataset.

Here is how to interpret each of the numbers in this section:

Multiple R

Multiple R is the square root of R-squared (see below).

This is the $\large correlation\ coefficient$ between the observed values of the response variable and the values fitted by the model. It measures the strength of the linear relationship between the predictor variables and the response variable. A multiple R of 1 indicates a perfect linear relationship while a multiple R of 0 indicates no linear relationship whatsoever.

In this example, the multiple R is 0.72855, which indicates a fairly strong linear relationship between the predictors study hours and prep exams and the response variable final exam score.
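As a quick check, Multiple R can be recovered from the R-squared value reported in the same output. The snippet below is a minimal sketch in Python (not the software that produced the table), and the variable names are illustrative.

```python
import math

# R-squared as reported in the regression output (see the R-Squared section).
r_squared = 0.5307

# Multiple R is the non-negative square root of R-squared.
multiple_r = math.sqrt(r_squared)

print(round(multiple_r, 3))
```

The result agrees with the reported 0.72855 to about three decimal places; the small gap comes from starting with a rounded R-squared.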

R-Squared

This is often written as $\large r^2$, and is also known as the $\large coefficient\ of\ determination$. It is the proportion of the variance in the response variable that can be explained by the predictor variables.

The value for R-squared can range from 0 to 1. A value of 0 indicates that the response variable cannot be explained by the predictor variables at all. A value of 1 indicates that the response variable can be $\large perfectly\ explained\ without\ error$ by the predictor variables.

In this example, the R-squared is 0.5307, which indicates that 53.07% of the variance in the final exam scores can be explained by the number of hours studied and the number of prep exams taken.
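R-squared can also be reproduced from the sums of squares that appear in the ANOVA part of the same table (the regression SS and residual SS quoted in the Mean Squares section). The following Python sketch assumes the usual decomposition, total SS = regression SS + residual SS; the variable names are illustrative.

```python
# Sums of squares as quoted in the Mean Squares section of this tutorial.
regression_ss = 546.53308
residual_ss = 483.1335

# Total variation in the response is the sum of the explained and unexplained parts.
total_ss = regression_ss + residual_ss

# R-squared is the explained share of the total variation.
r_squared = regression_ss / total_ss

print(round(r_squared, 3))
```

The value matches the reported 0.5307 to about three decimal places.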

Adjusted R-Squared

This is a modified version of R-squared that has been adjusted for the number of predictors in the model. It is never higher than the R-squared, because it penalizes each additional predictor. The adjusted R-squared can be useful for comparing the fit of different regression models to one another.

In this example, the Adjusted R-squared is 0.4265.
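The adjustment can be sketched with the standard formula, adjusted R-squared = 1 - (1 - R-squared)(n - 1) / (n - k - 1), where n is the number of observations and k the number of predictors. The function name below is hypothetical and the code is only an illustration in Python.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Penalize R-squared for the number of predictors in the model."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Values from this example: R-squared = 0.5307, 12 observations, 2 predictors.
adj = adjusted_r_squared(0.5307, n_obs=12, n_predictors=2)

print(round(adj, 3))
```

Starting from the rounded R-squared, the result agrees with the reported 0.4265 to about three decimal places.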

Standard Error of the Regression

The standard error of the regression is, roughly, the average distance that the observed values fall from the regression line. In this example, the observed values fall an average of 7.3267 units from the regression line.
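One common way to obtain this number is to take the square root of the residual sum of squares divided by the residual degrees of freedom. The sketch below uses the residual SS (483.1335) and residual df (9) that appear in the ANOVA part of the same table; the variable names are illustrative.

```python
import math

# Residual sum of squares and residual degrees of freedom from the ANOVA section.
residual_ss = 483.1335
residual_df = 9

# Standard error of the regression: square root of the residual mean square.
standard_error_of_regression = math.sqrt(residual_ss / residual_df)

print(round(standard_error_of_regression, 3))
```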

Observations

This is simply the number of observations in our dataset. In this example, the total number of observations is 12.

Testing the Overall Significance of the Regression Model

The next section shows the degrees of freedom, the $\large sum\ of\ squares$, $\large mean\ squares$, $\large F\ statistic$, and overall significance of the regression model.

Here is how to interpret each of the numbers in this section:

Regression degrees of freedom

This number is equal to: the number of regression coefficients – 1. In this example, we have an intercept term and two predictor variables, so we have three regression coefficients total, which means the regression degrees of freedom is 3 – 1 = 2.

Total degrees of freedom

This number is equal to: the number of observations – 1. In this example, we have 12 observations, so the total degrees of freedom is 12 – 1 = 11.

Residual degrees of freedom

This number is equal to: total df – regression df. In this example, the residual degrees of freedom is 11 – 2 = 9.
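The three degrees-of-freedom values follow directly from the counts given above. A small Python sketch of the bookkeeping, with illustrative variable names:

```python
n_observations = 12
n_coefficients = 3  # intercept plus two predictor variables

# Regression df: number of regression coefficients minus 1.
regression_df = n_coefficients - 1

# Total df: number of observations minus 1.
total_df = n_observations - 1

# Residual df: whatever is left after the regression df is taken out.
residual_df = total_df - regression_df

print(regression_df, total_df, residual_df)
```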

Mean Squares

The regression mean squares is calculated by regression SS / regression df. In this example, regression MS = 546.53308 / 2 = 273.2665.

The residual mean squares is calculated by residual SS / residual df. In this example, residual MS = 483.1335 / 9 = 53.68151.

F Statistic

The F statistic is calculated as regression MS / residual MS. This statistic indicates whether the regression model provides a better fit to the data than a model that contains no predictor variables.

In essence, it tests if the regression model as a whole is useful. Generally, if none of the predictor variables in the model are statistically significant, the overall F statistic is also not statistically significant.

In this example, the F statistic is 273.2665 / 53.68151 = 5.09.
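The mean squares and the F statistic can be recomputed from the sums of squares and degrees of freedom quoted above. This is a minimal Python sketch of that arithmetic; the variable names are illustrative.

```python
# Sums of squares and degrees of freedom from the ANOVA section of the table.
regression_ss, regression_df = 546.53308, 2
residual_ss, residual_df = 483.1335, 9

# Each mean square is a sum of squares divided by its degrees of freedom.
regression_ms = regression_ss / regression_df
residual_ms = residual_ss / residual_df

# The F statistic compares explained to unexplained variation per degree of freedom.
f_statistic = regression_ms / residual_ms

print(round(regression_ms, 4), round(residual_ms, 4), round(f_statistic, 2))
```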

Significance of F (P-value)

The last value in the table is the p-value associated with the F statistic. To see if the overall regression model is significant, you can compare the p-value to a significance level; common choices are .01, .05, and .10.

If the p-value is less than the significance level, there is sufficient evidence to conclude that the regression model fits the data better than the model with no predictor variables. This finding is good because it means that the predictor variables in the model actually improve the fit of the model.

In this example, the p-value is 0.033, which is less than the common significance level of 0.05. This indicates that the regression model as a whole is statistically significant, i.e. the model fits the data better than the model with no predictor variables.
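The p-value is the upper-tail probability of the F distribution at the observed F statistic. In general one would use a statistics library for this (for example `scipy.stats.f.sf`). The sketch below instead relies on a closed form that holds only when the regression degrees of freedom equal 2, as they do here, so it should be read as a special-case illustration rather than a general routine. The function name is hypothetical.

```python
def f_upper_tail_two_numerator_df(f_value: float, residual_df: int) -> float:
    """Upper-tail probability of an F(2, residual_df) distribution.

    Valid only for 2 numerator (regression) degrees of freedom.
    """
    return (1 + 2 * f_value / residual_df) ** (-residual_df / 2)

# F statistic and residual df from this example.
p_value = f_upper_tail_two_numerator_df(5.09, residual_df=9)

# Compare against a significance level of 0.05.
is_significant = p_value < 0.05

print(round(p_value, 3), is_significant)
```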

Testing the Significance of the Individual Coefficients

The last section shows the coefficient estimates, the standard error of the estimates, the t-stat, p-values, and confidence intervals for each term in the regression model.

Here is how to interpret each of the numbers in this section:

Coefficients

The coefficients give us the numbers necessary to write the estimated regression equation:

$\large \hat{y} = b_0 + b_1 x_1 + b_2 x_2$.

In this example, the estimated regression equation is:

$\large \text{final exam score} = 66.99 + 1.299(Study\ Hours) + 1.117(Prep\ Exams)$

Each individual coefficient is interpreted as the average increase in the response variable for each one unit increase in a given predictor variable, assuming that all other predictor variables are held constant. For example, for each additional hour studied, the average expected increase in final exam score is 1.299 points, assuming that the number of prep exams taken is held constant.

The intercept (in this example, the intercept on the Z axis) is interpreted as the expected average final exam score for a student who studies for zero hours and takes zero prep exams. In this example, a student is expected to score a 66.99 if they study for zero hours and take zero prep exams. Be careful when interpreting the intercept of a regression output, though, because it doesn't always make sense to do so.

For example, in some cases, the intercept may turn out to be a negative number, which often doesn't have an obvious interpretation. This doesn't mean the model is wrong; it simply means that the intercept by itself should not be interpreted to mean anything.
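The estimated regression equation can be turned into a small prediction function. The sketch below is in Python with a hypothetical function name, and the student in the usage lines (3 study hours, 2 prep exams) is made up purely for illustration.

```python
def predict_final_exam_score(study_hours: float, prep_exams: float) -> float:
    """Estimated regression equation from the coefficient table."""
    intercept = 66.99
    b_study_hours = 1.299
    b_prep_exams = 1.117
    return intercept + b_study_hours * study_hours + b_prep_exams * prep_exams

# Intercept: expected score with zero study hours and zero prep exams.
baseline = predict_final_exam_score(0, 0)

# Hypothetical student: 3 hours studied, 2 prep exams taken.
example = predict_final_exam_score(3, 2)

# One extra study hour, prep exams held constant, adds the Study Hours coefficient.
one_more_hour = predict_final_exam_score(4, 2) - example

print(baseline, round(example, 3), round(one_more_hour, 3))
```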

Standard Error, t-stats, and p-values

The standard error is a measure of the uncertainty around the estimate of the coefficient for each variable.

The t-stat is simply the coefficient divided by the standard error. For example, the t-stat for Study Hours is 1.299 / 0.417 = 3.117. (Dividing the rounded values shown here gives about 3.115; the table's 3.117 presumably comes from the unrounded estimates.)

The next column shows the p-value associated with the t-stat. This number tells us if a given predictor variable is significant in the model. In this example, we see that the p-value for Study Hours is 0.012 and the p-value for Prep Exams is 0.304. This indicates that Study Hours is a significant predictor of final exam score at the 0.05 level, while Prep Exams is not.
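To see where the t-stat and its p-value come from, the sketch below recomputes both for Study Hours using only the Python standard library. It assumes the test uses the residual degrees of freedom (9) and a two-sided alternative, and it approximates the t distribution's tail area with Simpson's rule rather than calling a statistics library. The function names are hypothetical.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value via Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        # Simpson weights: 4 for odd interior points, 2 for even ones.
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 <= T <= |t|)
    return 1 - 2 * central_area

# Rounded coefficient and standard error for Study Hours from the table.
coefficient, standard_error = 1.299, 0.417
residual_df = 9

t_stat = coefficient / standard_error
p_value = two_sided_p_value(t_stat, residual_df)

print(round(t_stat, 3), round(p_value, 3))
```

Because the inputs are rounded, the t-stat lands slightly below the table's value, while the p-value still rounds to roughly the reported 0.012.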

Confidence Interval for Coefficient Estimates

The last two columns in the table provide the lower and upper bounds for a 95% confidence interval for the coefficient estimates.

For example, the coefficient estimate for Study Hours is 1.299, but there is some uncertainty around this estimate. We can never know for sure if this is the exact coefficient. Thus, a 95% confidence interval gives us a range of likely values for the true coefficient.

In this case, the 95% confidence interval for Study Hours is (0.356, 2.24).

Notice that this confidence interval does not contain the number "0", which means we're quite confident that the true value for the coefficient of Study Hours is non-zero. Because both bounds are above zero, the plausible values are all positive.

By contrast, the 95% confidence interval for Prep Exams is (-1.201, 3.436).

Notice that this confidence interval does contain the number "0", which means that the true value for the coefficient of Prep Exams could be zero, i.e. Prep Exams may not be a significant predictor of final exam scores.
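The interval for Study Hours can be reconstructed as coefficient ± critical value × standard error. The sketch below assumes the critical value is the 97.5th percentile of a t distribution with 9 residual degrees of freedom, which is about 2.262 in a standard t table; the variable names are illustrative.

```python
# Rounded coefficient and standard error for Study Hours from the table.
coefficient, standard_error = 1.299, 0.417

# Assumed critical value: 97.5th percentile of t with 9 degrees of freedom.
t_critical = 2.262

# Half-width of the 95% confidence interval.
margin = t_critical * standard_error

lower, upper = coefficient - margin, coefficient + margin

# The predictor is significant at the 5% level when the interval excludes zero.
excludes_zero = lower > 0 or upper < 0

print(round(lower, 3), round(upper, 3), excludes_zero)
```

Checking whether the interval excludes zero gives the same verdict as comparing the coefficient's p-value to 0.05.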

Additional Resources

Understanding the Null Hypothesis for Linear Regression

Understanding the F-Test of Overall Significance in Regression

How to Report Regression Results
