WHAT IS THE DIFFERENCE BETWEEN CATEGORICAL, ORDINAL AND INTERVAL VARIABLES?

In talking about variables, sometimes you hear variables being described as categorical (or sometimesnominal), or ordinal, or interval.  Below we will define these terms and explain why they are important.

Categorical

A categorical variable (sometimes called a nominal variable) is one that has two or more categories, but there is no intrinsic ordering to the categories.  For example, gender is a categorical variable having two categories (male and female) and there is no intrinsic ordering to the categories.  Hair color is also a categorical variable having a number of categories (blonde, brown, brunette, red, etc.) and again, there is no agreed way to order these from highest to lowest.  A purely categorical variable is one that simply allows you to assign categories but you cannot clearly order the variables.  If the variable has a clear ordering, then that variable would be an ordinal variable, as described below.

Ordinal

An ordinal variable is similar to a categorical variable.  The difference between the two is that there is a clear ordering of the variables.  For example, suppose you have a variable, economic status, with three categories (low, medium and high).  In addition to being able to classify people into these three categories, you can order the categories as low, medium and high. Now consider a variable like educational experience (with values such as elementary school graduate, high school graduate, some college and college graduate). These also can be ordered as elementary school, high school, some college, and college graduate.  Even though we can order these from lowest to highest, the spacing between the values may not be the same across the levels of the variables. Say we assign scores 1, 2, 3 and 4 to these four levels of educational experience and we compare the difference in education between categories one and two with the difference in educational experience between categories two and three, or the difference between categories three and four. The difference between categories one and two (elementary and high school) is probably much bigger than the difference between categories two and three (high school and some college).  In this example, we can order the people in level of educational experience but the size of the difference between categories is inconsistent (because the spacing between categories one and two is bigger than categories two and three).  If these categories were equally spaced, then the variable would be an interval variable.

Interval

An interval variable is similar to an ordinal variable, except that the intervals between the values of the interval variable are equally spaced.  For example, suppose you have a variable such as annual income that is measured in dollars, and we have three people who make $10,000, $15,000 and $20,000. The second person makes $5,000 more than the first person and $5,000 less than the third person, and the size of these intervals is the same.  If there were two other people who make $90,000 and $95,000, the size of that interval between these two people is also the same ($5,000).

Why does it matter whether a variable is categorical, ordinal or interval?

Statistical computations and analyses assume that the variables have a specific levels of measurement.  For example, it would not make sense to compute an average hair color.  An average of a categorical variable does not make much sense because there is no intrinsic ordering of the levels of the categories.  Moreover, if you tried to compute the average of educational experience as defined in the ordinal section above, you would also obtain a nonsensical result.  Because the spacing between the four levels of educational experience is very uneven, the meaning of this average would be very questionable.  In short, an average requires a variable to be interval. Sometimes you have variables that are “in between” ordinal and interval, for example, a five-point likert scale with values “strongly agree”, “agree”, “neutral”, “disagree” and “strongly disagree”.  If we cannot be sure that the intervals between each of these five values are the same, then we would not be able to say that this is an interval variable, but we would say that it is an ordinal variable.  However, in order to be able to use statistics that assume the variable is interval, we will assume that the intervals are equally spaced.

Does it matter if my dependent variable is normally distributed?

When you are doing a t-test or ANOVA, the assumption is that the distribution of the sample means are normally distributed.  One way to guarantee this is for the distribution of the individual observations from the sample to be normal.  However, even if the distribution of the individual observations is not normal, the distribution of the sample means will be normally distributed if your sample size is about 30 or larger.  This is due to the “central limit theorem” that shows that even when a population is non-normally distributed, the distribution of the “sample means” will be normally distributed when the sample size is 30 or more, for example see Central limit theorem demonstration .

If you are doing a regression analysis, then the assumption is that your residuals are normally distributed.  One way to make it very likely to have normal residuals is to have a dependent variable that is normally distributed and predictors that are all normally distributed, however this is not necessary for your residuals to be normally distributed.  You can see

  • Regression with Stata: Chapter 2 – Regression Diagnostics
  • Regression with SAS: Chapter 2 -Regression Diagnostics
  • Introduction to Regression with SPSS: Lesson 2 – Regression Diagnostics

CATEGORICAL, ORDINAL AND INTERVAL VARIABLES的更多相关文章

  1. 【转】The difference between categorical(Nominal ), ordinal and interval variables

    What is the difference between categorical, ordinal and interval variables? In talking about variabl ...

  2. 关于使用sklearn进行数据预处理 —— 归一化/标准化/正则化

    一.标准化(Z-Score),或者去除均值和方差缩放 公式为:(X-mean)/std  计算时对每个属性/每列分别进行. 将数据按期属性(按列进行)减去其均值,并处以其方差.得到的结果是,对于每个属 ...

  3. SAS-决策树模型

    决策树是日常建模中使用最普遍的模型之一,在SAS中,除了可以通过EM模块建立决策树模型外,还可以通过SAS代码实现.决策树模型在SAS系统中对应的过程为Proc split或Proc hpsplit, ...

  4. Parametric Statistics

    1.What are “Parametric Statistics”? 统计中的参数指的是总体的一个方面,而不是统计中的一个方面,后者指的是样本的一个方面.例如,总体均值是一个参数,而样本均值是一个统 ...

  5. Chapter 02—Creating a dataset(Part1)

    一. 数据集 1. 在R语言中,进行数据分析的第一步是创建一个包含待研究数据并且符合要求的数据集. · 选择装数据的数据结构 · 把数据装入数据结构中 2. 理解数据集 (1)数据集通常是矩形的数据列 ...

  6. SAS数据挖掘实战篇【六】

    SAS数据挖掘实战篇[六] 6.3  决策树 决策树主要用来描述将数据划分为不同组的规则.第一条规则首先将整个数据集划分为不同大小的 子集,然后将另外的规则应用在子数据集中,数据集不同相应的规则也不同 ...

  7. MatterTrack Route Of Network Traffic :: Matter

    Python 1.1 基础 while语句 字符串边缘填充 列出文件夹中的指定文件类型 All Combinations For A List Of Objects Apply Operations ...

  8. 精通D3.js学习笔记(2)比例尺和坐标

    1.线性比例尺 d3.scale.linear()   创建一个线性比例尺           .domain([0,500]) 定义域           .range([0,1000]) 值域 l ...

  9. 机器学习算法基础(Python和R语言实现)

    https://www.analyticsvidhya.com/blog/2015/08/common-machine-learning-algorithms/?spm=5176.100239.blo ...

随机推荐

  1. 【AtCoder】ARC098题解

    C - Attention 枚举,计算前缀和即可 代码 #include <bits/stdc++.h> #define fi first #define se second #defin ...

  2. 【LOJ】#2098. 「CQOI2015」多项式

    题解 令x = x - t代换一下会发现 \(\sum_{i = 0}^{n}a_i (x + t)^i = \sum_{i = 0}^{n} b_{i} x^{i}\) 剩下的就需要写高精度爆算了- ...

  3. Java异常类层次结构图

    1. 分类图镇楼: 2.运行时异常与非运行时异常区别: Java 提供了两类主要的异常 :runtime exception 和 checked exception. 2.1 checked exce ...

  4. c# HttpWebRequest 和HttpWebResponse 登录网站或论坛(校内网登陆)

    这是登录校内网的代码呵呵自己注册一个试试吧我的账号和密码就不给了 不过可以加我为好友      冯洪春  貌似校内上就我一个 Form1.cs代码: using System;using System ...

  5. 使用JAXB实现Bean与Xml相互转换

    最近几天,我自己负责的应用这边引入了一个新的合作方,主要是我这边调用他们的接口,但是有个很坑的地方,他们传参居然不支持json格式,并且只支持xml格式进行交互,于是自己写了一个工具类去支持bean与 ...

  6. 【WIN10】WIN2D——圖像處理

    源碼下載:http://yunpan.cn/c3iNuHFFAcr8h  访问密码 8e48 還是先來看下截圖: 實現了幾個效果:放大.縮小.旋轉.左右翻轉.上下翻轉,亮度變化.灰度圖.對比度.高斯模 ...

  7. 【bzoj3451】Tyvj1953 Normal 期望+树的点分治+FFT

    题目描述 给你一棵 $n$ 个点的树,对这棵树进行随机点分治,每次随机一个点作为分治中心.定义消耗时间为每层分治的子树大小之和,求消耗时间的期望. 输入 第一行一个整数n,表示树的大小接下来n-1行每 ...

  8. 解决jQueryUi AutoComplete在某些浏览器下无法出现候选项问题

    在某些浏览器(如火狐),在使用AutoComplete进行绑定的时候,无法出现与关键字相似的候选项.其原因这里有描述: 解决方法可以采用下面方式: $('#bindInputId).bind(&quo ...

  9. FireDAC 下的 Sqlite [12] - 备忘录(草草结束这个话题了)

    该话题的继续延伸主要就是 SQL 的语法了, 草草收场的原因是现在的脑筋已经进入了 IntraWeb 的世界. 相关备忘会随时补充在下面: //连接多个数据库的参考代码: FDConnection1. ...

  10. 20款最好的免费 Bootstrap 后台管理和前端模板

    Admin Bootstrap Templates Free Download 1. SB Admin 2 Preview | Details & Download 2. Admin Lite ...