Why do we make statistics so hard for our students?

(Warning: long and slightly wonkish)

If you’re like me, you’re continually frustrated by the fact that undergraduate students struggle to understand statistics. Actually, that’s putting it mildly: a large fraction of undergraduates simply refuse to understand statistics; mention a requirement for statistical data analysis in your course and you’ll get eye-rolling, groans, or (if it’s early enough in the semester) a rash of course-dropping.

This bothers me, because we can’t do inference in science without statistics*. Why are students so unreceptive to something so important? In unguarded moments, I’ve blamed it on the students themselves for having decided, a priori and in a self-fulfilling prophecy, that statistics is math, and they can’t do math. I’ve blamed it on high-school math teachers for making math dull. I’ve blamed it on high-school guidance counselors for telling students that if they don’t like math, they should become biology majors. I’ve blamed it on parents for allowing their kids to dislike math. I’ve even blamed it on the boogie**.

All these parties (except the boogie) are guilty. But I’ve come to understand that my list left out the most guilty party of all: us. By “us” I mean university faculty members who teach statistics – whether they’re in Departments of Mathematics, Departments of Statistics, or (gasp) Departments of Biology. We make statistics needlessly difficult for our students, and I don’t understand why.

The problem is captured in the image above – the formulas needed to calculate Welch’s t-test. They’re arithmetically a bit complicated, and they’re used in one particular situation: comparing two means when sample sizes and variances are unequal. If you want to compare three means, you need a different set of formulas; if you want to test for a non-zero slope, you need another set again; if you want to compare success rates in two binary trials, another set still; and so on. And each set of formulas works only given the correctness of its own particular set of assumptions about the data.
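
For reference, Welch’s t-test is usually written in the textbook form sketched here. The notation is an assumption chosen for illustration: \bar{x}_i, s_i and n_i stand for the mean, standard deviation and size of sample i, and \nu for the approximate degrees of freedom used to look up the P-value.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
                 {\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}
```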

Given this, can we blame students for thinking statistics is complicated? No, we can’t; but we can blame ourselves for letting them think that it is. They think so because we consistently underemphasize the single most important thing about statistics: that this complication is an illusion. In fact, every significance test works exactly the same way.

Every significance test works exactly the same way. We should teach this first, teach it often, and teach it loudly; but we don’t. Instead, we make a huge mistake: we whiz by it and begin teaching test after test, bombarding students with derivations of test statistics and distributions and paying more attention to differences among tests than to their crucial, underlying identity. No wonder students resent statistics.

What do I mean by “every significance test works exactly the same way”? All (NHST) statistical tests respond to one problem with two simple steps.

The problem:

We see an apparent pattern, but we aren’t sure if we should believe it’s real, because our data are noisy.

The two steps:

Step 1. Measure the strength of pattern in our data.
Step 2. Ask ourselves, is this pattern strong enough to be believed?

Teaching the problem motivates the use of statistics in the first place (many math-taught courses, and nearly all biology-taught ones, do a good job of this). Teaching the two steps gives students the tools to test any hypothesis – understanding that it’s just a matter of choosing the right arithmetic for their particular data. This is where we seem to fall down.

Step 1, of course, is the test statistic. Our job is to find (or invent) a number that measures the strength of any given pattern. It’s not surprising that the details of computing such a number depend on the pattern we want to measure (difference in two means, slope of a line, whatever). But those details always involve the three things that we intuitively understand to be part of a pattern’s “strength” (illustrated below): the raw size of the apparent effect (in Welch’s t, the difference in the two sample means); the amount of noise in the data (in Welch’s t, the two sample standard deviations); and the amount of data in hand (in Welch’s t, the two sample sizes). You can see by inspection that these behave in Welch’s formulas just the way they should: t gets bigger if the means are farther apart, the samples are less noisy, and/or the sample sizes are larger. All the rest is uninteresting arithmetical detail.
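
If you’d rather check that behaviour by experiment than by inspection, here is a minimal sketch in Python. The function name `welch_t`, the helper `spread_out` and the made-up measurements are all assumptions for illustration; the arithmetic is the standard Welch’s t statistic.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t: the effect size scaled by noise and by the amount of data."""
    effect = mean(a) - mean(b)                            # raw size of the apparent effect
    noise = variance(a) / len(a) + variance(b) / len(b)   # sample variances, shrunk by sample sizes
    return effect / sqrt(noise)


def spread_out(values, factor):
    """Hypothetical helper: same mean, but deviations from it multiplied by `factor`."""
    m = mean(values)
    return [m + factor * (v - m) for v in values]


# Made-up measurements for two groups.
group_a = [5.1, 4.8, 5.6, 5.0, 5.3]
group_b = [4.2, 4.9, 4.4, 4.7, 4.1]

t_base = welch_t(group_a, group_b)
t_bigger_effect = welch_t([v + 1.0 for v in group_a], group_b)           # means farther apart
t_noisier = welch_t(spread_out(group_a, 2.0), spread_out(group_b, 2.0))  # same means, more noise
t_more_data = welch_t(group_a * 3, group_b * 3)                          # same values, three times over

print(t_base, t_bigger_effect, t_noisier, t_more_data)
```

Moving the means apart or tripling the data raises t above the baseline; doubling the spread lowers it.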

Step 2 is the P-value. We have to obtain a P-value corresponding to our test statistic, which means knowing whether assumptions are met (so we can use a lookup table) or not (so we should use randomization or switch to a different test***). Every test uses a different table – but all the tables work the same way, so the differences are again just arithmetic. Interpreting the P-value once we have it is a snap, because it doesn’t matter what arithmetic we did along the way: the P-value for any test is the probability of a pattern as strong as ours (or stronger), in the absence of any true underlying effect. If this is low, we’d rather believe that our pattern arose from real biology than believe it arose from a staggering coincidence (Deborah Mayo explains the philosophy behind this here, or see her excellent blog).
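
The randomization route makes that definition of the P-value almost literal. The sketch below is one possible version, not a reference implementation: the function names, the made-up data, the number of shuffles and the choice of the absolute difference in means as the measure of strength are all assumptions. It shuffles the group labels to mimic “no true underlying effect” and counts how often the shuffled pattern is at least as strong as the real one.

```python
import random
from statistics import mean


def strength(a, b):
    """Step 1: strength of pattern, here just the absolute difference in means."""
    return abs(mean(a) - mean(b))


def randomization_p_value(a, b, n_shuffles=5000, seed=1):
    """Step 2: how often does effect-free (shuffled) data look as strong as ours?"""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    observed = strength(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)            # reassign values to groups at random
        if strength(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # Count the observed arrangement itself, so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


# Made-up data: one comparison with a clear difference, one without.
treated = [5.1, 4.8, 5.6, 5.0, 5.3, 5.4]
control = [4.2, 4.9, 4.4, 4.7, 4.1, 4.5]
lookalike = [4.3, 4.8, 4.5, 4.6, 4.2, 4.6]

p_clear = randomization_p_value(treated, control)
p_none = randomization_p_value(lookalike, control)
print(p_clear, p_none)
```

The first P-value comes out small and the second large, and neither needed a lookup table or an assumption about the shape of the distribution.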

Of course, there are lots of details in the differences among tests. These matter, but they matter in a second-order way: until we understand the underlying identity of how every test works, there’s no point worrying about the differences. And even then, the differences are not things we need to remember; they’re things we need to know to look up when needed. That’s why if I know how to do one statistical test – any one statistical test – I know how to do all of them.
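
To make that claim concrete, here is a sketch in which the two steps are written once and only the “recipe” for pattern strength changes. Everything named here (`two_step_test`, `abs_slope`, `abs_rate_difference`, the made-up data) is an assumption for illustration, and the P-values come from shuffling rather than from the usual lookup tables.

```python
import random
from statistics import mean


def two_step_test(x, y, strength, n_shuffles=2000, seed=1):
    """Step 1: measure pattern strength. Step 2: compare it with shuffled data."""
    rng = random.Random(seed)
    observed = strength(x, y)
    shuffled_y = list(y)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled_y)        # break any real link between x and y
        if strength(x, shuffled_y) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_shuffles + 1)


def abs_slope(x, y):
    """Recipe for a trend: absolute least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return abs(sxy / sxx)


def abs_rate_difference(group, success):
    """Recipe for two binary trials: absolute difference in success rates."""
    rate_0 = mean([s for g, s in zip(group, success) if g == 0])
    rate_1 = mean([s for g, s in zip(group, success) if g == 1])
    return abs(rate_0 - rate_1)


# Made-up data: a dose-response trend, and success/failure in two groups.
dose = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
growth = [2.1, 2.9, 3.2, 4.8, 5.1, 5.9, 7.2, 7.8, 9.1, 9.7]
group = [0] * 20 + [1] * 20
success = [1] * 16 + [0] * 4 + [1] * 6 + [0] * 14

slope_strength, p_slope = two_step_test(dose, growth, abs_slope)
rate_strength, p_rate = two_step_test(group, success, abs_rate_difference)
print(slope_strength, p_slope)
print(rate_strength, p_rate)
```

Swapping in a new recipe is the only thing that changes between the two analyses; `two_step_test` itself never needs to know which pattern it is testing.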

Does this mean I’m advocating teaching “cookbook” statistics? Yes, but only if we use the metaphor carefully and not pejoratively. A cookbook is of little use to someone who knows nothing at all about cooking; but if you know a handful of basic principles, a cookbook guides you through thousands of cooking situations, for different ingredients and different goals. All cooks own cookbooks; few memorize them.

So if we’re teaching statistics all wrong, here’s how to do it right: organize everything around the underlying identity. Start with it, spend lots of time on it, and illustrate it with one test (any test) worked through with detailed attention not to the computations, but to how that test takes us through the two steps. Don’t try to cover the “8 tests every undergraduate should know”; there’s no such list. Offer a statistical problem: some real data and a pattern, and ask the students how they might design a test to address that problem. There won’t be one right way, and even if there were, it would be less important than the exercise of thinking through the steps of the underlying identity.

Finally: why do instructors make statistics about the differences, not the underlying identity? I said I don’t know, but I can speculate.

When statistics is taught by mathematicians, I can see the temptation. In mathematical terms, the differences between tests are the interesting part. This is where mathematicians show their chops, and it’s where they do the difficult and important job of inventing new recipes to cook reliable results from new ingredients in new situations. Users of statistics, though, would be happy to stipulate that mathematicians have been clever, and that we’re all grateful to them, so we can get onto the job of doing the statistics we need to do.

When statistics is taught by biologists, the mystery is deeper. I think (I hope!) those of us who teach statistics all understand the underlying identity of all tests, but that doesn’t seem to stop us from the parade-of-tests approach. One hypothesis: we may be responding to pressure (perceived or real) from Mathematics departments, who can disapprove of statistics being taught outside their units and are quick to claim insufficient mathematical rigour when it is. Focus on lots of mathematical detail gives a veneer of apparent rigour. I’m not sure that my hypothesis is correct, but I’ve certainly been part of discussions with Math departments that were consistent with it.

Whatever the reasons, we’re doing real damage to our students when we make statistics complicated. It isn’t. Remember, every statistical test works exactly the same way. Teach a student that today.

Note: for a rather different take on the cookbook-stats metaphor, see Joan Strassmann’s interesting post here. I think I agree with her only in part, so you should read her piece too.

Another related piece by Christie Bahlai is here: “Hey, let’s all just relax about statistics” – but with a broader message about NHST across fields.

Finally, here’s the story of two ecologists who learned to love statistics – and it’s lots of fun.

*In this post I’m going to discuss frequentist inferential statistics, or traditional “null-hypothesis significance testing”. I’ll leave aside debates about whether Bayesian methods are superior and whether P-values get misapplied (see my defence of the P-value). I’m going to refrain from snorting derisively at claims that we don’t need inferential statistics at all.

**OK, not really, but slipping that in there lets me link to this. Similarly I’m tempted to blame it on the rain, to blame it on Cain, to blame it on the Bossa Nova, and to blame it on Rio. OK, I’ll stop now; but if you’ve got one I missed, why not drop a link in the Replies?

***I’d include transforming the data as “switch to a different test”, but if you’d rather draw a distinction there, that’s fine.
