Source: Sigma Zone, by Philip Mayfield

The Binomial Distribution is commonly used in statistics in a variety of applications. Binomial data and statistics are presented to us daily. For example, in the election of political officials we may be asked to choose between two candidates. Polling organizations often take samples of “likely voters” in an attempt to predict who will be elected before the actual election occurs.

To illustrate this, let’s assume that two candidates are running in an election for Governor of California. This fictitious election pits Mr. Gubinator vs. Mr. Ventura. We would like to know who is winning the race, and therefore we conduct a poll of likely voters in California. If the poll gives the voters a choice between the two candidates, then the results can be reasonably modeled with the Binomial Distribution. In our poll of 50 likely voters, 58% indicate they intend to vote for Mr. Gubinator. Does this mean that 58% of all voters intend to vote for Mr. Gubinator? Probably not. If we were to repeat this poll several times in the same day (using a different group of 50 each time) we would find that the percentage that intends to vote for Mr. Gubinator would change with each poll.
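The poll-to-poll variation described above can be seen in a small simulation. This is only a sketch: the function name `simulate_polls` and the "true" support level of 55% are assumptions made for illustration, since the real population proportion is unknown in practice.

```python
import random

def simulate_polls(true_p, n_voters, n_polls, seed=0):
    """Return the observed proportion from each of n_polls independent polls."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    results = []
    for _ in range(n_polls):
        # Each polled voter independently favours the candidate with probability true_p.
        votes = sum(1 for _ in range(n_voters) if rng.random() < true_p)
        results.append(votes / n_voters)
    return results

# Hypothetical true support of 55%, five polls of 50 voters each.
for share in simulate_polls(true_p=0.55, n_voters=50, n_polls=5):
    print(f"{share:.0%}")
```

Each printed percentage comes from a different group of 50 simulated voters, so the values scatter around the assumed 55% rather than repeating it.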

The poll, like most binomial samples, comes with some error. When polls are presented in the media, on the bottom of the screen or page you often see a small note with wording similar to “Margin of error +/- 5%”. This +/- 5% indicates that if the poll were repeated multiple times, the result would likely fall in the range of 58% +/- 5%, or 53% to 63%. The margin of error is the half-width of the confidence interval, which describes how much uncertainty we have in the sample estimate. There are several ways to estimate the Binomial Confidence Interval (CI); in this article we will focus on the Normal Approximation Method and the Clopper-Pearson Method.

Normal Approximation Method of the Binomial Confidence Interval

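The equation for the Normal Approximation for the Binomial CI is shown below.

$$ p \pm z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}} $$

where:

p = proportion of interest (the sample proportion, successes divided by n)

n = sample size

α = significance level; 1-α is the desired confidence

$z_{1-\alpha/2}$ = “z value” for desired level of confidence

$z_{1-\alpha/2}$ = 1.96 for 95% confidence

$z_{1-\alpha/2}$ = 2.576 for 99% confidence

$z_{1-\alpha/2}$ = 3 for 99.73% confidence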

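Using our previous example, if a poll of 50 likely voters resulted in 29 expressing their desire to vote for Mr. Gubinator, the resulting 95% CI would be calculated as follows.

$$ p = \frac{29}{50} = 0.58 $$

$$ 0.58 \pm 1.96 \sqrt{\frac{0.58\,(1 - 0.58)}{50}} = 0.58 \pm 0.137 $$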

Thus, we would be 95% confident that the proportion of the target population (all voters in California) who intend to vote for Mr. Gubinator falls between 44% and 72%.
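The same calculation can be sketched in a few lines of Python. The function name `normal_approx_ci` is an assumption for illustration; the z value is taken from the standard library's `statistics.NormalDist` rather than a lookup table.

```python
import math
from statistics import NormalDist

def normal_approx_ci(k, n, confidence=0.95):
    """Normal Approximation CI for k successes observed in n trials."""
    p = k / n
    alpha = 1 - confidence
    # z value for the desired confidence, e.g. about 1.96 for 95%.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

lower, upper = normal_approx_ci(29, 50)
print(f"{lower:.3f} to {upper:.3f}")  # 0.443 to 0.717
```

Rounded to whole percentages, this is the 44% to 72% range quoted above.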

While this method is very easy to teach and understand, you may have noticed that $z_{1-\alpha/2}$ is derived from the Normal Distribution and not the Binomial Distribution. The use of the z value from the Normal Distribution is where the method earns its moniker “Normal Approximation”. While the use of the Normal Distribution seems odd at first, it is supported by the Central Limit Theorem: with sufficiently large n, the Normal Distribution is a good approximation to the Binomial Distribution.

However, there are times when the Normal Distribution is not a good estimator of the Binomial. When p is very small or very large, the Normal Approximation starts to suffer from increased inaccuracy. Specifically, when np < 5 or n(1-p) < 5 the Normal Approximation method should not be used [1]. Additionally, if you try to calculate any CI with p=0 or p=1, you will find that it is not possible: the square-root term is zero, so the interval collapses to a single point.
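Both limitations are easy to check in code. The sketch below uses hypothetical helper names (`normal_approx_is_reasonable`, `normal_half_width`) and relies on the fact that, with p = k/n, np is simply k and n(1-p) is n-k.

```python
import math

def normal_approx_is_reasonable(k, n, threshold=5):
    """Rule of thumb: both n*p (= k) and n*(1-p) (= n - k) should reach the threshold."""
    return k >= threshold and (n - k) >= threshold

def normal_half_width(k, n, z=1.96):
    """Half-width of the Normal Approximation interval for k successes in n trials."""
    p = k / n
    return z * math.sqrt(p * (1 - p) / n)

print(normal_approx_is_reasonable(29, 50))  # True
print(normal_approx_is_reasonable(2, 50))   # False
print(normal_half_width(0, 10))             # 0.0
```

The last line shows the p=0 case: the half-width is exactly zero, so the formula reports no uncertainty at all, which is clearly not a usable interval.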

Normal Approximation Summary

  • Advantages

    Easy to teach and understand

    Easy to calculate by hand
  • Disadvantages

    Accuracy suffers when np < 5 or n(1-p)<5

    Calculation not possible when p =0 or p=1

Exact Confidence Interval

The deficiencies in the Normal Approximation were addressed by Clopper and Pearson when they developed the Clopper-Pearson method, which is commonly referred to as the “Exact Confidence Interval” [3]. Instead of using a Normal Approximation, the Exact CI inverts two single-tailed Binomial tests, each at α/2. Specifically, the Exact CI is the range from $p_{lb}$ to $p_{ub}$ that satisfies the following conditions [2].

$$ \sum_{i=0}^{k} \binom{n}{i}\, p_{ub}^{\,i}\, (1 - p_{ub})^{\,n-i} = \frac{\alpha}{2} $$

$$ \sum_{i=k}^{n} \binom{n}{i}\, p_{lb}^{\,i}\, (1 - p_{lb})^{\,n-i} = \frac{\alpha}{2} $$

The population proportion falls in the range $p_{lb}$ to $p_{ub}$, where:

$p_{lb}$ is the confidence interval lower bound

$p_{ub}$ is the confidence interval upper bound

n is the number of trials

k is the number of successes in n trials

α is the percent chance of making a Type I error, 1-α is the confidence

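While the Normal Approximation method is easy to teach and understand, I would rather deliver a lesson on quantum mechanics than attempt to explain the equations behind the Exact Confidence Interval. While the population proportion falls in the range $p_{lb}$ to $p_{ub}$, the calculation of these values is non-trivial and for most requires the use of a computer. You may note that the conditions defining the Exact CI are based upon the Binomial Cumulative Distribution Function (cdf). The Beta Distribution can be used to calculate the Binomial cdf, and so a more common way to represent the Binomial Exact CI is using the equations below.

$$ p_{lb} = B^{-1}\!\left(\tfrac{\alpha}{2};\; k,\; n-k+1\right) $$

$$ p_{ub} = B^{-1}\!\left(1-\tfrac{\alpha}{2};\; k+1,\; n-k\right) $$

Here $B^{-1}(q; a, b)$ is the q-th quantile of the Beta Distribution with shape parameters a and b. When k=0 the lower bound is taken as 0, and when k=n the upper bound is taken as 1.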

The F Distribution can also be used to calculate the Binomial cdf, and so alternative formulas use the F Distribution in lieu of the Beta Distribution.
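For readers who want to see the computation without a statistics package, the bounds can also be found numerically from the Binomial cdf itself. The sketch below is one possible approach, not the method used by any particular software: it solves the two tail conditions (each tail equal to α/2) by bisection, and the names `binom_cdf`, `_solve` and `exact_ci` are assumptions for illustration. Libraries that expose Beta quantiles (for example `scipy.stats.beta.ppf`) offer a shorter route.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def _solve(g, lo=0.0, hi=1.0, tol=1e-12):
    """Bisection for an increasing function g with g(lo) < 0 < g(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(k, n, confidence=0.95):
    """Clopper-Pearson interval for k successes in n trials."""
    alpha = 1 - confidence
    if k == 0:
        lower = 0.0  # no successes observed: lower bound is 0 by convention
    else:
        # Lower bound: P(X >= k) at p_lb equals alpha/2.
        lower = _solve(lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    if k == n:
        upper = 1.0  # no failures observed: upper bound is 1 by convention
    else:
        # Upper bound: P(X <= k) at p_ub equals alpha/2.
        upper = _solve(lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper

lower, upper = exact_ci(29, 50)
print(f"{lower:.3f} to {upper:.3f}")
```

Bisection is used here only because it is simple and needs nothing beyond the standard library; it works because each tail probability changes monotonically with p.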

Exact Binomial Confidence Interval Summary

  • Advantages

    Accurate even when np < 5 or n(1-p) < 5

    Calculation is possible when p=0 or p=1
  • Disadvantages

    Formulas are complex and require computers to calculate

Which to use

The Normal Approximation method serves as a simple way to introduce the idea of the confidence interval. The formula is easy to understand and calculate, which allows the student to easily grasp the concept. However, the inaccuracy with very small p and the inability to handle p=0 are somewhat severe limitations in business applications. For example, if a test of 10 cell phones reveals zero defects, what is the confidence interval for the proportion of defective phones in the total population? This question is commonly posed, and yet the Normal Approximation cannot be used to find an answer. As personal computers with ample calculation power have become prevalent, there is a trend towards using the Exact CI in lieu of the Normal Approximation. At SigmaZone.com, we believe that the best method is to teach the concept using the Normal Approximation method and then tell the students that it is just an approximation. We then point out that the software calculates the exact confidence interval, which can handle p=0 or p=1.
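The cell phone question has a simple answer under the Exact CI. With zero defects observed, the lower bound is 0 and the upper-bound condition reduces to (1 - p_ub)^n = α/2, so p_ub = 1 - (α/2)^(1/n). A minimal sketch, with the function name `zero_defect_upper_bound` chosen here for illustration:

```python
def zero_defect_upper_bound(n, confidence=0.95):
    """Exact (Clopper-Pearson) upper bound when 0 defects are seen in n units."""
    alpha = 1 - confidence
    # With k = 0 the binomial cdf is just (1 - p)^n, which is set equal to alpha/2.
    return 1 - (alpha / 2) ** (1 / n)

upper = zero_defect_upper_bound(10)
print(f"0% to {upper:.0%}")  # 0% to 31%
```

So even with no defects in 10 phones, a 95% Exact CI still allows a defect rate of up to roughly 31% in the population, which shows how little a sample of 10 can rule out.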

Final Notes

The term “Exact Confidence Interval” is a bit of a misnomer. Neyman noted [4] that “exact probability statements are impossible in the case of the Binomial Distribution”. This stems from the fact that k, the number of successes in n trials, must be expressed as an integer. Various methods have been suggested as improvements to the Exact CI, including the Wilson Method and the Modified Wilson Method.

Finally, to avoid a flood of emails, I should note that the binomial distribution is a discrete probability distribution used to model the number of successes in n independent binomial experiments that have a constant probability of success p. The election example may not be strictly applicable, because during the poll someone might indicate that they want to vote for neither Mr. Gubinator nor Mr. Ventura or, put another way, that they have no preference. If this is the case, there are now three options (Mr. Gubinator, Mr. Ventura, and No Preference) and the experiment is no longer binomial, as there are three choices instead of two.

References

[1] Brown, L. D., Cai, T. T., and DasGupta, A. Interval Estimation for a Binomial Proportion. Statistical Science 16: 101-117, 2001.

[2] Gnedenko, B. V., Ushakov, I. A., and Pavlov, I. V. Statistical Reliability Engineering. John Wiley & Sons, April 1999.

[3] Clopper, C. J. and Pearson, E. S. The use of confidence or fiducial limits illustrated in the case of the Binomial. Biometrika 26: 404-413, 1934.

[4] Neyman, J. On the problem of confidence intervals. The Annals of Mathematical Statistics 6: 116, 1935.
