Sampling Distributions and Central Limit Theorem in R (Repost)

The Central Limit Theorem (CLT), and the concept of the sampling distribution, are critical for understanding why statistical inference works. There are at least a handful of problems that require you to invoke the Central Limit Theorem on every ASQ Certified Six Sigma Black Belt (CSSBB) exam. The CLT says that if you take many repeated samples of size n from a population, and calculate the average or sum of each one, the collection of those averages (or sums) will be approximately normally distributed once n is large enough… and it doesn’t matter what the shape of the source distribution is, as long as its variance is finite!

I wrote some R code to help illustrate this principle for my students. This code allows you to choose a sample size (n), a source distribution, and parameters for that source distribution, and generate a plot of the sampling distributions of the mean, sum, and variance. (Note: when the source distribution is normal, the sampling distribution of the variance is a scaled Chi-square distribution: (n-1)s²/σ² follows a Chi-square distribution with n-1 degrees of freedom.)

sdm.sim <- function(n,src.dist=NULL,param1=NULL,param2=NULL) {
   r <- 10000  # Number of replications/samples - DO NOT ADJUST

   # This produces a matrix of observations with
   # n columns and r rows. Each row is one sample:
   my.samples <- switch(src.dist,
        "E" = matrix(rexp(n*r,param1),r),            # param1 = rate
        "N" = matrix(rnorm(n*r,param1,param2),r),    # mean, standard deviation
        "U" = matrix(runif(n*r,param1,param2),r),    # min, max
        "P" = matrix(rpois(n*r,param1),r),           # lambda
        "C" = matrix(rcauchy(n*r,param1,param2),r),  # location, scale
        "B" = matrix(rbinom(n*r,param1,param2),r),   # size, probability
        "G" = matrix(rgamma(n*r,param1,param2),r),   # shape, rate
        "X" = matrix(rchisq(n*r,param1),r),          # degrees of freedom
        "T" = matrix(rt(n*r,param1),r),              # degrees of freedom
        stop("src.dist must be one of E, N, U, P, C, B, G, X, T"))

   # One statistic per sample (that is, per row):
   all.sample.sums <- apply(my.samples,1,sum)
   all.sample.means <- apply(my.samples,1,mean)
   all.sample.vars <- apply(my.samples,1,var)

   # Draw the four histograms in a 2x2 grid:
   par(mfrow=c(2,2))
   hist(my.samples[1,],col="gray",main="Distribution of One Sample")
   hist(all.sample.sums,col="gray",main="Sampling Distribution\nof the Sum")
   hist(all.sample.means,col="gray",main="Sampling Distribution\nof the Mean")
   hist(all.sample.vars,col="gray",main="Sampling Distribution\nof the Variance")
}
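The note about the variance can be checked numerically. The following is a minimal sketch, not part of the original code: it assumes a normal population with an arbitrarily chosen mean of 20 and standard deviation of 3, and the names sigma, sample.vars, and scaled are made up for the illustration. It rescales each sample variance by (n-1)/σ² and compares the result with a Chi-square distribution with n-1 degrees of freedom.

```r
set.seed(1)   # arbitrary seed, only so the run is repeatable
n <- 10       # sample size
r <- 10000    # number of replications, as in sdm.sim
sigma <- 3    # population standard deviation (assumed for this sketch)

# r samples of size n from a normal population; each row is one sample
samples <- matrix(rnorm(n * r, mean = 20, sd = sigma), nrow = r)
sample.vars <- apply(samples, 1, var)

# For a normal population, (n-1) * s^2 / sigma^2 should follow a
# Chi-square distribution with n-1 degrees of freedom
scaled <- (n - 1) * sample.vars / sigma^2

# A Chi-square distribution with k degrees of freedom has mean k and variance 2k
cat("mean:", mean(scaled), " expected:", n - 1, "\n")
cat("var: ", var(scaled), " expected:", 2 * (n - 1), "\n")

# Compare a few simulated quantiles with the theoretical ones
probs <- c(0.05, 0.5, 0.95)
print(rbind(simulated = quantile(scaled, probs),
            theoretical = qchisq(probs, df = n - 1)))
```

Repeating this with a skewed source such as the exponential should show a visible mismatch, which is why the Chi-square shape is tied to the normal case.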

There are nine population distributions to choose from: exponential (E), normal (N), uniform (U), Poisson (P), Cauchy (C), binomial (B), gamma (G), Chi-square (X), and the Student’s t distribution (T). Note also that you have to provide either one or two parameters, depending upon which distribution you are selecting. For example, a normal distribution requires that you specify the mean and standard deviation to describe where it’s centered, and how fat or thin it is (that’s two parameters). A Chi-square distribution requires that you specify the degrees of freedom (that’s only one parameter). You can find out exactly which distributions require which parameters by going here: http://en.wikibooks.org/wiki/R_Programming/Probability_Distributions.
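If you would rather ask R itself, the sketch below lists the argument names that param1 and param2 are matched to for each letter code. It is only an illustration: the objects samplers, n.params, and param.guide are hypothetical names introduced here, and the mapping of letters to functions is copied from the switch statement in sdm.sim.

```r
# Sampler behind each letter code in sdm.sim
samplers <- list(E = rexp, N = rnorm, U = runif, P = rpois, C = rcauchy,
                 B = rbinom, G = rgamma, X = rchisq, T = rt)

# How many parameters sdm.sim passes to each sampler
n.params <- c(E = 1, N = 2, U = 2, P = 1, C = 2, B = 2, G = 2, X = 1, T = 1)

# The first formal argument is always the number of draws; the next
# one or two are what param1 and param2 are matched to positionally
param.guide <- lapply(names(samplers), function(code) {
  arg.names <- names(formals(samplers[[code]]))
  arg.names[2:(1 + n.params[[code]])]
})
names(param.guide) <- names(samplers)

for (code in names(param.guide)) {
  cat(code, ":", paste(param.guide[[code]], collapse = ", "), "\n")
}
```

Note that for the gamma distribution the second positional argument is the rate, not the scale.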

Here is an example that draws from an exponential distribution with a mean of 1/1 (param1 is the rate, so the mean is 1/param1; in other words, you specify the number you want in the denominator of the mean):

sdm.sim(50,src.dist="E",param1=1)

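The code above produces a 2×2 grid of four histograms: the distribution of one sample, and the sampling distributions of the sum, the mean, and the variance.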

You aren’t allowed to change the number of replications in this simulation because of the nature of the sampling distribution: it’s a theoretical model that describes the distribution of statistics from an infinite number of samples. As a result, if you change the number of replications, you’ll see the mean of the simulated sampling distribution bounce around, settling on the mean of the population only as the number of replications grows. This is just an artifact of the simulation process: it’s not a characteristic of the sampling distribution, because to be a sampling distribution, you’ve got to have an infinite number of samples. Watkins et al. have a great description of this effect that all statistics instructors should be aware of. I chose 10,000 for the number of replications because 1) it’s close enough to infinity that the mean of the simulated sampling distribution is very nearly the same as the mean of the population, but 2) it’s far enough away from infinity to not crash your computer, even if you only have 4GB or 8GB of memory.
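To see the bouncing for yourself without editing sdm.sim, here is a small sketch that is not part of the original post. It assumes an exponential source with rate 1 (so a population mean of 1) and a sample size of 50; the names reps and grand.means are introduced only for this illustration.

```r
set.seed(42)   # arbitrary seed, only so the run is repeatable
n <- 50        # sample size, held fixed
reps <- c(10, 100, 1000, 10000)   # numbers of replications to compare

# For each number of replications, simulate that many samples and
# take the mean of the resulting sample means
grand.means <- sapply(reps, function(r) {
  samples <- matrix(rexp(n * r, rate = 1), nrow = r)
  mean(rowMeans(samples))
})

# The population mean is 1; the simulated value should drift toward it
print(data.frame(replications = reps, mean.of.sample.means = grand.means))
```

Changing the seed changes the small-replication rows noticeably, while the 10,000-replication row should stay close to 1.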

Here are some more examples to try. You can see that as you increase your sample size (n), the shapes of the sampling distributions become more and more normal, and the variance of the sample mean decreases (it equals the population variance divided by n), constraining your estimates of the population parameters more and more.

sdm.sim(10,src.dist="E",1)
sdm.sim(50,src.dist="E",1)
sdm.sim(100,src.dist="E",1)

sdm.sim(10,src.dist="X",14)
sdm.sim(50,src.dist="X",14)
sdm.sim(100,src.dist="X",14)

sdm.sim(10,src.dist="N",param1=20,param2=3)
sdm.sim(50,src.dist="N",param1=20,param2=3)
sdm.sim(100,src.dist="N",param1=20,param2=3)

sdm.sim(10,src.dist="G",param1=5,param2=5)
sdm.sim(50,src.dist="G",param1=5,param2=5)
sdm.sim(100,src.dist="G",param1=5,param2=5)
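The narrowing of the sampling distribution of the mean can also be put into numbers: its standard deviation (the standard error) should be close to σ/√n. The sketch below is an illustration rather than part of the original code; it reuses the normal example with a mean of 20 and a standard deviation of 3, and the names sizes, observed, and theoretical are introduced here.

```r
set.seed(123)  # arbitrary seed, only so the run is repeatable
r <- 10000     # number of replications, as in sdm.sim
sigma <- 3     # population standard deviation
sizes <- c(10, 50, 100)   # sample sizes used in the examples

# Standard deviation of the r simulated sample means, for each sample size
observed <- sapply(sizes, function(n) {
  samples <- matrix(rnorm(n * r, mean = 20, sd = sigma), nrow = r)
  sd(rowMeans(samples))
})

# What theory predicts for the standard error of the mean
theoretical <- sigma / sqrt(sizes)

print(data.frame(n = sizes, observed, theoretical))
```

Going from n = 10 to n = 100 should cut the spread by a factor of roughly √10, which is the tightening visible in the histograms.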

Reposted from: http://www.r-bloggers.com/sampling-distributions-and-central-limit-theorem-in-r/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+RBloggers+%28R+bloggers%29
