The Central Limit Theorem (CLT), and the concept of the sampling distribution, are critical for understanding why statistical inference works. There are at least a handful of problems that require you to invoke the Central Limit Theorem on every ASQ Certified Six Sigma Black Belt (CSSBB) exam. The CLT says that if you take many repeated samples from a population and calculate the average or sum of each one, the collection of those averages (or sums) will be approximately normally distributed… and it doesn’t matter what shape the source distribution has!

I wrote some R code to help illustrate this principle for my students. This code allows you to choose a sample size (n), a source distribution, and parameters for that source distribution, and generate a plot of the sampling distributions of the mean, sum, and variance. (Note: when the population is normal, the sample variance scaled by (n-1)/σ² follows a Chi-square distribution with n-1 degrees of freedom!)

sdm.sim <- function(n, src.dist=NULL, param1=NULL, param2=NULL) {
  r <- 10000  # Number of replications/samples - DO NOT ADJUST
  # This produces a matrix of observations with
  # n columns and r rows. Each row is one sample:
  my.samples <- switch(src.dist,
    "E" = matrix(rexp(n*r,param1),r),
    "N" = matrix(rnorm(n*r,param1,param2),r),
    "U" = matrix(runif(n*r,param1,param2),r),
    "P" = matrix(rpois(n*r,param1),r),
    "C" = matrix(rcauchy(n*r,param1,param2),r),
    "B" = matrix(rbinom(n*r,param1,param2),r),
    "G" = matrix(rgamma(n*r,param1,param2),r),
    "X" = matrix(rchisq(n*r,param1),r),
    "T" = matrix(rt(n*r,param1),r))
  all.sample.sums <- apply(my.samples,1,sum)
  all.sample.means <- apply(my.samples,1,mean)
  all.sample.vars <- apply(my.samples,1,var)
  par(mfrow=c(2,2))
  hist(my.samples[1,],col="gray",main="Distribution of One Sample")
  hist(all.sample.sums,col="gray",main="Sampling Distribution\nof the Sum")
  hist(all.sample.means,col="gray",main="Sampling Distribution\nof the Mean")
  hist(all.sample.vars,col="gray",main="Sampling Distribution\nof the Variance")
}
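You can also check the CLT’s predictions numerically rather than just visually. The snippet below is a minimal sketch of my own (not part of sdm.sim): it repeats the same simulation steps for the exponential example used later (rate = 1, so μ = 1 and σ = 1) and compares the mean and spread of the sample means against the theoretical values μ and σ/√n, then checks the Note above using a normal source.

# A quick numeric check of the CLT (a sketch, not part of sdm.sim).
# Exponential source with rate = 1, so mu = 1 and sigma = 1; n = 50.
n <- 50; r <- 10000
exp.samples <- matrix(rexp(n*r, 1), r)
exp.means <- apply(exp.samples, 1, mean)
mean(exp.means)   # should be close to mu = 1
sd(exp.means)     # should be close to sigma/sqrt(n) = 1/sqrt(50), about 0.141

# Check of the Note above: for a NORMAL source with standard deviation sigma,
# (n-1)*s^2/sigma^2 follows a Chi-square distribution with n-1 degrees of freedom.
norm.samples <- matrix(rnorm(n*r, mean = 20, sd = 3), r)
norm.vars <- apply(norm.samples, 1, var)
mean((n-1)*norm.vars/3^2)   # should be close to the Chi-square mean, n-1 = 49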

There are 9 population distributions to choose from: exponential (E), normal (N), uniform (U), Poisson (P), Cauchy (C), binomial (B), gamma (G), Chi-square (X), and Student’s t distribution (T). Note also that you have to provide either one or two parameters, depending on which distribution you select. For example, a normal distribution requires that you specify the mean and standard deviation to describe where it’s centered and how fat or thin it is (that’s two parameters). A Chi-square distribution requires that you specify the degrees of freedom (that’s only one parameter). You can find out exactly which parameters each distribution requires here: http://en.wikibooks.org/wiki/R_Programming/Probability_Distributions.
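You can also check the argument order from inside R itself: param1 and param2 are just passed through to the corresponding r* functions, so args() shows what each slot means. A small illustration (not part of the original post):

args(rexp)     # function (n, rate = 1)                       -> param1 = rate
args(rnorm)    # function (n, mean = 0, sd = 1)               -> param1 = mean, param2 = sd
args(runif)    # function (n, min = 0, max = 1)               -> param1 = min,  param2 = max
args(rbinom)   # function (n, size, prob)                     -> param1 = size, param2 = prob
args(rgamma)   # function (n, shape, rate = 1, scale = 1/rate) -> param1 = shape, param2 = rate
args(rchisq)   # function (n, df, ncp = 0)                    -> param1 = df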

Here is an example that draws from an exponential distribution with a mean of 1/1 = 1. For the exponential, param1 is the rate λ, and the mean is 1/λ, so you specify the number you want in the denominator of the mean:

sdm.sim(50,src.dist="E",param1=1)
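As a quick way to convince yourself of that parameterization, here is a side sketch (not part of the simulator) using a rate of 2 chosen just for illustration; the average of a large exponential draw comes out near 1/2:

# param1 is the rate, so the mean is 1/param1 (here 1/2 = 0.5)
mean(rexp(100000, rate = 2))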

The sdm.sim() call above produces a 2-by-2 grid of plots: the distribution of one sample, and the sampling distributions of the sum, the mean, and the variance.

You aren’t allowed to change the number of replications in this simulation because of the nature of the sampling distribution: it’s a theoretical model that describes the distribution of statistics from an infinite number of samples. As a result, if you increase the number of replications, you’ll see the mean of the sampling distribution bounce around until it converges on the mean of the population. This is just an artifact of the simulation process: it’s not a characteristic of the sampling distribution, because to be a sampling distribution, you’ve got to have an infinite number of samples. Watkins et al. have a great description of this effect that all statistics instructors should be aware of. I chose 10,000 for the number of replications because 1) it’s close enough to infinity to ensure that the mean of the sampling distribution is the same as the mean of the population, but 2) it’s far enough away from infinity to not crash your computer, even if you only have 4GB or 8GB of memory.
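If you want to watch that effect yourself, here is a minimal sketch (outside of sdm.sim, so the DO NOT ADJUST warning doesn’t apply) that shows the mean of the simulated sampling distribution settling toward the population mean of 1 as the number of replications grows:

# Mean of the simulated sampling distribution for increasing r
# (exponential source with rate 1, so the population mean is 1; n = 50)
n <- 50
for (r in c(100, 1000, 10000, 100000)) {
  sample.means <- apply(matrix(rexp(n*r, 1), r), 1, mean)
  cat("r =", r, "  mean of sample means =", round(mean(sample.means), 4), "\n")
}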

Here are some more examples to try. You can see that as you increase your sample size (n), the shapes of the sampling distributions become more and more normal, and the variance decreases, constraining your estimates of the population parameters more and more.

sdm.sim(10,src.dist="E",param1=1)
sdm.sim(50,src.dist="E",param1=1)
sdm.sim(100,src.dist="E",param1=1)
sdm.sim(10,src.dist="X",param1=14)
sdm.sim(50,src.dist="X",param1=14)
sdm.sim(100,src.dist="X",param1=14)
sdm.sim(10,src.dist="N",param1=20,param2=3)
sdm.sim(50,src.dist="N",param1=20,param2=3)
sdm.sim(100,src.dist="N",param1=20,param2=3)
sdm.sim(10,src.dist="G",param1=5,param2=5)
sdm.sim(50,src.dist="G",param1=5,param2=5)
sdm.sim(100,src.dist="G",param1=5,param2=5)

Reposted from: http://www.r-bloggers.com/sampling-distributions-and-central-limit-theorem-in-r/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+RBloggers+%28R+bloggers%29
