UC Berkeley Stat2.2x Probability Study Notes: Section 4 The Central Limit Theorem
Stat2.2x Probability was taught by the University of California, Berkeley on the edX platform in 2014.
Summary
- Standard Error
The standard error of a random variable $X$ is defined by $$SE(X)=\sqrt{E((X-E(X))^2)}$$ $SE$ measures the rough size of the chance error in $X$: roughly how far off $X$ is from $E(X)$.
- Standard Deviation
The standard deviation of a list of numbers is $$SD=\sqrt{E((x-\mu)^2)}$$ where $\mu=E(x)$ is the average of the list. $SD$ measures the rough size of the deviations: roughly how far off the numbers are from the average.
- $SE$ of the Sum of the Draws
For $n$ draws made at random with replacement from a box of numbered tickets, the standard error of the sum of the draws is $$SE=\sqrt{\text{number of draws}}\cdot(SD\ \text{of the box})=\sqrt{n}\cdot\sigma$$ where $\sigma=\sqrt{E((x-\mu)^2)}$ is the $SD$ of the box.
- Chebychev's Inequality
The probability that $X$ is $k$ or more $SEs$ away from $E(X)$ is at most $\frac{1}{k^2}$, that is $$P(X\ \text{is outside the interval}\ E(X)\pm k\cdot SE(X))\leq\frac{1}{k^2}$$ For instance, $$P(X\ \text{is inside the interval}\ E(X)\pm2\cdot SE(X))\geq1-\frac{1}{2^2}=\frac{3}{4}$$
- De Moivre - Laplace Theorem
Fix any $p$ strictly between $0$ and $1$. As the number of trials $n$ increases, the probability histogram for the binomial distribution looks like the normal curve with mean $\mu=n\cdot p$ and $SD=\sqrt{n\cdot p\cdot(1-p)}$.
- Central Limit Theorem
Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed, each with expected value $\mu$ and standard error $\sigma$. Let $S_n=X_1+X_2+\ldots+X_n$. Then for large $n$, the probability distribution of $S_n$ is approximately normal with mean $n\mu$ and standard deviation $\sqrt{n}\sigma$, no matter what the distribution of each $X_i$.
- Normal Approximation of Binomial Distribution
For a binomial count $X$ and endpoints $x_1\leq x_2$: $$\mu=n\cdot p, SE=\sqrt{n\cdot p\cdot(1-p)}$$ $$Z_1=\frac{x_1-\mu}{SE}, Z_2=\frac{x_2-\mu}{SE}$$ $$P(x_1\leq X\leq x_2)\approx\text{Area under the standard normal curve between}\ Z_1\ \text{and}\ Z_2$$ Because $X$ takes whole-number values, the approximation is usually improved by a continuity correction: use $x_1-0.5$ and $x_2+0.5$ as the endpoints, as several of the solutions in these notes do. R code:
mu = n * p; se = sqrt(n * p * (1 - p))
z1 = (x1 - mu) / se; z2 = (x2 - mu) / se
pnorm(z2) - pnorm(z1)
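The Central Limit Theorem stated above can also be checked empirically. The following is a minimal simulation sketch and is not part of the course material: the box `box`, the number of draws `n`, the number of repetitions `reps`, and the seed are arbitrary illustrative choices.

```r
# Simulate sums of n draws (with replacement) from a skewed box and compare
# them with the normal approximation. All values here are illustrative.
set.seed(1)
box  <- c(1, 1, 1, 2, 10)              # hypothetical box of tickets
n    <- 100                            # draws per sum
reps <- 10000                          # number of simulated sums

mu    <- mean(box)                     # average of the box
sigma <- sqrt(mean((box - mu)^2))      # SD of the box (divide by N, not N - 1)
se    <- sqrt(n) * sigma               # SE of the sum of n draws

sums <- replicate(reps, sum(sample(box, n, replace = TRUE)))

# Share of simulated sums within 1 SE of the expected sum n * mu,
# next to the area under the standard normal curve between -1 and 1
within_one_se <- mean(abs(sums - n * mu) <= se)
normal_area   <- pnorm(1) - pnorm(-1)
print(c(simulated = within_one_se, normal = normal_area))
```

Even though the box is far from normal, the two printed proportions should be close; `mean(sums)` and `sd(sums)` can likewise be compared with `n * mu` and `se`.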
PRACTICE
PROBLEM 1
In 6000 rolls of a die, approximately what is the chance of getting between 950 and 1050 sixes (inclusive)?
Solution
Binomial distribution $n=6000, k=950:1050, p=1/6$: $$P(\text{between 950 and 1050 sixes})$$ $$=\sum_{k=950}^{1050}C_{6000}^{k}(\frac{1}{6})^k\cdot(\frac{5}{6})^{6000-k}\doteq0.9198021$$ R code:
sum(dbinom(x = 950:1050, size = 6000, prob = 1/6))
[1] 0.9198021
Alternatively, using the normal approximation (here without a continuity correction): $$\mu=np=6000\times\frac{1}{6}=1000$$ $$SE=\sqrt{n\cdot p\cdot(1-p)}\doteq28.86751$$ $$Z_1=\frac{950-1000}{SE}, Z_2=\frac{1050-1000}{SE}$$ $$P(\text{between 950 and 1050 sixes})$$ $$\approx\text{Area under the standard normal curve between}\ Z_1\ \text{and}\ Z_2$$ $$=0.9167355$$ R code:
n = 6000; p = 1/6
mu = n * p; se = sqrt(n * p * (1 - p))
z1 = (950 - mu) / se; z2 = (1050 - mu) / se
pnorm(z2) - pnorm(z1)
[1] 0.9167355
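The normal approximation above uses the endpoints 950 and 1050 directly. As a hedged side-by-side sketch (the variable names `plain`, `corrected`, and `exact` are illustrative), the continuity correction treats the whole numbers 950 to 1050 as the interval from 949.5 to 1050.5, which should bring the approximation closer to the exact binomial value:

```r
n <- 6000; p <- 1/6
mu <- n * p
se <- sqrt(n * p * (1 - p))

# Exact binomial chance of 950 to 1050 sixes
exact <- sum(dbinom(950:1050, size = n, prob = p))

# Plain normal approximation, as in the solution above
plain <- pnorm((1050 - mu) / se) - pnorm((950 - mu) / se)

# Continuity correction: widen the interval by 0.5 on each side
corrected <- pnorm((1050.5 - mu) / se) - pnorm((949.5 - mu) / se)

print(c(exact = exact, plain = plain, corrected = corrected))
```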
PROBLEM 2
The “column” bet in roulette pays 2 to 1 and there are 12 chances in 38 to win. Suppose you bet \$1 on a column 100 times, independently. Find
a) the expected number of times you win
b) the SE of the number of times you win
c) the expected value of your net gain
d) the $SE$ of your net gain
e) the chance that you come out ahead
Solution
2a) $$E(\text{number of wins})=100\times\frac{12}{38}\doteq31.57895$$
2b) $$SE=\sqrt{n\cdot p\cdot(1-p)}=\sqrt{100\times\frac{12}{38}\times\frac{26}{38}}\doteq4.648295$$
2c) $$E(\text{net gain})=100\times(2\times\frac{12}{38}+(-1)\times\frac{26}{38})\doteq-5.263158$$ Alternatively, let $W$ be the number of wins and $X$ the net gain. Then $$X=2\cdot W-1\cdot(100-W)=3\cdot W-100$$ $$E(X)=3\cdot E(W)-100=3\times31.57895-100=-5.26315$$
2d) The net gain is the sum of $n=100$ draws from a box whose tickets are the possible gains on a single bet, so $SE=\sqrt{n}\sigma$, where $\mu$ and $\sigma$ are the average and $SD$ of a single bet: $$n=100, \mu=2\times\frac{12}{38}+(-1)\times\frac{26}{38}=-\frac{1}{19}$$ $$\sigma=\sqrt{E((X-\mu)^2)}=\sqrt{(2+\frac{1}{19})^2\times\frac{12}{38}+(-1+\frac{1}{19})^2\times\frac{26}{38}}\doteq1.394489$$ Thus $$SE=\sqrt{n}\sigma\doteq13.94489$$ Alternatively, since $X=3\cdot W-100$, $$SE(X)=3\cdot SE(W)=3\times4.6483=13.945$$
2e) $X > 0 \Rightarrow W > \frac{100}{3}\Rightarrow W \geq 34$. Binomial distribution $n=100, k=34:100, p=12/38$: $$\sum_{k=34}^{100}C_{100}^{k}\cdot(\frac{12}{38})^k\cdot(\frac{26}{38})^{100-k}\doteq0.3357928$$ R code:
sum(dbinom(x = 34:100, size = 100, prob = 12/38))
[1] 0.3357928
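The two routes to the $SE$ of the net gain in 2d can be cross-checked numerically. The sketch below uses illustrative variable names, and also adds a continuity-corrected normal approximation for part 2e, which the solution above computes only exactly:

```r
n <- 100; p <- 12/38
# Net gain of a single $1 column bet: +2 with chance 12/38, -1 otherwise
value <- c(2, -1); prob <- c(p, 1 - p)

mu_bet    <- sum(value * prob)                      # average of one bet
sigma_bet <- sqrt(sum((value - mu_bet)^2 * prob))   # SD of one bet
se_box    <- sqrt(n) * sigma_bet                    # route 1: sqrt(n) * SD of the box

se_wins   <- sqrt(n * p * (1 - p))                  # SE of the number of wins W
se_linear <- 3 * se_wins                            # route 2: X = 3W - 100

# Part 2e: normal approximation to P(W >= 34) with continuity correction,
# next to the exact binomial value
approx_ahead <- 1 - pnorm((33.5 - n * p) / se_wins)
exact_ahead  <- sum(dbinom(34:100, size = n, prob = p))

print(c(se_box = se_box, se_linear = se_linear,
        approx = approx_ahead, exact = exact_ahead))
```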
PROBLEM 3
Find the normal approximation to the chance of getting 43 heads in 100 tosses of a coin.
Solution
Normal approximation (with the continuity correction, exactly 43 heads corresponds to the interval from 42.5 to 43.5): $$\mu=100\times0.5=50, SE=\sqrt{n\cdot p\cdot(1-p)}=\sqrt{100\times0.5\times0.5}=5$$ $$Z_1=\frac{42.5-50}{5}, Z_2=\frac{43.5-50}{5}$$ $$P(\text{getting 43 heads in 100 tosses of a coin})\doteq0.02999328$$ R code:
n = 100; p = 1/2
mu = n * p; se = sqrt(n * p * (1 - p))
z1 = (42.5 - mu) / se; z2 = (43.5 - mu) / se
pnorm(z2) - pnorm(z1)
[1] 0.02999328
Binomial distribution (exact value): $$C_{100}^{43}\times(\frac{1}{2})^{100}\doteq0.03006864$$ R code:
dbinom(x = 43, size = 100, prob = 1/2)
[1] 0.03006864
The two values differ by less than 0.0001, so the normal approximation is excellent here.
EXERCISE 4
PROBLEM 1
A random variable $W$ has the probability distribution
| value | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| probability | 0.5 | 0.25 | 0.125 | 0.125 |
(For those of you who are interested, this is the geometric $p=0.5$ “killed” at 4. $W$ is the number of times I toss a coin if I follow this rule: I’ll toss the coin till I get the first head, but I’ll stop after 4 tosses even if I haven’t got a head by that time.)
1A Find $E(W)$
1B Find $SE(W)$
Solution
1A) $$E(W)=1\times0.5+2\times0.25+3\times0.125+4\times0.125=1.875$$
1B) $$SE(W)=\sqrt{E[(W-E(W))^2]}$$ $$=\sqrt{(1-1.875)^2\times0.5+(2-1.875)^2\times0.25+(3-1.875)^2\times0.125+(4-1.875)^2\times0.125}$$ $$\doteq1.053269$$ R code:
v = 1:4; p = c(.5, .25, .125, .125)
mu = sum(v * p)
sqrt(sum((v - mu) ^ 2 * p))
[1] 1.053269
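As a cross-check, the same value follows from the shortcut $E((W-E(W))^2)=E(W^2)-(E(W))^2$, which is not used in the solution above:
$$E(W^2)=1^2\times0.5+2^2\times0.25+3^2\times0.125+4^2\times0.125=4.625$$ $$SE(W)=\sqrt{E(W^2)-(E(W))^2}=\sqrt{4.625-1.875^2}=\sqrt{1.109375}\doteq1.053269$$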
PROBLEM 2
A true-false test consists of 20 questions, each of which has one correct answer: true, or false. One point is awarded for every correct answer, but one point is taken off for each wrong answer. Suppose a student answers every question by guessing at random, independently of other questions. Let $S$ be the student’s score on the test.
2A Find $E(S)$
2B Find $SE(S)$
2C Find $P(S=0)$ without using a large-sample approximation.
Solution
2A) The score is computed like a net gain: each question adds $1$ or $-1$ with chance $\frac{1}{2}$ each, so $$E(S)=20\times(1\times\frac{1}{2}+(-1)\times\frac{1}{2})=0$$
2B) $S$ is the sum of the 20 independent question scores, each with average $$\mu=1\times\frac{1}{2}+(-1)\times\frac{1}{2}=0$$ so $$SE(S)=\sqrt{n}\sigma=\sqrt{20}\times\sqrt{(1-0)^2\times\frac{1}{2}+(-1-0)^2\times\frac{1}{2}}=\sqrt{20}\doteq4.472136$$
2C) $S=0$ means there are 10 correct answers and 10 incorrect answers. The number of correct answers is binomial with $n=20, k=10, p=\frac{1}{2}$, so $$P(S=0)=C_{20}^{10}\times(\frac{1}{2})^{20}\doteq0.1761971$$ R code:
dbinom(x = 10, size = 20, prob = 1/2)
[1] 0.1761971
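All three answers can be confirmed at once by writing out the full distribution of the score. This is a sketch with illustrative names: with `correct` right answers out of 20, the score is `2 * correct - 20`.

```r
correct <- 0:20
score   <- 2 * correct - 20                 # +1 per correct, -1 per wrong answer
prob    <- dbinom(correct, size = 20, prob = 1/2)

e_s    <- sum(score * prob)                 # E(S)
se_s   <- sqrt(sum((score - e_s)^2 * prob)) # SE(S)
p_zero <- prob[score == 0]                  # P(S = 0)
print(c(E = e_s, SE = se_s, P0 = p_zero))
```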
PROBLEM 3
A die is rolled 60 times.
3A Find the expected number of times the face with 6 spots appears.
3B Find the $SE$ of the number of times the face with 6 spots appears.
3C Find the normal approximation to the chance that the face with six spots appears 10 times.
3D Find the exact chance that the face with six spots appears 10 times.
3E Find the normal approximation to the chance that the face with six spots appears 9, 10, or 11 times.
3F Find the exact chance that the face with six spots appears 9, 10, or 11 times.
Solution
3A) $$E(\text{number of sixes})=60\times\frac{1}{6}=10$$
3B) $$SE(\text{number of sixes})=\sqrt{60\times\frac{1}{6}\times(1-\frac{1}{6})}\doteq2.886751$$
3C) With the continuity correction, exactly 10 sixes corresponds to the interval from 9.5 to 10.5: $$Z_1=\frac{9.5-10}{SE}, Z_2=\frac{10.5-10}{SE}$$ Computing in R:
mu = 10; se = sqrt(60 * 1/6 * 5/6)
z1 = (9.5 - mu) / se; z2 = (10.5 - mu) / se
pnorm(z2) - pnorm(z1)
[1] 0.1375098
3D) Binomial distribution $n=60, k=10, p=\frac{1}{6}$: $$C_{60}^{10}\times(\frac{1}{6})^{10}\times(\frac{5}{6})^{50}\doteq0.1370131$$ R code:
dbinom(x = 10, size = 60, prob = 1/6)
[1] 0.1370131
3E) $$Z_1=\frac{8.5-10}{SE}, Z_2=\frac{11.5-10}{SE}$$ Computing in R:
mu = 10; se = sqrt(60 * 1/6 * 5/6)
z1 = (8.5 - mu) / se; z2 = (11.5 - mu) / se
pnorm(z2) - pnorm(z1)
[1] 0.3966682
3F) Binomial distribution $n=60, k=9:11, p=\frac{1}{6}$: $$\sum_{k=9}^{11}C_{60}^{k}\cdot(\frac{1}{6})^{k}\cdot(\frac{5}{6})^{60-k}\doteq0.3958971$$ R code:
sum(dbinom(x = 9:11, size = 60, prob = 1/6))
[1] 0.3958971
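Parts 3C to 3F repeat the same two calculations, so they can be wrapped in a small helper. `binom_vs_normal` below is a hypothetical function written for these notes, not something from the course or from base R:

```r
# Hypothetical helper: compares the exact binomial chance of lo..hi successes
# with the continuity-corrected normal approximation.
binom_vs_normal <- function(n, p, lo, hi) {
  mu <- n * p
  se <- sqrt(n * p * (1 - p))
  exact  <- sum(dbinom(lo:hi, size = n, prob = p))
  approx <- pnorm((hi + 0.5 - mu) / se) - pnorm((lo - 0.5 - mu) / se)
  c(exact = exact, approx = approx, error = approx - exact)
}

res_10   <- binom_vs_normal(60, 1/6, 10, 10)   # parts 3C and 3D
res_9_11 <- binom_vs_normal(60, 1/6, 9, 11)    # parts 3E and 3F
print(rbind(res_10, res_9_11))
```

The `error` column makes the quality of the approximation explicit for each interval.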
PROBLEM 4
According to genetic theory, plants of a particular species have a 25% chance of being red-flowering, independently of other plants. Find the normal approximation to the chance that among 10,000 plants of this species, more than 2400 are red-flowering.
Solution
Normal approximation: $$p=0.25, n=10000$$ $$\mu=np, SE=\sqrt{np(1-p)}, Z=\frac{2400.5-\mu}{SE}$$ Computing in R:
n = 10000; p = 0.25
mu = n * p; se = sqrt(n * p * (1 - p))
z = (2400.5 - mu) / se
1 - pnorm(z)
[1] 0.989215
For comparison, the exact value from the binomial distribution: $$\sum_{k=2401}^{10000}C_{10000}^{k}\cdot(0.25)^k\cdot(0.75)^{10000-k}$$ R code:
sum(dbinom(x = 2401:10000, size = 10000, prob = 0.25))
[1] 0.9894525
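Summing thousands of `dbinom` terms works, but the same tail chance is available directly from the binomial distribution function. A short sketch (the names `by_sum` and `by_tail` are illustrative):

```r
n <- 10000; p <- 0.25

# P(X > 2400) by summing the individual binomial probabilities
by_sum  <- sum(dbinom(2401:n, size = n, prob = p))

# The same chance as an upper tail: lower.tail = FALSE gives P(X > 2400)
by_tail <- pbinom(2400, size = n, prob = p, lower.tail = FALSE)

print(c(by_sum = by_sum, by_tail = by_tail))
```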
PROBLEM 5
A random number generator draws at random with replacement from the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. In 5000 draws, the chance that the digit 0 appears fewer than 495 times is closest to
Solution
Normal approximation: $$n=5000, p=0.1$$ $$\mu=np, SE=\sqrt{np(1-p)}, Z=\frac{494.5-\mu}{SE}$$ Computing in R:
n = 5000; p = 0.1
mu = n * p; se = sqrt(n * p * (1 - p))
z = (494.5 - mu) / se
pnorm(z)
[1] 0.3977125
For comparison, the exact value from the binomial distribution: $$\sum_{k=0}^{494}C_{5000}^{k}\cdot(0.1)^k\cdot(0.9)^{5000-k}$$ R code:
sum(dbinom(x = 0:494, size = 5000, prob = 0.1))
[1] 0.3999814
EXERCISE 5
PROBLEM 1
The durations of phone calls taken by the receptionist at an office are like draws made at random with replacement from a list that has an average of 8.5 minutes (that's 8 minutes and 30 seconds) and an $SD$ of 3 minutes. Approximately what is the chance that the total duration of the next 100 calls is more than 15 hours?
Solution
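Central Limit Theorem: 15 hours is 900 minutes, and the total duration of $n=100$ calls has $$E(\text{total})=n\mu=100\times8.5=850, SE=\sqrt{n}\cdot SD=\sqrt{100}\times3=30$$ $$Z=\frac{900-850}{30}$$ Computing in R: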
z = (900 - 850) / 30
1 - pnorm(z)
[1] 0.04779035
PROBLEM 2
A multiple choice test consists of 100 questions. Each question has 5 possible answers, only one of which is correct. Four points are awarded for each correct answer, and 1 point is taken off for each wrong answer. Suppose you answer all the questions by guessing at random, independently of all other questions.
2A In order to score more than 30 points, you have to get more than ________ answers right. Fill in the blank with the smallest correct whole number.
2B What is the chance that you get more than 30 points?
Solution
2A) Let $x$ be the number of correct answers. Then $$4x+(-1)\cdot(100-x) > 30\Rightarrow 5x > 130\Rightarrow x > 26$$ Therefore you have to get more than 26 answers right.
2B) Binomial distribution $n=100, k=27:100, p=\frac{1}{5}$: $$P(\text{more than 30 points})=\sum_{k=27}^{100}C_{100}^{k}\cdot(\frac{1}{5})^k\cdot(\frac{4}{5})^{100-k}\doteq0.05583272$$ R code:
sum(dbinom(x = 27:100, size = 100, prob = 1/5))
[1] 0.05583272
Normal approximation: $$n=100, p=\frac{1}{5}, \mu=np=20, SE=\sqrt{np(1-p)}=4$$ $$Z=\frac{26.5-20}{SE}$$ Computing in R:
z = (26.5 - 20) / 4
1 - pnorm(z)
[1] 0.05208128
The approximation is off by about 0.004, roughly 7% of the exact value, so it is not very accurate here.
PROBLEM 3
Assume that each person in a population has chance 2/1000 of carrying a particular disease, independently of all other people. Among 1000 people in this population, the number of people that carry the disease [pick all that are correct]
Solution
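First, the number of carriers has a binomial distribution with $n=1000$ and $p=\frac{2}{1000}$. Second, because $p$ is very small, the expected count is only $np=2$ and the distribution is right-skewed, so it is not well approximated by the normal curve.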
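The skewness can be made concrete by comparing the exact binomial probabilities with a continuity-corrected normal approximation. This is an illustrative sketch; the range `k = 0:6` and the variable names are arbitrary choices:

```r
n <- 1000; p <- 2/1000
mu <- n * p
se <- sqrt(n * p * (1 - p))

k      <- 0:6
exact  <- dbinom(k, size = n, prob = p)
approx <- pnorm((k + 0.5 - mu) / se) - pnorm((k - 0.5 - mu) / se)
print(round(cbind(k, exact, approx), 4))

# Chance the normal curve assigns to impossible negative counts
below_zero <- pnorm((-0.5 - mu) / se)

# Skewness of a binomial count; a positive value means right-skewed
skew <- (1 - 2 * p) / sqrt(n * p * (1 - p))
print(c(below_zero = below_zero, skew = skew))
```

The normal curve puts noticeable area below zero and misjudges the small counts, which is the practical meaning of "right-skewed" here.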
PROBLEM 4
Jack and Jill gamble on a roll of a die (yes, a fair die), as follows. If the die shows 1 or 2 spots, Jack gives Jill $\$1$. If the die shows 5 or 6 spots, Jill gives Jack $\$1$. If the die shows 3 or 4 spots, no money changes hands. Suppose Jack and Jill play this game 400 times. The chance that Jill’s net gain is more than $\$20$ is closest to?
Solution
$$P(\text{Jill wins 1})=P(\text{Jill loses 1})=P(\text{no money changes hands})=\frac{1}{3}$$ $$\mu=1\times\frac{1}{3}+(-1)\times\frac{1}{3}+0\times\frac{1}{3}=0$$ $$SD=\sqrt{(1-0)^2\times\frac{1}{3}+(-1-0)^2\times\frac{1}{3}+(0-0)^2\times\frac{1}{3}}=\sqrt{\frac{2}{3}}$$ $$SE=\sqrt{n}\cdot SD=\sqrt{\frac{800}{3}}, Z=\frac{20-0}{SE}$$ Computing in R:
se = sqrt(800 / 3)
z = (20 - 0) / se
1 - pnorm(z)
[1] 0.1103357
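The calculation above uses 20 as the endpoint without a continuity correction. As a rough check under stated assumptions (the seed and the number of repetitions `reps` are arbitrary), the game can be simulated and compared with both the plain and the continuity-corrected normal approximations:

```r
set.seed(2)
reps <- 10000

# Jill's gain on one roll is +1, -1 or 0, each with chance 1/3;
# each replicate is her net gain over 400 rolls
gains <- replicate(reps, sum(sample(c(1, -1, 0), 400, replace = TRUE)))
simulated <- mean(gains > 20)

se <- sqrt(800 / 3)
plain     <- 1 - pnorm(20 / se)        # as in the solution above
corrected <- 1 - pnorm(20.5 / se)      # with a continuity correction

print(c(simulated = simulated, plain = plain, corrected = corrected))
```

All three values should land near 0.1, so the answer to "closest to" is not affected by the choice.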
PROBLEM 5
In roulette, the bet on a “split” pays 17 to 1 and there are 2 chances in 38 to win. The bet on “red” pays 1 to 1 and there are 18 chances in 38 to win. Compare the following two strategies:
A: bet $\$1$ on a split, 200 times independently
B: bet $\$1$ on red, 200 times independently
In what follows, “making more than $\$x$” means having a net gain of more than $\$x$; “losing more than $\$x$” means having a net gain of less than $-\$x$. For each of the following events, decide which strategy gives the larger chance: coming out ahead, winning more than $\$20$, losing more than $\$20$.
Solution
Use the Central Limit Theorem.
Let $P_{X0}$ be the chance of coming out ahead when following strategy $X$. Similarly, let $P_{X20^{+}}$ and $P_{X20^{-}}$ denote the chances of winning more than $\$20$ and losing more than $\$20$, respectively. Strategy $A$: the average and $SD$ of a single bet are $$\mu=17\times\frac{2}{38}+(-1)\times\frac{36}{38}=-\frac{1}{19}$$ $$\sigma=\sqrt{(17-\mu)^2\times\frac{2}{38}+(-1-\mu)^2\times\frac{36}{38}}$$ so for $n=200$ bets $$E(\text{net gain})=n\mu=-\frac{200}{19}, SE=\sqrt{n}\cdot\sigma$$ Note that the deviations are taken from the single-bet average $\mu$, not from the expected total $n\mu$. Strategy $B$ is handled in the same way. Computing in R:
netgain = function(n, prob, value, gain){
  m = sum(prob * value)                       # average of a single bet
  sigma = sqrt(sum((value - m) ^ 2 * prob))   # SD of a single bet
  mu = n * m                                  # expected net gain over n bets
  se = sqrt(n) * sigma                        # SE of the net gain
  if (gain >= 0){
    # chance that the net gain is more than `gain`
    z = (gain + 0.5 - mu) / se
    print(1 - pnorm(z))
  } else {
    # chance that the net gain is less than `gain`
    z = (gain - 0.5 - mu) / se
    print(pnorm(z))
  }
}
netgain(n = 200, prob = c(2/38, 36/38), value = c(17, -1), gain = 0)    # A: about 0.423
netgain(n = 200, prob = c(18/38, 20/38), value = c(1, -1), gain = 0)    # B: about 0.217
netgain(n = 200, prob = c(2/38, 36/38), value = c(17, -1), gain = 20)   # A: about 0.293
netgain(n = 200, prob = c(18/38, 20/38), value = c(1, -1), gain = 20)   # B: about 0.014
netgain(n = 200, prob = c(2/38, 36/38), value = c(17, -1), gain = -20)  # A: about 0.430
netgain(n = 200, prob = c(18/38, 20/38), value = c(1, -1), gain = -20)  # B: about 0.240
According to the results above, $$P_{A0} > P_{B0}$$ $$P_{A20^{+}} > P_{B20^{+}}$$ $$P_{A20^{-}} > P_{B20^{-}}$$ That is, the chance is larger under strategy $A$ than under strategy $B$ for each of
- Coming out ahead
- Winning more than $\$20$
- Losing more than $\$20$
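Because the net gain is a linear function of the number of wins $W$, the same comparison can be made exactly with the binomial distribution instead of the normal curve. `exact_chance` below is a hypothetical helper written for these notes; it assumes a $\$1$ bet that pays `payoff` to 1, so the net gain after `n` bets is `(payoff + 1) * W - n`.

```r
# Hypothetical helper: exact chance that the net gain is more than (or less than)
# `gain` when a $1 bet paying `payoff` to 1 is repeated n times independently.
exact_chance <- function(n, p, payoff, gain, direction = c("more", "less")) {
  direction <- match.arg(direction)
  cutoff <- (gain + n) / (payoff + 1)   # net gain equals `gain` when W = cutoff
  if (direction == "more") {
    pbinom(floor(cutoff), size = n, prob = p, lower.tail = FALSE)  # P(W > cutoff)
  } else {
    pbinom(ceiling(cutoff) - 1, size = n, prob = p)                # P(W < cutoff)
  }
}

results <- rbind(
  ahead   = c(A = exact_chance(200, 2/38, 17, 0),
              B = exact_chance(200, 18/38, 1, 0)),
  win_20  = c(A = exact_chance(200, 2/38, 17, 20),
              B = exact_chance(200, 18/38, 1, 20)),
  lose_20 = c(A = exact_chance(200, 2/38, 17, -20, "less"),
              B = exact_chance(200, 18/38, 1, -20, "less"))
)
print(round(results, 4))
```

The exact values differ somewhat from the normal approximation, especially for strategy A, whose net gain is skewed because few wins are expected, but the ordering between A and B should be the same in all three rows.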