UC Berkeley Stat2.2x Probability study notes: Section 5 The accuracy of simple random samples
Stat2.2x Probability was taught by the University of California, Berkeley on the edX platform in 2014.
Summary
- Zeros and Ones: Sum of a sample with replacement
$S$ is the number of successes: $n$ independent trials, chance of success on a single trial is $p$ $$E(S)=n\cdot p,\ SE(S)=\sqrt{n\cdot p\cdot(1-p)}$$ Binomial formula: $$P(S=k)=C_{n}^{k}\cdot p^{k}\cdot(1-p)^{n-k}$$ where $k=0, 1, 2, \ldots, n$. R code:dbinom(x = k, size = n, prob = p)
- Zeros and Ones: Sum of a sample without replacement
$S$ is the number of good elements in a simple random sample: $n$ elements drawn from $N=G+B$ elements of which $G$ are good. $$E(S)=n\cdot\frac{G}{N},\ SE(S)=\sqrt{n\cdot\frac{G}{N}\cdot\frac{B}{N}}\cdot\sqrt{\frac{N-n}{N-1}}$$ Hypergeometric formula: $$P(S=g)=\frac{C_{G}^{g}\cdot C_{B}^{n-g}}{C_{N}^{n}}$$ where $g$ is the number of good elements in the sample. R code: dhyper(x = g, m = G, n = B, k = n). Note that in dhyper the argument n is the number of bad elements ($B$) and the argument k is the sample size (the $n$ of these notes).
- Zeros and Ones: Sample proportion of ones
$n$ is the sample size, $X$ is the sample proportion of ones. Binomial setting: $$E(X)=p,\ SE(X)=\sqrt{\frac{p\cdot(1-p)}{n}}$$ Hypergeometric setting: $$E(X)=\frac{G}{N},\ SE(X)=\sqrt{\frac{\frac{G}{N}\cdot\frac{B}{N}}{n}}\cdot\sqrt{\frac{N-n}{N-1}}$$
- Sample sum
Population mean is $\mu$, $SD$ is $\sigma$, sample size is $n$, sample sum is $S$, and population size is $N$. With replacement: $$E(S)=n\cdot\mu,\ SE(S)=\sqrt{n}\cdot\sigma$$ Without replacement: $$E(S)=n\cdot\mu,\ SE(S)=\sqrt{n}\cdot\sigma\cdot\sqrt{\frac{N-n}{N-1}}$$
- Sample mean
Population mean is $\mu$, $SD$ is $\sigma$, sample size is $n$, sample mean is $M$, and population size is $N$. With replacement: $$E(M)=\mu,\ SE(M)=\frac{\sigma}{\sqrt{n}}$$ Without replacement: $$E(M)=\mu,\ SE(M)=\frac{\sigma}{\sqrt{n}}\cdot\sqrt{\frac{N-n}{N-1}}$$
- Square Root Law
If you multiply the sample size by a factor, the $SE$ of the sample mean (or sample proportion) goes down by the square root of that factor; in that sense the accuracy goes up by the square root of the factor.
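The formulas in this summary can be collected into a few small R helpers. This is only a sketch: the function names (fpc, se_sum, se_mean, se_prop) are made up for these notes and are not part of base R, and N = Inf is used here as a convention meaning "sampling with replacement".

```r
# Hypothetical helpers (names invented for these notes, not base R).

# Finite population correction; taken to be 1 when drawing with replacement (N = Inf)
fpc <- function(n, N = Inf) {
  if (is.infinite(N)) 1 else sqrt((N - n) / (N - 1))
}

# SE of the sample sum, sample mean, and sample proportion
se_sum  <- function(sigma, n, N = Inf) sqrt(n) * sigma * fpc(n, N)
se_mean <- function(sigma, n, N = Inf) sigma / sqrt(n) * fpc(n, N)
se_prop <- function(p, n, N = Inf) se_mean(sqrt(p * (1 - p)), n, N)

# Square root law: multiplying n by 4 divides the SE of the mean by 2
se_mean(10, 100)   # 1
se_mean(10, 400)   # 0.5
```

The proportion helper reuses se_mean because a 0/1 population with a fraction $p$ of ones has $SD$ equal to $\sqrt{p\cdot(1-p)}$.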
PRACTICE
PROBLEM 1
Find the expected value and standard error of
a) your average net gain per bet, if you bet \$1 independently 200 times on “red” at roulette (the bet pays 1 to 1 and the chance of winning is 18/38)
b) the proportion of times you win, if you bet 200 times independently on red as above
c) the total income of a simple random sample of 100 people taken from a population of 5000 people whose average income is \$50,000 with an SD of \$30,000
d) the average income of the sampled people in (c)
e) the number of black cards in a bridge hand (13 cards dealt at random without replacement from a deck consisting of 26 black cards and 26 red cards)
f) the percent of black cards in a bridge hand, described in (e)
Solution
a) Sample mean with replacement. $$E(\text{average net gain})=\mu=1\times\frac{18}{38}+(-1)\times\frac{20}{38}=-\frac{1}{19}\doteq-0.05263158$$ $$SE(\text{average net gain})=\frac{SD}{\sqrt{n}}=\frac{\sqrt{E((x-\mu)^2)}}{\sqrt{n}}$$ $$=\frac{\sqrt{(1+\frac{1}{19})^2\times\frac{18}{38}+(-1+\frac{1}{19})^2\times\frac{20}{38}}}{\sqrt{200}}\doteq0.07061267$$
b) Sample proportion of ones binomial setting. $$E(\text{proportion of winning times})=p=\frac{18}{38}\doteq0.4736842$$ $$SE(\text{proportion of winning times})=\sqrt{\frac{p\cdot(1-p)}{n}}$$ $$=\sqrt{\frac{\frac{18}{38}\times(1-\frac{18}{38})}{200}}\doteq0.03530634$$
c) Sample sum without replacement. $$E(\text{total income})=n\cdot\mu=100\times50000=5000000$$ $$SE(\text{total income})=\sqrt{n}\cdot\sigma\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\sqrt{100}\times30000\times\sqrt{\frac{5000-100}{5000-1}}\doteq 297014.6$$
d) Sample mean without replacement. $$E(\text{average income})=\mu=50000$$ $$SE(\text{average income})=\frac{\sigma}{\sqrt{n}}\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\frac{30000}{\sqrt{100}}\times\sqrt{\frac{5000-100}{5000-1}}\doteq2970.146$$
e) Sum of a sample without replacement. $$E(\text{black cards in a bridge hand})=n\cdot p=13\times\frac{26}{52}=6.5$$ $$SE(\text{black cards in a bridge hand})=\sqrt{n\cdot p\cdot(1-p)}\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\sqrt{13\times\frac{1}{2}\times\frac{1}{2}}\times\sqrt{\frac{52-13}{52-1}}\doteq1.576482$$
f) Sample proportion of ones hypergeometric setting. $$E(\text{proportion of black cards in a bridge hand})=p=\frac{1}{2}$$ $$SE(\text{proportion of black cards in a bridge hand})=\sqrt{\frac{p\cdot(1-p)}{n}}\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\sqrt{\frac{\frac{1}{2}\times(1-\frac{1}{2})}{13}}\times\sqrt{\frac{52-13}{52-1}}\doteq0.1212678$$
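As a cross-check, the six answers of Problem 1 can be recomputed in R directly from the definitions. This is a sketch using only base arithmetic; the variable names (ev_a, se_a, and so on) are chosen here for illustration.

```r
# (a) net gain per $1 bet on red: +1 with chance 18/38, -1 with chance 20/38
p_win <- 18 / 38
ev_a  <- 1 * p_win + (-1) * (1 - p_win)                        # expected net gain per bet
sd_a  <- sqrt((1 - ev_a)^2 * p_win + (-1 - ev_a)^2 * (1 - p_win))
se_a  <- sd_a / sqrt(200)

# (b) proportion of wins in 200 independent bets
ev_b <- p_win
se_b <- sqrt(p_win * (1 - p_win) / 200)

# (c), (d) income: n = 100 drawn without replacement from N = 5000
fpc_income <- sqrt((5000 - 100) / (5000 - 1))
ev_c <- 100 * 50000
se_c <- sqrt(100) * 30000 * fpc_income
ev_d <- 50000
se_d <- 30000 / sqrt(100) * fpc_income

# (e), (f) black cards: n = 13 drawn without replacement from N = 52, half black
fpc_cards <- sqrt((52 - 13) / (52 - 1))
ev_e <- 13 * 0.5
se_e <- sqrt(13 * 0.5 * 0.5) * fpc_cards
ev_f <- 0.5
se_f <- se_e / 13

print(c(ev_a, se_a, ev_b, se_b))
print(c(ev_c, se_c, ev_d, se_d))
print(c(ev_e, se_e, ev_f, se_f))
```

Note that (b) is half of (a) in $SE$: the net gain per bet is $2\times(\text{win indicator})-1$, so its $SD$ is twice the $SD$ of the 0/1 win indicator.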
PROBLEM 2
I play a gambling game repeatedly; the games are independent of each other. In 100 games, my expected average net gain per game is -10 cents, with an SE of 5 cents. In 1000 games, my expected average net gain per game is ________ cents, with an SE of ________ cents.
Solution
Increasing the number of games does not change the expected average net gain per game. Thus $$E(\text{1000 games})=\mu=-10$$ The $SE$ of the average goes down as the number of games goes up (square root law). Thus $$SE(\text{1000 games})=\frac{\sigma}{\sqrt{1000}}=\frac{SE(\text{100 games})\cdot\sqrt{100}}{\sqrt{1000}}\doteq1.581139$$
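The same arithmetic in R, as a small sketch (the variable names are chosen here for illustration): recover the per-game $SD$ from the 100-game $SE$, then divide by $\sqrt{1000}$.

```r
se_100  <- 5                     # cents: SE of the average over 100 games
sigma   <- se_100 * sqrt(100)    # implied SD of a single game's net gain: 50 cents
se_1000 <- sigma / sqrt(1000)    # SE of the average over 1000 games

print(se_1000)                   # about 1.58 cents
print(se_100 / se_1000)          # sqrt(10): 10 times the games, sqrt(10) times the accuracy
```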
PROBLEM 3
In a population of tens of thousands of voters, 48% are Democrats. A simple random sample of 125 voters is taken. Approximately what is the chance that a majority of the sampled voters are Democrats?
Solution
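The population is so large that the draws are nearly independent, so the number of Democrats in the sample is approximately binomial with $n=125$ and $p=0.48$. A majority means at least 63 of the 125 voters: $$P(\text{majority of 125 sampled voters are Democrats})$$ $$=\sum_{k=63}^{125}C_{125}^{k}\cdot 0.48^k\cdot0.52^{125-k}\doteq0.3269725$$ R code: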
sum(dbinom(63:125, 125, 0.48))
[1] 0.3269725
Alternatively, using the normal approximation (sample proportion of ones): $$p=0.48, \sigma=\sqrt{p\cdot(1-p)}$$ $$SE=\frac{\sigma}{\sqrt{125}}, Z=\frac{0.5-p}{SE}$$ Calculating in R:
p = 0.48; sigma = sqrt(p * (1 - p)); se = sigma / sqrt(125)
z = (0.5 - p) / se
1 - pnorm(z)
[1] 0.3272311
The two results are very close: the chance is roughly $32.7\%$.
PROBLEM 4
Suppose you are trying to estimate the percent of Democrat voters. Other things being equal, is a simple random sample of 200 voters taken from 100,000 voters about as accurate as a simple random sample of 200 voters taken from 200,000 voters?
Solution
Sample proportion of ones. $$SE(\text{100000 voters})=\frac{\sigma}{\sqrt{200}}\cdot\sqrt{\frac{100000-200}{100000-1}}=0.9990045\cdot\frac{\sigma}{\sqrt{200}}$$ $$SE(\text{200000 voters})=\frac{\sigma}{\sqrt{200}}\cdot\sqrt{\frac{200000-200}{200000-1}}=0.9995024\cdot\frac{\sigma}{\sqrt{200}}$$ Both correction factors are very close to 1, so the two samples are about equally accurate: the accuracy depends on the sample size, not on the population size.
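The two correction factors can be checked in R. In this sketch the helper name fpc is invented for these notes, and the value p = 0.5 is purely an illustrative assumption (the problem does not give the true percent of Democrats).

```r
n <- 200                                        # sample size in both cases

# Hypothetical helper: finite population correction for population size N
fpc <- function(N) sqrt((N - n) / (N - 1))

factors <- fpc(c(1e5, 2e5))
print(factors)                                  # both very close to 1

# Assumed for illustration only: 50% Democrats in both populations
p <- 0.5
se_both <- sqrt(p * (1 - p) / n) * factors
print(se_both)                                  # the two SEs agree to about 3 decimal places
```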
UNGRADED EXERCISE SET C
PROBLEM 1
A coin is tossed 2500 times. There is about a 68% chance that the percent of heads is in the range 50% plus or minus? (a percentage)
Solution
$68\%$ is the area between -1 and 1 standard units. So it is $1SE$: $$p=0.5, n=2500$$ $$SE=\sqrt{\frac{p\cdot(1-p)}{n}}=\sqrt{\frac{0.5\times0.5}{2500}}=0.01$$ Thus, there is about $68\%$ chance that the percentage of heads is in the range $50\%$ plus or minus $1\%$.
PROBLEM 2
A simple random sample of 50 students is taken from a class of 300 students. In the class,
* the average midterm score is 67 and the $SD$ is 12
* there are 72 women
Let $W$ be the number of women in the sample, and let $S$ be the average midterm score of the sampled students.
2A Find $E(W)$.
2B Find $SE(W)$.
2C Find $E(S)$.
2D Find $SE(S)$.
Solution
2A) $$E(W)=50\times\frac{72}{300}=12$$
2B) Sample without replacement. $$N=300, n=50, p=\frac{72}{300}$$ $$SE(W)=\sqrt{n\cdot p\cdot(1-p)}\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\sqrt{50\times0.24\times0.76}\times\sqrt{\frac{300-50}{300-1}}\doteq2.761416$$
2C) $$E(S)=\mu=67$$
2D) Sample mean without replacement. $$\sigma=12, n=50, N=300$$ $$SE(S)=\frac{\sigma}{\sqrt{n}}\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\frac{12}{\sqrt{50}}\times\sqrt{\frac{300-50}{300-1}}\doteq1.551782$$
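The answers for $W$ can also be checked against the exact hypergeometric distribution, using dhyper as in the summary. The sketch below (variable names chosen here for illustration) computes $E(W)$ and $SE(W)$ directly from the probabilities and compares them with the formula; it also recomputes $SE(S)$.

```r
N <- 300; n <- 50; G <- 72

# Exact distribution of W: x = women in sample, m = women in class,
# n = others in class, k = sample size
w    <- 0:n
prob <- dhyper(w, m = G, n = N - G, k = n)

e_w  <- sum(w * prob)                     # expected value from the distribution
se_w <- sqrt(sum((w - e_w)^2 * prob))     # SE from the distribution

fpc          <- sqrt((N - n) / (N - 1))
se_w_formula <- sqrt(n * (G / N) * (1 - G / N)) * fpc
se_s         <- 12 / sqrt(n) * fpc        # SE of the sample average score

print(c(e_w, se_w, se_w_formula, se_s))
```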
PROBLEM 3
In a city of over 1,000,000 residents, 14% of the residents are senior citizens. In a simple random sample of 1200 residents, there is about a 95% chance that the percent of senior citizens is in the interval [pick the best option; even if you can provide a sharper answer than you see among the choices, please just pick the best among the options] $9\%-19\%$; $10\%-18\%$; $11\%-17\%$; $12\%-16\%$; $13\%-15\%$.
Solution
First, about $95\%$ of the normal curve lies within $2SE$ of the expected value. We need the $SE$ of the sample proportion (using the binomial setting, since the correction factor is very close to 1): $$E=p=0.14, n=1200$$ $$SE=\sqrt{\frac{p\cdot(1-p)}{n}}=\sqrt{\frac{0.14\times0.86}{1200}}\doteq0.01001665$$ Thus, the interval should be $E\pm2SE=0.14\pm0.02$, that is, $12\%-16\%$.
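A short R sketch of this calculation, assuming the binomial setting as above (the variable names are chosen here for illustration). The last line computes the exact binomial chance that the sample count falls between 12% and 16% of 1200, as a check on the "about 95%" normal approximation.

```r
p <- 0.14; n <- 1200

se       <- sqrt(p * (1 - p) / n)          # SE of the sample proportion
interval <- p + c(-2, 2) * se              # expected value plus or minus 2 SE
print(round(100 * interval, 1))            # in percent: 12 and 16

# Exact binomial chance that the count is between 144 (12%) and 192 (16%) inclusive
chance <- pbinom(192, n, p) - pbinom(143, n, p)
print(chance)
```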
PROBLEM 4
City A has 1,000,000 people; City B has 4,000,000 people. Suppose the goal is to try to predict the percent of Purple Party voters in a sample. Other things being equal, a simple random sample of 1% of the people in City A has about the same accuracy as a simple random sample of ________% of the people in City B. Pick the best option below to fill in the blank.
Solution
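For the same accuracy, we need the same sample size (not the same percentage of the population!), because both populations are large enough that the correction factor is close to 1. A 1% sample of City A is $10^6\times1\%=10000$ people, so the percentage for City B should be $$\frac{10^6\times1\%}{4\times10^6}=0.25\%$$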
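In R, the sample size and the two correction factors can be sketched as follows (the helper name fpc is invented for these notes):

```r
n_A   <- 0.01 * 1e6              # 1% of City A: 10000 people
pct_B <- 100 * n_A / 4e6         # the same 10000 people as a percent of City B

# Hypothetical helper: finite population correction
fpc <- function(n, N) sqrt((N - n) / (N - 1))
fpc_A <- fpc(n_A, 1e6)
fpc_B <- fpc(n_A, 4e6)

print(pct_B)                     # 0.25
print(c(fpc_A, fpc_B))           # both within about 1% of 1
```

With equal sample sizes the two $SE$s differ only through these correction factors, which is why matching the sample size, rather than the sampling percentage, gives about the same accuracy.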