The Stat2.2x Probability course was taught by the University of California, Berkeley on the edX platform in 2014.

PDF notes download (Academia.edu)

Summary

  • Zeros and Ones: Sum of a sample with replacement
    $S$ is the number of successes in $n$ independent trials, where the chance of success on a single trial is $p$: $$E(S)=n\cdot p,\ SE(S)=\sqrt{n\cdot p\cdot(1-p)}$$ Binomial formula: $$P(S=k)=C_{n}^{k}\cdot p^{k}\cdot(1-p)^{n-k}$$ where $k=0, 1, 2, \ldots, n$. R code:

    dbinom(x = k, size = n, prob = p)
  • Zeros and Ones: Sum of a sample without replacement
    $S$ is the number of good elements in a simple random sample of $n$ elements drawn from $N=G+B$ elements, of which $G$ are good and $B$ are bad: $$E(S)=n\cdot\frac{G}{N},\ SE(S)=\sqrt{n\cdot\frac{G}{N}\cdot\frac{B}{N}}\cdot\sqrt{\frac{N-n}{N-1}}$$ Hypergeometric formula: $$P(S=g)=\frac{C_{G}^{g}\cdot C_{B}^{n-g}}{C_{N}^{n}}$$ where $g$ is the number of good elements in the sample. R code (in dhyper, m and n are the numbers of good and bad elements, and k is the sample size):

    dhyper(x = g, m = G, n = B, k = n)
  • Zeros and Ones: Sample proportion of ones
    $n$ is the sample size, $X$ is the sample proportion of ones. Binomial setting: $$E(X)=p,\ SE(X)=\sqrt{\frac{p\cdot(1-p)}{n}}$$ Hypergeometric setting: $$E(X)=\frac{G}{N},\ SE(X)=\sqrt{\frac{\frac{G}{N}\cdot\frac{B}{N}}{n}}\cdot\sqrt{\frac{N-n}{N-1}}$$
  • Sample sum
    Population mean is $\mu$, $SD$ is $\sigma$, sample size is $n$, sample sum is $S$, and population size is $N$. With replacement: $$E(S)=n\cdot\mu,\ SE(S)=\sqrt{n}\cdot\sigma$$ Without replacement: $$E(S)=n\cdot\mu,\ SE(S)=\sqrt{n}\cdot\sigma\cdot\sqrt{\frac{N-n}{N-1}}$$
  • Sample mean
    Population mean is $\mu$, $SD$ is $\sigma$, sample size is $n$, sample mean is $M$, and population size is $N$. With replacement: $$E(M)=\mu,\ SE(M)=\frac{\sigma}{\sqrt{n}}$$ Without replacement: $$E(M)=\mu,\ SE(M)=\frac{\sigma}{\sqrt{n}}\cdot\sqrt{\frac{N-n}{N-1}}$$
  • Square Root Law
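    If you multiply the sample size by a factor, the SE of the sample mean (or sample proportion) is divided by the square root of that factor. For example, to halve the SE you need four times as many draws.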
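The summary formulas can be collected into a few R helper functions. This is a minimal sketch: the names fpc, se_sum, se_mean and se_prop are hypothetical helpers, not from the course, and N = Inf is used here as a stand-in for sampling with replacement. As a sanity check, the hypergeometric SE formula is compared with the SD computed directly from the exact distribution given by dhyper, for a small made-up box (N = 30, G = 12, n = 10).

```r
# Hypothetical helper functions for the summary formulas (not from the course).

# Finite population correction sqrt((N - n) / (N - 1)).
# N = Inf stands for sampling with replacement, where the factor is 1.
fpc <- function(N, n) {
  if (is.infinite(N)) 1 else sqrt((N - n) / (N - 1))
}

# SE of the sample sum and of the sample mean, given the population SD sigma.
se_sum  <- function(sigma, n, N = Inf) sqrt(n) * sigma * fpc(N, n)
se_mean <- function(sigma, n, N = Inf) sigma / sqrt(n) * fpc(N, n)

# SE of the sample proportion of ones, given the population proportion p.
se_prop <- function(p, n, N = Inf) sqrt(p * (1 - p) / n) * fpc(N, n)

# Check the hypergeometric formulas against the exact distribution:
# draw n = 10 without replacement from N = 30 elements, G = 12 of them good.
G <- 12; B <- 18; N <- G + B; n <- 10
g <- 0:n                                   # possible numbers of good elements
probs <- dhyper(x = g, m = G, n = B, k = n)
exact_mean <- sum(g * probs)
exact_se   <- sqrt(sum((g - exact_mean)^2 * probs))

formula_mean <- n * G / N
# The SD of a 0/1 box with G ones out of N is sqrt(G/N * B/N).
formula_se   <- se_sum(sqrt(G / N * B / N), n, N)

# Square root law: 4 times the sample size gives half the SE of the mean.
ratio <- se_mean(1, 400) / se_mean(1, 100)  # 0.5
```

The with-replacement case is just the same formulas with the correction factor set to 1, which is why a single helper with a default N covers both.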

PRACTICE

PROBLEM 1

Find the expected value and standard error of

a) your average net gain per bet, if you bet \$1 independently 200 times on “red” at roulette (the bet pays 1 to 1 and the chance of winning is 18/38)

b) the proportion of times you win, if you bet 200 times independently on red as above

c) the total income of a simple random sample of 100 people taken from a population of 5000 people whose average income is \$50,000 with an SD of \$30,000

d) the average income of the sampled people in (c)

e) the number of black cards in a bridge hand (13 cards dealt at random without replacement from a deck consisting of 26 black cards and 26 red cards)

f) the percent of black cards in a bridge hand, described in (e)

Solution

a) Sample mean with replacement. $$E(\text{average net gain})=\mu=1\times\frac{18}{38}+(-1)\times\frac{20}{38}=-\frac{1}{19}\doteq-0.05263158$$ $$SE(\text{average net gain})=\frac{SD}{\sqrt{n}}=\frac{\sqrt{E((X-\mu)^2)}}{\sqrt{n}}$$ $$=\frac{\sqrt{(1+\frac{1}{19})^2\times\frac{18}{38}+(-1+\frac{1}{19})^2\times\frac{20}{38}}}{\sqrt{200}}\doteq0.07061267$$
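A quick R check of part (a): the SD of the box is computed straight from its definition $\sqrt{E((X-\mu)^2)}$ and compared with the shortcut for a box with only two values $a$ and $b$, $SD=|a-b|\cdot\sqrt{p\cdot(1-p)}$. A sketch; the variable names are my own.

```r
# Part (a): the box for a single 1-dollar bet on red at roulette.
values <- c(1, -1)               # net gain: win 1 dollar or lose 1 dollar
probs  <- c(18 / 38, 20 / 38)    # chances of winning and losing

mu    <- sum(values * probs)                  # expected net gain per bet
sigma <- sqrt(sum((values - mu)^2 * probs))   # SD of the box, by definition
n     <- 200

se_avg <- sigma / sqrt(n)        # SE of the average of 200 independent bets

# Shortcut for a two-valued box: SD = |a - b| * sqrt(p * (1 - p)).
sigma_shortcut <- abs(1 - (-1)) * sqrt(18 / 38 * 20 / 38)

mu       # about -0.0526
se_avg   # about 0.0706
```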

b) Sample proportion of ones, binomial setting. $$E(\text{proportion of winning times})=p=\frac{18}{38}\doteq0.4736842$$ $$SE(\text{proportion of winning times})=\sqrt{\frac{p\cdot(1-p)}{n}}$$ $$=\sqrt{\frac{\frac{18}{38}\times(1-\frac{18}{38})}{200}}\doteq0.03530634$$
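For part (b), the SE of the proportion can also be computed exactly from the binomial distribution of the number of wins (via dbinom), without using the shortcut formula. A minimal sketch; the two results should agree up to rounding error.

```r
n <- 200
p <- 18 / 38
k <- 0:n                                  # possible numbers of wins
probs <- dbinom(k, size = n, prob = p)    # binomial probabilities

prop <- k / n                             # proportion of wins for each outcome
e_prop <- sum(prop * probs)               # expected proportion
se_prop_exact   <- sqrt(sum((prop - e_prop)^2 * probs))
se_prop_formula <- sqrt(p * (1 - p) / n)  # formula from the summary
```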

c) Sample sum without replacement. $$E(\text{total income})=n\cdot\mu=100\times50000=5000000$$ $$SE(\text{total income})=\sqrt{n}\cdot\sigma\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\sqrt{100}\times30000\times\sqrt{\frac{5000-100}{5000-1}}\doteq 297014.6$$

d) Sample mean without replacement. $$E(\text{average income})=\mu=50000$$ $$SE(\text{average income})=\frac{\sigma}{\sqrt{n}}\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\frac{30000}{\sqrt{100}}\times\sqrt{\frac{5000-100}{5000-1}}\doteq2970.146$$
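Parts (c) and (d) share the same finite population correction. The sketch below (fpc is a hypothetical helper name) computes both, and checks that the SE of the average is just the SE of the total divided by $n$, since the average is the total divided by $n$.

```r
N <- 5000; n <- 100
mu <- 50000; sigma <- 30000

fpc <- function(N, n) sqrt((N - n) / (N - 1))  # finite population correction

# (c) total income of the sample
e_total  <- n * mu
se_total <- sqrt(n) * sigma * fpc(N, n)

# (d) average income of the sample
e_avg  <- mu
se_avg <- sigma / sqrt(n) * fpc(N, n)

# The average is the total divided by n, so its SE is SE(total) / n.
all.equal(se_avg, se_total / n)
```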

e) Sum of a sample without replacement (zeros and ones, with $p=\frac{G}{N}=\frac{26}{52}=\frac{1}{2}$). $$E(\text{black cards in a bridge hand})=n\cdot p=13\times\frac{26}{52}=6.5$$ $$SE(\text{black cards in a bridge hand})=\sqrt{n\cdot p\cdot(1-p)}\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\sqrt{13\times\frac{1}{2}\times\frac{1}{2}}\times\sqrt{\frac{52-13}{52-1}}\doteq1.576482$$

f) Sample proportion of ones, hypergeometric setting. $$E(\text{proportion of black cards in a bridge hand})=p=\frac{1}{2}$$ $$SE(\text{proportion of black cards in a bridge hand})=\sqrt{\frac{p\cdot(1-p)}{n}}\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\sqrt{\frac{\frac{1}{2}\times(1-\frac{1}{2})}{13}}\times\sqrt{\frac{52-13}{52-1}}\doteq0.1212678$$ As a percent, the expected value is $50\%$ and the SE is about $12.1\%$.
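Parts (e) and (f) can be checked against the exact hypergeometric distribution of the number of black cards, computed with R's dhyper. A minimal sketch: the proportion of black cards is the count divided by 13, so its E and SE are those of the count divided by 13 (multiply by 100 for percents).

```r
G <- 26; B <- 26; n <- 13        # black cards, red cards, hand size
g <- 0:n                         # possible numbers of black cards in the hand
probs <- dhyper(x = g, m = G, n = B, k = n)

# (e) number of black cards, from the exact distribution
e_count  <- sum(g * probs)
se_count <- sqrt(sum((g - e_count)^2 * probs))

# (f) proportion of black cards in the hand
e_prop  <- e_count / n
se_prop <- se_count / n

c(e_count, se_count)   # about 6.5 and 1.576
c(e_prop, se_prop)     # about 0.5 and 0.121
```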

PROBLEM 2

I play a gambling game repeatedly; the games are independent of each other. In 100 games, my expected average net gain per game is -10 cents, with an SE of 5 cents. In 1000 games, my expected average net gain per game is ________ cents, with an SE of ________ cents.

Solution

The expected average net gain per game does not change with the number of games played. Thus $$E(\text{average over 1000 games})=\mu=-10$$ The SE, however, goes down as the number of games goes up (the square root law): $$SE(\text{1000 games})=\frac{\sigma}{\sqrt{1000}}=\frac{SE(\text{100 games})\cdot\sqrt{100}}{\sqrt{1000}}=\frac{5\times10}{\sqrt{1000}}\doteq1.581139$$
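The square root law used above amounts to a one-line rescaling in R. A sketch; se_avg_games is a hypothetical helper, and it assumes (as the problem states) independent games, so that the SE of the average net gain is proportional to $1/\sqrt{n}$.

```r
ev     <- -10   # expected average net gain per game in cents (does not depend on n)
se_100 <- 5     # SE of the average net gain over 100 games, in cents

# Rescale the SE of an average from 100 games to n games.
se_avg_games <- function(n) se_100 * sqrt(100 / n)

c(expected = ev, se = se_avg_games(1000))   # SE about 1.58 cents
```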

PROBLEM 3

In a population of tens of thousands of voters, 48% are Democrats. A simple random sample of 125 voters is taken. Approximately what is the chance that a majority of the sampled voters are Democrats?

Solution

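The population (tens of thousands of voters) is so large compared with the sample that the correction factor is essentially 1, so use the binomial distribution with $n=125$ and $p=0.48$. A majority means at least 63 Democrats ($k=63, \ldots, 125$): $$P(\text{majority of 125 sampled voters are Democrats})$$ $$=\sum_{k=63}^{125}C_{125}^{k}\cdot 0.48^k\cdot0.52^{125-k}\doteq0.3269725$$ R code: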

sum(dbinom(63:125, 125, 0.48))
[1] 0.3269725

Alternatively, using the normal approximation (sample proportion of ones): $$p=0.48, \sigma=\sqrt{p\cdot(1-p)}$$ $$SE=\frac{\sigma}{\sqrt{125}}, Z=\frac{0.5-p}{SE}$$ In R:

p = 0.48; sigma = sqrt(p * (1 - p)); se = sigma / sqrt(125)
z = (0.5 - p) / se
1 - pnorm(z)
[1] 0.3272311

The two results are very close: roughly $32.7\%$.
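The problem only says "tens of thousands of voters", so the binomial calculation treats the draws as if made with replacement. As a rough check, this sketch assumes a hypothetical population of exactly 20,000 voters (48% of them Democrats) and computes the exact without-replacement answer with phyper; it should be very close to the binomial answer because the correction factor is nearly 1.

```r
n <- 125
N <- 20000               # hypothetical population size (an assumption)
G <- round(0.48 * N)     # Democrats in the population
B <- N - G               # non-Democrats

# P(at least 63 Democrats in the sample) = P(X > 62)
p_hyper <- phyper(62, m = G, n = B, k = n, lower.tail = FALSE)  # without replacement
p_binom <- pbinom(62, size = n, prob = 0.48, lower.tail = FALSE) # with replacement

c(hypergeometric = p_hyper, binomial = p_binom)
```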

PROBLEM 4

Suppose you are trying to estimate the percent of Democrat voters. Other things being equal, is a simple random sample of 200 voters taken from 100,000 voters about as accurate as a simple random sample of 200 voters taken from 200,000 voters?

Solution

Sample proportion of ones. $$SE(\text{100000 voters})=\frac{\sigma}{\sqrt{200}}\cdot\sqrt{\frac{100000-200}{100000-1}}\doteq0.9990045\cdot\frac{\sigma}{\sqrt{200}}$$ $$SE(\text{200000 voters})=\frac{\sigma}{\sqrt{200}}\cdot\sqrt{\frac{200000-200}{200000-1}}\doteq0.9995024\cdot\frac{\sigma}{\sqrt{200}}$$ Both correction factors are very close to 1, so the two samples are about equally accurate: what matters is the sample size, not the population size.
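The two correction factors are easy to compute in R. A minimal sketch (fpc is a hypothetical helper name):

```r
fpc <- function(N, n) sqrt((N - n) / (N - 1))  # finite population correction

n <- 200
factors <- c(N_100k = fpc(100000, n), N_200k = fpc(200000, n))
factors                                       # both about 0.999

# Ratio of the two SEs (same sigma and n): very close to 1
factors[["N_200k"]] / factors[["N_100k"]]
```

Since the SEs differ only through these factors, their ratio is what "about as accurate" means here.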

UNGRADED EXERCISE SET C

PROBLEM 1

A coin is tossed 2500 times. There is about a 68% chance that the percent of heads is in the range 50% plus or minus? (a percentage)

Solution

$68\%$ is the area under the normal curve between $-1$ and $1$ standard units, so the range is $\pm1SE$: $$p=0.5, n=2500$$ $$SE=\sqrt{\frac{p\cdot(1-p)}{n}}=\sqrt{\frac{0.5\times0.5}{2500}}=0.01$$ Thus, there is about a $68\%$ chance that the percent of heads is in the range $50\%$ plus or minus $1\%$.
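A small R sketch for this problem: it computes the SE of the proportion of heads, the normal-curve area within 1 SE, and, for comparison, the exact binomial chance that the number of heads is between 1225 and 1275 (50% plus or minus 1% of 2500 tosses). The exact chance should come out a bit above 68%, because the count is discrete and both endpoints are included.

```r
n <- 2500; p <- 0.5
se_prop <- sqrt(p * (1 - p) / n)    # 0.01, i.e. 1 percentage point

# Area under the standard normal curve between -1 and 1
area_1se <- pnorm(1) - pnorm(-1)    # about 0.683

# Exact binomial chance of 1225 to 1275 heads (both ends included)
exact <- pbinom(1275, n, p) - pbinom(1224, n, p)
```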

PROBLEM 2

A simple random sample of 50 students is taken from a class of 300 students. In the class,
  • the average midterm score is 67 and the $SD$ is 12;
  • there are 72 women.
Let $W$ be the number of women in the sample, and let $S$ be the average midterm score of the sampled students.

2A Find $E(W)$.

2B Find $SE(W)$.

2C Find $E(S)$.

2D Find $SE(S)$.

Solution

2A) $$E(W)=50\times\frac{72}{300}=12$$

2B) Sum of a sample without replacement (zeros and ones). $$N=300, n=50, p=\frac{72}{300}=0.24$$ $$SE(W)=\sqrt{n\cdot p\cdot(1-p)}\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\sqrt{50\times0.24\times0.76}\times\sqrt{\frac{300-50}{300-1}}\doteq2.761416$$

2C) $$E(S)=\mu=67$$

2D) Sample mean without replacement. $$\sigma=12, n=50, N=300$$ $$SE(S)=\frac{\sigma}{\sqrt{n}}\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\frac{12}{\sqrt{50}}\times\sqrt{\frac{300-50}{300-1}}\doteq1.551782$$
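All four answers can be reproduced in a few lines of R. A sketch; the variable names are my own, and the same correction factor is shared by $W$ (a sum of 0/1 draws) and $S$ (a sample mean).

```r
N <- 300; n <- 50
women <- 72
mu <- 67; sigma <- 12

fpc <- sqrt((N - n) / (N - 1))      # finite population correction

# 2A, 2B: number of women in the sample (sum of 0/1 draws without replacement)
p    <- women / N
e_W  <- n * p
se_W <- sqrt(n * p * (1 - p)) * fpc

# 2C, 2D: average midterm score of the sample (sample mean without replacement)
e_S  <- mu
se_S <- sigma / sqrt(n) * fpc

round(c(e_W = e_W, se_W = se_W, e_S = e_S, se_S = se_S), 4)
```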

PROBLEM 3

In a city of over 1,000,000 residents, 14% of the residents are senior citizens. In a simple random sample of 1200 residents, there is about a 95% chance that the percent of senior citizens is in the interval [pick the best option; even if you can provide a sharper answer than you see among the choices, please just pick the best among the options] $9\%-19\%$; $10\%-18\%$; $11\%-17\%$; $12\%-16\%$; $13\%-15\%$.

Solution

About $95\%$ of the normal curve lies within $2SE$ of the expected value. Here we need the sample proportion (binomial setting, since the correction factor is very close to 1): $$E=p=0.14, n=1200$$ $$SE=\sqrt{\frac{p\cdot(1-p)}{n}}=\sqrt{\frac{0.14\times0.86}{1200}}\doteq0.01001665$$ Thus, the interval is $E\pm2SE\approx0.14\pm0.02$, i.e. $12\%-16\%$.
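A quick R check of this answer. The sketch computes the SE and the $E\pm2SE$ interval, then, as an extra check under the binomial model (ignoring the tiny correction factor), the exact chance that the sample contains between 144 and 192 senior citizens, i.e. 12% to 16% of 1200.

```r
n <- 1200; p <- 0.14
se <- sqrt(p * (1 - p) / n)          # about 0.0100
interval <- p + c(-2, 2) * se        # about 0.12 to 0.16

# Exact binomial chance of 144 to 192 seniors in the sample (both ends included)
chance <- pbinom(192, n, p) - pbinom(143, n, p)
```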

PROBLEM 4

City A has 1,000,000 people; City B has 4,000,000 people. Suppose the goal is to try to predict the percent of Purple Party voters in a sample. Other things being equal, a simple random sample of 1% of the people in City A has about the same accuracy as a simple random sample of ________% of the people in City B. Pick the best option below to fill in the blank.

Solution

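For the same accuracy, the two samples need the same size (not the same fraction of the population!), because both correction factors are close to 1. A $1\%$ sample of City A has $10^6\times1\%=10{,}000$ people, so the percentage of City B should be $$\frac{10^6\times1\%}{4\times10^6}=0.25\%$$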
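The arithmetic in R, as a sketch (fpc is a hypothetical helper name). It also shows that both correction factors are close to 1, which is why matching the sample size, rather than the sampling fraction, gives about the same accuracy.

```r
pop_A <- 1e6; pop_B <- 4e6
n_A <- 0.01 * pop_A                  # 10,000 people sampled in City A

# Same sample size in City B, expressed as a percent of its population
pct_B <- n_A / pop_B * 100

fpc <- function(N, n) sqrt((N - n) / (N - 1))  # finite population correction
c(fpc_A = fpc(pop_A, n_A), fpc_B = fpc(pop_B, n_A))
```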
