The Stat2.2x Probability course was taught by the University of California, Berkeley on the edX platform in 2014.

PDF notes download (Academia.edu)

Summary

  • Zeros and Ones: Sum of a sample with replacement
    $S$ is the number of successes: $n$ independent trials, chance of success on a single trial is $p$ $$E(S)=n\cdot p,\ SE(S)=\sqrt{n\cdot p\cdot(1-p)}$$ Binomial formula: $$P(S=k)=C_{n}^{k}\cdot p^{k}\cdot(1-p)^{n-k}$$ where $k=0, 1, 2, \ldots, n$. R code:

    dbinom(x = k, size = n, prob = p)
  • Zeros and Ones: Sum of a sample without replacement
    $S$ is the number of good elements in a simple random sample: $n$ elements drawn from $N=G+B$ elements of which $G$ are good. $$E(S)=n\cdot\frac{G}{N},\ SE(S)=\sqrt{n\cdot\frac{G}{N}\cdot\frac{B}{N}}\cdot\sqrt{\frac{N-n}{N-1}}$$ Hypergeometric formula: $$P(S=g)=\frac{C_{G}^{g}\cdot C_{B}^{n-g}}{C_{N}^{n}}$$ where $g$ is the number of good elements in the sample. R code:
    # dhyper(x, m, n, k): x = good elements in the sample, m = G, n = B (R's own n), k = sample size
    dhyper(x = g, m = G, n = B, k = n)
  • Zeros and Ones: Sample proportion of ones
    $n$ is the sample size, $X$ is the sample proportion of ones. Binomial setting: $$E(X)=p,\ SE(X)=\sqrt{\frac{p\cdot(1-p)}{n}}$$ Hypergeometric setting: $$E(X)=\frac{G}{N},\ SE(X)=\sqrt{\frac{\frac{G}{N}\cdot\frac{B}{N}}{n}}\cdot\sqrt{\frac{N-n}{N-1}}$$
  • Sample sum
    Population mean is $\mu$, $SD$ is $\sigma$, sample size is $n$, sample sum is $S$, and population size is $N$. With replacement: $$E(S)=n\cdot\mu,\ SE(S)=\sqrt{n}\cdot\sigma$$ Without replacement: $$E(S)=n\cdot\mu,\ SE(S)=\sqrt{n}\cdot\sigma\cdot\sqrt{\frac{N-n}{N-1}}$$
  • Sample mean
    Population mean is $\mu$, $SD$ is $\sigma$, sample size is $n$, sample mean is $M$, and population size is $N$. With replacement: $$E(M)=\mu,\ SE(M)=\frac{\sigma}{\sqrt{n}}$$ Without replacement: $$E(M)=\mu,\ SE(M)=\frac{\sigma}{\sqrt{n}}\cdot\sqrt{\frac{N-n}{N-1}}$$
  • Square Root Law
    If you multiply the sample size by a factor, the SE of the sample mean (or sample proportion) is divided by the square root of that factor, so the accuracy goes up by the square root of the factor.
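The summary formulas translate directly into a few lines of R. The sketch below is illustrative: the helper names `fpc`, `se_sum` and `se_mean` are hypothetical (not from the course), and the example values (n = 10, p = 0.3; a bridge hand with G = B = 26) are chosen only to check that the binomial and hypergeometric formulas agree with R's `dbinom` and `dhyper`.

```r
# Hypothetical helpers implementing the summary formulas.
# Finite population correction factor sqrt((N - n) / (N - 1)).
fpc <- function(N, n) sqrt((N - n) / (N - 1))

# SE of the sample sum: sqrt(n) * sigma, times the correction factor when sampling
# without replacement from a population of size N (N = Inf means with replacement).
se_sum <- function(sigma, n, N = Inf) {
  corr <- if (is.finite(N)) fpc(N, n) else 1
  sqrt(n) * sigma * corr
}

# SE of the sample mean = SE of the sample sum divided by n.
se_mean <- function(sigma, n, N = Inf) se_sum(sigma, n, N) / n

# Binomial formula versus dbinom (n = 10, p = 0.3 chosen arbitrarily).
n_trials <- 10; p <- 0.3; k <- 0:n_trials
binom_manual <- choose(n_trials, k) * p^k * (1 - p)^(n_trials - k)
all.equal(binom_manual, dbinom(k, size = n_trials, prob = p))

# Hypergeometric formula versus dhyper (bridge hand: G = B = 26, 13 cards drawn).
# dhyper's argument named n is the number of bad elements B, not the sample size.
G <- 26; B <- 26; N <- G + B; n_draw <- 13; g <- 0:n_draw
hyper_manual <- choose(G, g) * choose(B, n_draw - g) / choose(N, n_draw)
all.equal(hyper_manual, dhyper(g, m = G, n = B, k = n_draw))

# A 0/1 box has SD sqrt((G/N) * (B/N)), so SE(S) for the bridge hand is:
se_sum(sqrt(G / N * B / N), n_draw, N)
```

Passing `N = Inf` (the default) gives the with-replacement formulas, so one function covers both settings.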

PRACTICE

PROBLEM 1

Find the expected value and standard error of

a) your average net gain per bet, if you bet \$1 independently 200 times on “red” at roulette (the bet pays 1 to 1 and the chance of winning is 18/38)

b) the proportion of times you win, if you bet 200 times independently on red as above

c) the total income of a simple random sample of 100 people taken from a population of 5000 people whose average income is \$50,000 with an SD of \$30,000

d) the average income of the sampled people in (c)

e) the number of black cards in a bridge hand (13 cards dealt at random without replacement from a deck consisting of 26 black cards and 26 red cards)

f) the percent of black cards in a bridge hand, described in (e)

Solution

a) Sample mean with replacement. $$E(\text{average net gain})=\mu=1\times\frac{18}{38}+(-1)\times\frac{20}{38}=-\frac{1}{19}\doteq-0.05263158$$ $$SE(\text{average net gain})=\frac{SD}{\sqrt{n}}=\frac{\sqrt{E((x-\mu)^2)}}{\sqrt{n}}$$ $$=\frac{\sqrt{(1+\frac{1}{19})^2\times\frac{18}{38}+(-1+\frac{1}{19})^2\times\frac{20}{38}}}{\sqrt{200}}\doteq0.07061267$$
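The SD in (a) is the square root of the squared deviations from $\mu$, weighted by their chances. A small R sketch (variable names are illustrative) computes it directly from the two-ticket box. For a box with only two values $a$ and $b$, the SD also equals $|a-b|\sqrt{p(1-p)}$, which here is $2\sqrt{\frac{18}{38}\cdot\frac{20}{38}}$.

```r
# One $1 bet on red: net gain +1 with chance 18/38, -1 with chance 20/38.
values <- c(1, -1)
probs  <- c(18, 20) / 38
mu     <- sum(values * probs)                 # expected net gain per bet: -1/19 dollars
sigma  <- sqrt(sum((values - mu)^2 * probs))  # SD: squared deviations weighted by chances
n_bets <- 200
se_avg <- sigma / sqrt(n_bets)                # SE of the average net gain over 200 bets
c(mu = mu, sigma = sigma, se = se_avg)
```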

b) Sample proportion of ones binomial setting. $$E(\text{proportion of winning times})=p=\frac{18}{38}\doteq0.4736842$$ $$SE(\text{proportion of winning times})=\sqrt{\frac{p\cdot(1-p)}{n}}$$ $$=\sqrt{\frac{\frac{18}{38}\times(1-\frac{18}{38})}{200}}\doteq0.03530634$$
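As a check on (b), the expected value and SE of the proportion of wins can be computed exactly from the binomial distribution of the number of wins, without using the shortcut formula. This is a sketch with illustrative variable names.

```r
n_bets <- 200; p_win <- 18 / 38
k  <- 0:n_bets
w  <- dbinom(k, size = n_bets, prob = p_win)   # exact chances of k wins
props   <- k / n_bets                           # proportion of wins for each k
e_prop  <- sum(props * w)                       # expected proportion
se_prop <- sqrt(sum((props - e_prop)^2 * w))    # SE of the proportion
c(expected = e_prop, se = se_prop)
```

The results match $p$ and $\sqrt{p(1-p)/n}$ up to floating-point error.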

c) Sample sum without replacement. $$E(\text{total income})=n\cdot\mu=100\times50000=5000000$$ $$SE(\text{total income})=\sqrt{n}\cdot\sigma\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\sqrt{100}\times30000\times\sqrt{\frac{5000-100}{5000-1}}\doteq 297014.6$$

d) Sample mean without replacement. $$E(\text{average income})=\mu=50000$$ $$SE(\text{average income})=\frac{\sigma}{\sqrt{n}}\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\frac{30000}{\sqrt{100}}\times\sqrt{\frac{5000-100}{5000-1}}\doteq2970.146$$
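Parts (c) and (d) share the same correction factor. The sketch below (illustrative variable names) also shows that the SE of the average is just the SE of the total divided by $n$.

```r
N <- 5000; n <- 100; mu <- 50000; sigma <- 30000
corr     <- sqrt((N - n) / (N - 1))   # finite population correction, about 0.99
e_total  <- n * mu                    # expected total income
se_total <- sqrt(n) * sigma * corr    # SE of the total income
e_avg    <- mu                        # expected average income
se_avg   <- se_total / n              # SE of the average = SE of the total / n
c(e_total = e_total, se_total = se_total, e_avg = e_avg, se_avg = se_avg)
```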

e) Sum of a sample without replacement. $$E(\text{black cards in a bridge hand})=n\cdot p=13\times\frac{26}{52}=6.5$$ $$SE(\text{black cards in a bridge hand})=\sqrt{n\cdot p\cdot(1-p)}\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\sqrt{13\times\frac{1}{2}\times\frac{1}{2}}\times\sqrt{\frac{52-13}{52-1}}\doteq1.576482$$

f) Sample proportion of ones, hypergeometric setting. $$E(\text{proportion of black cards in a bridge hand})=p=\frac{1}{2}$$ $$SE(\text{proportion of black cards in a bridge hand})=\sqrt{\frac{p\cdot(1-p)}{n}}\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\sqrt{\frac{\frac{1}{2}\times(1-\frac{1}{2})}{13}}\times\sqrt{\frac{52-13}{52-1}}\doteq0.1212678$$ As percents, the expected value is $50\%$ and the $SE$ is about $12.1\%$.
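Parts (e) and (f) can be cross-checked against the exact hypergeometric distribution of the number of black cards. In this sketch (illustrative variable names) the mean and SD come straight from `dhyper`, and the percent is the count divided by 13.

```r
g <- 0:13
w <- dhyper(g, m = 26, n = 26, k = 13)     # exact chances of g black cards in 13 cards
e_black  <- sum(g * w)                     # expected number of black cards
se_black <- sqrt(sum((g - e_black)^2 * w)) # SE of the number of black cards
e_pct    <- 100 * e_black / 13             # expected percent of black cards
se_pct   <- 100 * se_black / 13            # SE of the percent of black cards
round(c(e_black, se_black, e_pct, se_pct), 4)
```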

PROBLEM 2

I play a gambling game repeatedly; the games are independent of each other. In 100 games, my expected average net gain per game is -10 cents, with an SE of 5 cents. In 1000 games, my expected average net gain per game is ________ cents, with an SE of ________ cents.

Solution

The expected average net gain per game does not change when the number of games increases. Thus $$E(\text{1000 games})=\mu=-10$$ The $SE$ of the average, however, goes down as the number of games goes up ("square root law"): 10 times as many games divides the $SE$ by $\sqrt{10}$. Thus $$SE(\text{1000 games})=\frac{\sigma}{\sqrt{1000}}=\frac{SE(\text{100 games})\cdot\sqrt{100}}{\sqrt{1000}}\doteq1.581139$$
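The same square-root-law arithmetic in R. The sketch first recovers the SD of a single game from the 100-game SE; variable names are illustrative.

```r
se_100  <- 5                       # SE of the average net gain in 100 games (cents)
sigma   <- se_100 * sqrt(100)      # SD of the net gain in one game: 50 cents
se_1000 <- sigma / sqrt(1000)      # SE of the average net gain in 1000 games
e_1000  <- -10                     # expected average does not depend on the number of games
c(expected = e_1000, se = se_1000)
```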

PROBLEM 3

In a population of tens of thousands of voters, 48% are Democrats. A simple random sample of 125 voters is taken. Approximately what is the chance that a majority of the sampled voters are Democrats?

Solution

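The population has tens of thousands of voters, so the finite population correction is close to 1 and we can use the binomial distribution with $n=125$ and $p=0.48$. A majority means $k=63, 64, \ldots, 125$: $$P(\text{majority of 125 sampled voters are Democrats})$$ $$=\sum_{k=63}^{125}C_{125}^{k}\cdot 0.48^k\cdot0.52^{125-k}\doteq0.3269725$$ R code: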

sum(dbinom(63:125, 125, 0.48))
[1] 0.3269725

Alternatively, using the normal approximation (sample proportion of ones): $$p=0.48, \sigma=\sqrt{p\cdot(1-p)}$$ $$SE=\frac{\sigma}{\sqrt{125}}, Z=\frac{0.5-p}{SE}$$ Calculating in R:

p = 0.48; sigma = sqrt(p * (1 - p)); se = sigma / sqrt(125)
z = (0.5 - p) / se
1 - pnorm(z)
[1] 0.3272311

The two results are very close: the chance is roughly $32.7\%$.

PROBLEM 4

Suppose you are trying to estimate the percent of Democrat voters. Other things being equal, is a simple random sample of 200 voters taken from 100,000 voters about as accurate as a simple random sample of 200 voters taken from 200,000 voters?

Solution

Sample proportion of ones. $$SE(\text{100000 voters})=\frac{\sigma}{\sqrt{200}}\cdot\sqrt{\frac{100000-200}{100000-1}}=0.9990045\cdot\frac{\sigma}{\sqrt{200}}$$ $$SE(\text{200000 voters})=\frac{\sigma}{\sqrt{200}}\cdot\sqrt{\frac{200000-200}{200000-1}}=0.9995024\cdot\frac{\sigma}{\sqrt{200}}$$ Both correction factors are very close to 1, so the two samples are about equally accurate.
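The two correction factors can be computed in one vectorized call. `fpc` is a hypothetical helper name for $\sqrt{(N-n)/(N-1)}$; the ratio shows how little the population size matters here.

```r
fpc <- function(N, n) sqrt((N - n) / (N - 1))   # finite population correction
factors <- fpc(c(100000, 200000), 200)
factors
factors[2] / factors[1]   # the two SEs differ by less than 0.1%
```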

UNGRADED EXERCISE SET C

PROBLEM 1

A coin is tossed 2500 times. There is about a 68% chance that the percent of heads is in the range 50% plus or minus? (a percentage)

Solution

$68\%$ is the area under the normal curve between -1 and 1 standard units, so the range is $\pm1SE$: $$p=0.5, n=2500$$ $$SE=\sqrt{\frac{p\cdot(1-p)}{n}}=\sqrt{\frac{0.5\times0.5}{2500}}=0.01$$ Thus, there is about a $68\%$ chance that the percentage of heads is in the range $50\%$ plus or minus $1\%$.
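A quick R check of both ingredients: the SE of the proportion of heads, and the normal area within 1 SE. Variable names are illustrative.

```r
n <- 2500; p <- 0.5
se_prop <- sqrt(p * (1 - p) / n)   # SE of the proportion of heads: 0.01
100 * se_prop                      # in percentage points: 1
pnorm(1) - pnorm(-1)               # normal area within 1 SE, about 0.68
```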

PROBLEM 2

A simple random sample of 50 students is taken from a class of 300 students. In the class,

  • the average midterm score is 67 and the $SD$ is 12
  • there are 72 women

Let $W$ be the number of women in the sample, and let $S$ be the average midterm score of the sampled students.

2A Find $E(W)$.

2B Find $SE(W)$.

2C Find $E(S)$.

2D Find $SE(S)$.

Solution

2A) $$E(W)=50\times\frac{72}{300}=12$$

2B) Sample without replacement. $$N=300, n=50, p=\frac{72}{300}$$ $$SE(W)=\sqrt{n\cdot p\cdot(1-p)}\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\sqrt{50\times0.24\times0.76}\times\sqrt{\frac{300-50}{300-1}}\doteq2.761416$$

2C) $$E(S)=\mu=67$$

2D) Sample mean without replacement. $$\sigma=12, n=50, N=300$$ $$SE(S)=\frac{\sigma}{\sqrt{n}}\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\frac{12}{\sqrt{50}}\times\sqrt{\frac{300-50}{300-1}}\doteq1.551782$$
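All four answers of Problem 2 use the same correction factor $\sqrt{(300-50)/(300-1)}$. A compact R sketch with illustrative variable names:

```r
N <- 300; n <- 50; G <- 72
p    <- G / N                           # proportion of women in the class: 0.24
corr <- sqrt((N - n) / (N - 1))         # finite population correction
e_w  <- n * p                           # E(W)
se_w <- sqrt(n * p * (1 - p)) * corr    # SE(W)
mu <- 67; sigma <- 12
e_s  <- mu                              # E(S)
se_s <- sigma / sqrt(n) * corr          # SE(S)
round(c(e_w = e_w, se_w = se_w, e_s = e_s, se_s = se_s), 6)
```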

PROBLEM 3

In a city of over 1,000,000 residents, 14% of the residents are senior citizens. In a simple random sample of 1200 residents, there is about a 95% chance that the percent of senior citizens is in the interval [pick the best option; even if you can provide a sharper answer than you see among the choices, please just pick the best among the options] $9\%-19\%$; $10\%-18\%$; $11\%-17\%$; $12\%-16\%$; $13\%-15\%$.

Solution

About $95\%$ of the area under the normal curve lies within $2SE$ of the expected value. We need the sample proportion; the binomial setting is fine here because the correction factor is very close to 1 for a city of over 1,000,000 residents: $$E=p=0.14, n=1200$$ $$SE=\sqrt{\frac{p\cdot(1-p)}{n}}=\sqrt{\frac{0.14\times0.86}{1200}}\doteq0.01001665$$ Thus, the interval is $E\pm2SE\approx0.14\pm0.02$, i.e. $[12\%, 16\%]$.
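The interval in R. Note that the SE takes the square root of $p(1-p)/n$; variable names are illustrative.

```r
p <- 0.14; n <- 1200
se <- sqrt(p * (1 - p) / n)       # about 0.0100, i.e. 1 percentage point
interval <- p + c(-2, 2) * se     # expected value plus or minus 2 SE
round(100 * interval)             # in percent
```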

PROBLEM 4

City A has 1,000,000 people; City B has 4,000,000 people. Suppose the goal is to try to predict the percent of Purple Party voters in a sample. Other things being equal, a simple random sample of 1% of the people in City A has about the same accuracy as a simple random sample of ________% of the people in City B. Pick the best option below to fill in the blank.

Solution

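For the same accuracy we need the same sample size, not the same sampling fraction, because both correction factors are very close to 1. A 1% sample of City A has $10^6\times1\%=10{,}000$ people, so the percentage for City B should be $$\frac{10^6\times1\%}{4\times10^6}=0.25\%$$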
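To see that the two designs really have about the same SE, the sketch below compares them under a hypothetical proportion `p = 0.5` (the true percent of Purple Party voters is not given; `p` cancels in the ratio anyway). The helper `se_prop` is a hypothetical name.

```r
n_a   <- 0.01 * 1e6              # a 1% sample of City A: 10,000 people
pct_b <- 100 * n_a / 4e6         # the same sample size as a percent of City B
# SE of a sample proportion without replacement; p = 0.5 is a hypothetical value
se_prop <- function(N, n, p = 0.5) sqrt(p * (1 - p) / n) * sqrt((N - n) / (N - 1))
c(pct_b = pct_b, se_a = se_prop(1e6, n_a), se_b = se_prop(4e6, n_a))
```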
