The Stat2.2x Probability course was taught by the University of California, Berkeley on the edX platform in 2014.

Download the PDF notes (Academia.edu)

Summary

  • Zeros and Ones: Sum of a sample with replacement
    $S$ is the number of successes: $n$ independent trials, chance of success on a single trial is $p$ $$E(S)=n\cdot p,\ SE(S)=\sqrt{n\cdot p\cdot(1-p)}$$ Binomial formula: $$P(S=k)=C_{n}^{k}\cdot p^{k}\cdot(1-p)^{n-k}$$ where $k=0, 1, 2, \ldots, n$. R code:

    dbinom(x = k, size = n, prob = p)
  • Zeros and Ones: Sum of a sample without replacement
    $S$ is the number of good elements in a simple random sample: $n$ elements drawn from $N=G+B$ elements of which $G$ are good. $$E(S)=n\cdot\frac{G}{N},\ SE(S)=\sqrt{n\cdot\frac{G}{N}\cdot\frac{B}{N}}\cdot\sqrt{\frac{N-n}{N-1}}$$ Hypergeometric formula: $$P(S=g)=\frac{C_{G}^{g}\cdot C_{B}^{n-g}}{C_{N}^{n}}$$ where $g$ is the number of good elements in the sample. R code:
    dhyper(x = g, m = G, n = B, k = n)  # argument n = bad elements in the population; k = sample size n
  • Zeros and Ones: Sample proportion of ones
    $n$ is the sample size, $X$ is the sample proportion of ones. Binomial setting: $$E(X)=p,\ SE(X)=\sqrt{\frac{p\cdot(1-p)}{n}}$$ Hypergeometric setting: $$E(X)=\frac{G}{N},\ SE(X)=\sqrt{\frac{\frac{G}{N}\cdot\frac{B}{N}}{n}}\cdot\sqrt{\frac{N-n}{N-1}}$$
  • Sample sum
    Population mean is $\mu$, $SD$ is $\sigma$, sample size is $n$, sample sum is $S$, and population size is $N$. With replacement: $$E(S)=n\cdot\mu,\ SE(S)=\sqrt{n}\cdot\sigma$$ Without replacement: $$E(S)=n\cdot\mu,\ SE(S)=\sqrt{n}\cdot\sigma\cdot\sqrt{\frac{N-n}{N-1}}$$
  • Sample mean
    Population mean is $\mu$, $SD$ is $\sigma$, sample size is $n$, sample mean is $M$, and population size is $N$. With replacement: $$E(M)=\mu,\ SE(M)=\frac{\sigma}{\sqrt{n}}$$ Without replacement: $$E(M)=\mu,\ SE(M)=\frac{\sigma}{\sqrt{n}}\cdot\sqrt{\frac{N-n}{N-1}}$$
  • Square Root Law
    If you multiply the sample size by a factor, the accuracy goes up by the square root of that factor (the $SE$ of the sample mean or proportion is divided by the square root of the factor).
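The standard-error formulas in the summary can be collected into a few small R helpers. This is only a sketch: the function names `fpc`, `se_sum`, and `se_mean` are made up for illustration and are not part of the course material.

```r
# Finite population correction for sampling without replacement.
fpc <- function(N, n) sqrt((N - n) / (N - 1))

# SE of the sample sum; leave N = NULL for sampling with replacement.
se_sum <- function(sigma, n, N = NULL) {
  se <- sqrt(n) * sigma
  if (!is.null(N)) se <- se * fpc(N, n)
  se
}

# SE of the sample mean; for a sample proportion use sigma = sqrt(p * (1 - p)).
se_mean <- function(sigma, n, N = NULL) se_sum(sigma, n, N) / n

# Square root law: multiplying n by 4 divides the SE of the mean by 2.
se_mean(1, 100) / se_mean(1, 400)
```

For zeros and ones, `se_sum(sqrt(p * (1 - p)), n)` gives the binomial $SE$, and passing `N` adds the hypergeometric correction.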

PRACTICE

PROBLEM 1

Find the expected value and standard error of

a) your average net gain per bet, if you bet \$1 independently 200 times on “red” at roulette (the bet pays 1 to 1 and the chance of winning is 18/38)

b) the proportion of times you win, if you bet 200 times independently on red as above

c) the total income of a simple random sample of 100 people taken from a population of 5000 people whose average income is \$50,000 with an SD of \$30,000

d) the average income of the sampled people in (c)

e) the number of black cards in a bridge hand (13 cards dealt at random without replacement from a deck consisting of 26 black cards and 26 red cards)

f) the percent of black cards in a bridge hand, described in (e)

Solution

a) Sample mean with replacement. $$E(\text{average net gain})=\mu=1\times\frac{18}{38}+(-1)\times\frac{20}{38}=-\frac{1}{19}\doteq-0.05263158$$ $$SE(\text{average net gain})=\frac{SD}{\sqrt{n}}=\frac{\sqrt{E((x-\mu)^2)}}{\sqrt{n}}$$ $$=\frac{\sqrt{(1+\frac{1}{19})^2\times\frac{18}{38}+(-1+\frac{1}{19})^2\times\frac{20}{38}}}{\sqrt{200}}\doteq0.07061267$$

b) Sample proportion of ones, binomial setting. $$E(\text{proportion of winning times})=p=\frac{18}{38}\doteq0.4736842$$ $$SE(\text{proportion of winning times})=\sqrt{\frac{p\cdot(1-p)}{n}}$$ $$=\sqrt{\frac{\frac{18}{38}\times(1-\frac{18}{38})}{200}}\doteq0.03530634$$

c) Sample sum without replacement. $$E(\text{total income})=n\cdot\mu=100\times50000=5000000$$ $$SE(\text{total income})=\sqrt{n}\cdot\sigma\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\sqrt{100}\times30000\times\sqrt{\frac{5000-100}{5000-1}}\doteq 297014.6$$

d) Sample mean without replacement. $$E(\text{average income})=\mu=50000$$ $$SE(\text{average income})=\frac{\sigma}{\sqrt{n}}\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\frac{30000}{\sqrt{100}}\times\sqrt{\frac{5000-100}{5000-1}}\doteq2970.146$$

e) Sum of a sample without replacement. $$E(\text{black cards in a bridge hand})=n\cdot p=13\times\frac{26}{52}=6.5$$ $$SE(\text{black cards in a bridge hand})=\sqrt{n\cdot p\cdot(1-p)}\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\sqrt{13\times\frac{1}{2}\times\frac{1}{2}}\times\sqrt{\frac{52-13}{52-1}}\doteq1.576482$$

f) Sample proportion of ones, hypergeometric setting. $$E(\text{proportion of black cards in a bridge hand})=p=\frac{1}{2}$$ $$SE(\text{proportion of black cards in a bridge hand})=\sqrt{\frac{p\cdot(1-p)}{n}}\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\sqrt{\frac{\frac{1}{2}\times(1-\frac{1}{2})}{13}}\times\sqrt{\frac{52-13}{52-1}}\doteq0.1212678$$
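The standard errors in parts (a) to (f) can be checked numerically. The following is a sketch in base R; the variable names and the helper `fpc` are chosen here for illustration only.

```r
fpc <- function(N, n) sqrt((N - n) / (N - 1))  # finite population correction

# a) average net gain per bet over 200 independent $1 bets on red
gain <- c(1, -1)
prob <- c(18, 20) / 38
mu_a <- sum(gain * prob)
se_a <- sqrt(sum((gain - mu_a)^2 * prob)) / sqrt(200)

# b) proportion of wins in the same 200 bets
p    <- 18 / 38
se_b <- sqrt(p * (1 - p) / 200)

# c), d) total and average income, n = 100 drawn from N = 5000
se_c <- sqrt(100) * 30000 * fpc(5000, 100)
se_d <- 30000 / sqrt(100) * fpc(5000, 100)

# e), f) number and proportion of black cards in a 13-card hand
se_e <- sqrt(13 * 0.5 * 0.5) * fpc(52, 13)
se_f <- se_e / 13

round(c(a = se_a, b = se_b, c = se_c, d = se_d, e = se_e, f = se_f), 5)
```

Note that `se_a` is exactly twice `se_b`: the net gain per bet equals 2 × (win indicator) − 1, so its average has twice the $SE$ of the proportion of wins.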

PROBLEM 2

I play a gambling game repeatedly; the games are independent of each other. In 100 games, my expected average net gain per game is -10 cents, with an SE of 5 cents. In 1000 games, my expected average net gain per game is ________ cents, with an SE of ________ cents.

Solution

The expected average net gain per game does not depend on the number of games played. Thus $$E(\text{1000 games})=\mu=-10$$ The $SE$ of the average goes down as the number of games goes up (square root law): the $SD$ of a single game is $\sigma=SE(\text{100 games})\cdot\sqrt{100}=50$ cents. Thus $$SE(\text{1000 games})=\frac{\sigma}{\sqrt{1000}}=\frac{SE(\text{100 games})\cdot\sqrt{100}}{\sqrt{1000}}\doteq1.581139$$
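The same arithmetic as a short R sketch (variable names are illustrative):

```r
se_100  <- 5                    # SE of the average over 100 games, in cents
sigma   <- se_100 * sqrt(100)   # implied SD of a single game: 50 cents
se_1000 <- sigma / sqrt(1000)   # SE of the average over 1000 games
se_1000
```

Equivalently, multiplying the number of games by 10 divides the $SE$ by $\sqrt{10}$.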

PROBLEM 3

In a population of tens of thousands of voters, 48% are Democrats. A simple random sample of 125 voters is taken. Approximately what is the chance that a majority of the sampled voters are Democrats?

Solution

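The population is large compared with the sample, so the draws are nearly independent and the binomial distribution applies with $n=125$, $p=0.48$, and $k=63,\ldots,125$: $$P(\text{majority of 125 sampled voters are Democrats})$$ $$=\sum_{k=63}^{125}C_{125}^{k}\cdot 0.48^k\cdot0.52^{125-k}\doteq0.3269725$$ R code: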

sum(dbinom(63:125, 125, 0.48))
[1] 0.3269725

Alternatively, using the normal approximation (sample proportion of ones): $$p=0.48, \sigma=\sqrt{p\cdot(1-p)}$$ $$SE=\frac{\sigma}{\sqrt{125}}, Z=\frac{0.5-p}{SE}$$ Calculating in R:

p = 0.48; sigma = sqrt(p * (1 - p)); se = sigma / sqrt(125)
z = (0.5 - p) / se
1 - pnorm(z)
[1] 0.3272311

The two results are very close: the chance is roughly $32.7\%$.
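The exact binomial tail can also be read from the cumulative distribution function instead of summing individual terms. A small sketch using base R's `pbinom` (the variable names are illustrative):

```r
# P(S >= 63) = 1 - P(S <= 62) for S ~ Binomial(125, 0.48)
tail_cdf <- pbinom(62, size = 125, prob = 0.48, lower.tail = FALSE)
tail_sum <- sum(dbinom(63:125, size = 125, prob = 0.48))

# The two ways of computing the tail agree up to floating-point error.
all.equal(tail_cdf, tail_sum)
```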

PROBLEM 4

Suppose you are trying to estimate the percent of Democrat voters. Other things being equal, is a simple random sample of 200 voters taken from 100,000 voters about as accurate as a simple random sample of 200 voters taken from 200,000 voters?

Solution

Sample proportion of ones. $$SE(\text{100000 voters})=\frac{\sigma}{\sqrt{200}}\cdot\sqrt{\frac{100000-200}{100000-1}}=0.9990045\cdot\frac{\sigma}{\sqrt{200}}$$ $$SE(\text{200000 voters})=\frac{\sigma}{\sqrt{200}}\cdot\sqrt{\frac{200000-200}{200000-1}}=0.9995024\cdot\frac{\sigma}{\sqrt{200}}$$ Both correction factors are very close to 1, so the two samples are about equally accurate.
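The two correction factors can be computed directly. A minimal sketch, where the helper name `fpc` is an assumption made for illustration:

```r
# Finite population correction for a sample of size n from a population of size N.
fpc <- function(N, n) sqrt((N - n) / (N - 1))

factors <- fpc(c(100000, 200000), n = 200)
factors
```

Since $\sigma$ is the same in both populations ("other things being equal"), the ratio of the two $SE$s is just the ratio of these factors.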

UNGRADED EXERCISE SET C

PROBLEM 1

A coin is tossed 2500 times. There is about a 68% chance that the percent of heads is in the range 50% plus or minus? (a percentage)

Solution

$68\%$ is the area under the normal curve between -1 and 1 standard units, so the range is the expected value plus or minus $1SE$: $$p=0.5, n=2500$$ $$SE=\sqrt{\frac{p\cdot(1-p)}{n}}=\sqrt{\frac{0.5\times0.5}{2500}}=0.01$$ Thus, there is about a $68\%$ chance that the percent of heads is in the range $50\%$ plus or minus $1\%$.
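A quick check of both numbers in R (a sketch; variable names are illustrative):

```r
p <- 0.5
n <- 2500

se   <- sqrt(p * (1 - p) / n)    # SE of the proportion of heads
area <- pnorm(1) - pnorm(-1)     # normal-curve area within 1 standard unit

c(se_percent = 100 * se, area = area)
```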

PROBLEM 2

A simple random sample of 50 students is taken from a class of 300 students. In the class,

  • the average midterm score is 67 and the $SD$ is 12
  • there are 72 women

Let $W$ be the number of women in the sample, and let $S$ be the average midterm score of the sampled students.

2A Find $E(W)$.

2B Find $SE(W)$.

2C Find $E(S)$.

2D Find $SE(S)$.

Solution

2A) $$E(W)=50\times\frac{72}{300}=12$$

2B) Sample without replacement. $$N=300, n=50, p=\frac{72}{300}$$ $$SE(W)=\sqrt{n\cdot p\cdot(1-p)}\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\sqrt{50\times0.24\times0.76}\times\sqrt{\frac{300-50}{300-1}}\doteq2.761416$$

2C) $$E(S)=\mu=67$$

2D) Sample mean without replacement. $$\sigma=12, n=50, N=300$$ $$SE(S)=\frac{\sigma}{\sqrt{n}}\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\frac{12}{\sqrt{50}}\times\sqrt{\frac{300-50}{300-1}}\doteq1.551782$$
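The answers to 2A, 2B, and 2D can be reproduced in R, and $E(W)$ and $SE(W)$ can be cross-checked against the hypergeometric distribution with `dhyper`. This is a sketch; the variable names are chosen for illustration.

```r
N <- 300; n <- 50; G <- 72          # class size, sample size, women in the class
fpc <- sqrt((N - n) / (N - 1))      # finite population correction

# Closed-form values from the summary formulas
e_w  <- n * G / N
se_w <- sqrt(n * (G / N) * (1 - G / N)) * fpc
se_s <- 12 / sqrt(n) * fpc

# Cross-check E(W) and SE(W) from the hypergeometric probabilities
w     <- 0:n
pr    <- dhyper(w, m = G, n = N - G, k = n)
e_w2  <- sum(w * pr)
se_w2 <- sqrt(sum((w - e_w2)^2 * pr))

c(E_W = e_w, SE_W = se_w, SE_S = se_s, E_W_check = e_w2, SE_W_check = se_w2)
```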

PROBLEM 3

In a city of over 1,000,000 residents, 14% of the residents are senior citizens. In a simple random sample of 1200 residents, there is about a 95% chance that the percent of senior citizens is in the interval [pick the best option; even if you can provide a sharper answer than you see among the choices, please just pick the best among the options] $9\%-19\%$; $10\%-18\%$; $11\%-17\%$; $12\%-16\%$; $13\%-15\%$.

Solution

About $95\%$ of the area under the normal curve lies within $2SE$ of the expected value. We need the $SE$ of the sample proportion (using the binomial setting, since the correction factor is very close to 1): $$E=p=0.14, n=1200$$ $$SE=\sqrt{\frac{p\cdot(1-p)}{n}}=\sqrt{\frac{0.14\times0.86}{1200}}\doteq0.01001665$$ Thus, the interval is $E\pm2SE=0.14\pm0.02$, that is, $12\%-16\%$.
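The interval can be checked with a few lines of R (a sketch with illustrative variable names):

```r
p <- 0.14
n <- 1200

se       <- sqrt(p * (1 - p) / n)      # SE of the sample proportion
interval <- 100 * (p + c(-2, 2) * se)  # expected value plus or minus 2 SE, in percent

round(interval, 1)
```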

PROBLEM 4

City A has 1,000,000 people; City B has 4,000,000 people. Suppose the goal is to try to predict the percent of Purple Party voters in a sample. Other things being equal, a simple random sample of 1% of the people in City A has about the same accuracy as a simple random sample of ________% of the people in City B. Pick the best option below to fill in the blank.

Solution

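Both populations are large compared with the samples, so the correction factors are close to 1 and the accuracy depends on the sample size, not on the proportion of the population sampled. For the same accuracy, the two samples need the same size. Thus the percentage for City B should be $$\frac{10^6\times1\%}{4\times10^6}=0.25\%$$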
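A short R sketch of this calculation, which also shows that both correction factors are close to 1 (the helper `fpc` and the variable names are assumptions made for illustration):

```r
fpc <- function(N, n) sqrt((N - n) / (N - 1))  # finite population correction

N_a <- 1e6
N_b <- 4e6
n   <- N_a / 100                 # 1% of City A: 10,000 people

pct_b   <- 100 * n / N_b         # the same sample size as a percent of City B
factors <- fpc(c(N_a, N_b), n)   # correction factors for City A and City B

pct_b
factors
```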
