UC Berkeley Stat2.2x Probability Study Notes: Final
The Stat2.2x Probability course was taught by the University of California, Berkeley on the edX platform in 2014.
ADDITIONAL PRACTICE FOR THE FINAL
PROBLEM 1
A box contains 8 dark chocolates, 8 milk chocolates, and 8 white chocolates. (It’s amazing how this box keeps replenishing itself and reappearing. It’s like the Magic Pudding. Australians will know what I mean, and the rest of you might enjoy finding out. It’s one of the classics of children’s literature.) A simple random sample of 6 chocolates is drawn. Find:
a) the expected number of dark chocolates
b) the SE of the number of dark chocolates
c) the chance that there are fewer than 2 dark chocolates
d) the chance that the second and third chocolates drawn are dark, given that the first and fourth chocolates drawn are not dark
e) the expected number of dark chocolates among the last four draws
Solution
This is a hypergeometric distribution (Zeros and Ones: sum of a sample without replacement), with $n=6, N=24, G=8$.
1a) $$E(\text{dark chocolates})=n\cdot\frac{G}{N}=6\times\frac{8}{24}=2$$
1b) $$SE(\text{dark chocolates})=\sqrt{n\cdot\frac{G}{N}\cdot\frac{N-G}{N}}\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\sqrt{6\times\frac{8}{24}\times\frac{16}{24}}\times\sqrt{\frac{24-6}{24-1}}\doteq1.021508$$
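As a quick numerical check of 1a) and 1b), the two formulas can be evaluated directly in R. This is only a sketch: `n`, `N`, `G` follow the notation above, while the names `ev` and `se` are introduced here for illustration.

```r
# Problem 1 setup: sample size n, population size N, G dark chocolates
n <- 6; N <- 24; G <- 8
ev <- n * G / N                               # expected number of dark chocolates
se <- sqrt(n * (G / N) * ((N - G) / N)) *     # SE as if drawing with replacement ...
  sqrt((N - n) / (N - 1))                     # ... times the finite population correction
print(c(ev, se))
```

The second factor is what distinguishes sampling without replacement from the binomial case; here it is noticeably below 1 because 6 out of 24 is a sizeable fraction of the box.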
1c) $$P(\text{fewer than 2 dark chocolates})=\sum_{x=0}^{1}\frac{C_{G}^{x}\cdot C_{N-G}^{n-x}}{C_{N}^{n}}$$ $$=\sum_{x=0}^{1}\frac{C_{8}^{x}\times C_{16}^{6-x}}{C_{24}^{6}}\doteq0.319118$$ R code:
sum(dhyper(0:1, 8, 16, 6))
[1] 0.319118
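1d) By symmetry, conditioning on the first and fourth draws being not dark is the same as removing two non-dark chocolates before drawing, which leaves 22 chocolates, 8 of them dark. $$P(\text{2nd and 3rd are dark | 1st and 4th are not dark})$$ $$=\frac{8}{22}\times\frac{7}{21}\doteq0.1212121$$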
1e) Given no information about any other draw, the last four draws are probabilistically the same as any other four, say the first four. $$E(\text{dark chocolates among the last four draws})=4\times\frac{8}{24}\doteq1.333333$$
PROBLEM 2
The casino is offering a “house special” at roulette: there are 8 chances in 38 to win, and the bet pays 3 to 1. Suppose you bet $\$1$ on the house special, 200 times, independently. Find:
a) your expected average net gain per bet (and then pledge that you will never play this game)
b) the chance that you come out ahead
c) the chance that you lose more than $\$20$
Solution
2a) Sample mean with replacement: $$E=3\times\frac{8}{38}+(-1)\times\frac{30}{38}\doteq-0.1578947$$
2b) Let $x$ be the number of winning times. $$3x+(-1)\cdot(200-x) > 0\Rightarrow x > 50\Rightarrow x\geq51$$ Binomial distribution $n=200, k=51:200, p=\frac{8}{38}$: $$P(\text{come out ahead})=\sum_{k=51}^{200}C_{200}^{k}\times(\frac{8}{38})^k\times(\frac{30}{38})^{200-k}\doteq0.0750046$$ R code:
sum(dbinom(51:200, 200, 8/38))
[1] 0.0750046
2c) $$3x+(-1)\cdot(200-x) < -20\Rightarrow x < 45\Rightarrow x\leq44$$ $$P(\text{lose more than 20})=\sum_{k=0}^{44}C_{200}^{k}\times(\frac{8}{38})^k\times(\frac{30}{38})^{200-k}\doteq0.6660572$$ R code:
sum(dbinom(0:44, 200, 8/38))
[1] 0.6660572
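The two sums in 2b) and 2c) can also be written with the binomial distribution function `pbinom`, which avoids spelling out the range of `k`. A minimal sketch, with the variable names `ahead` and `lose20` chosen here for illustration:

```r
# House special: 200 independent $1 bets, chance of winning 8/38 on each
n <- 200; p <- 8 / 38
ahead  <- pbinom(50, n, p, lower.tail = FALSE)   # P(X > 50) = P(X >= 51)
lose20 <- pbinom(44, n, p)                       # P(X <= 44)
print(c(ahead = ahead, lose20 = lose20))
```

Both values should agree with the `sum(dbinom(...))` results above up to rounding.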
PROBLEM 3
Households in a large city contain an average of 2.2 people, with an $SD$ of 1.2 people. A simple random sample of 625 households is taken.
a) Approximately what is the chance that there are more than 1400 people in the sampled households?
b) How would your answer to a) have been different had the sample been drawn with replacement?
Solution
3a) This is a sample sum without replacement, but the correction factor is very close to 1 since the city is very large. With $\mu=2.2, \sigma=1.2, n=625$: $$SE=\sqrt{n}\cdot\sigma=\sqrt{625}\times1.2=30$$ Using a continuity correction (more than 1400 people means at least 1401): $$Z=\frac{1400.5-n\cdot\mu}{SE}$$ Calculating by R:
n = 625; mu = 2.2
z = (1400.5 - n * mu) / 30
1 - pnorm(z)
[1] 0.1976625
Thus the chance is around $19.77\%$.
3b) It would be essentially the same. Because the city is large, the correction factor is very close to 1, so the chance is practically the same whether the sample is drawn with or without replacement.
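To see how close to 1 the correction factor is, it can be evaluated for a few city sizes. The values of `N` below are hypothetical, since the problem only says the city is large; this is an illustration rather than part of the original solution.

```r
# Finite population correction sqrt((N - n) / (N - 1)) for n = 625 households
n <- 625
N <- c(1e5, 1e6, 1e7)            # assumed numbers of households in the city
fpc <- sqrt((N - n) / (N - 1))
print(round(fpc, 5))
```

For all three assumed sizes the factor stays above 0.99, so the SE of 30 used in 3a) changes by well under one percent.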
PROBLEM 4
There are three boxes. Box I contains one gold coin and one silver coin. Box II contains two silver coins. Box III contains two gold coins. A box is selected at random, and then one coin is selected at random from that box. Given that the coin is gold, what is the chance that the other coin in the box is gold? [No, the answer is not 1/2.]
Solution
Bayes' rule. The other coin in the box is gold exactly when the selected box is Box III: $$P(\text{Box III | the first coin is gold})=\frac{P(\text{the first coin is gold and it is from Box III})}{P(\text{the first coin is gold})}$$ $$=\frac{\frac{1}{3}\times1}{\frac{1}{3}\times\frac{1}{2}+\frac{1}{3}\times0+\frac{1}{3}\times1}=\frac{2}{3}$$
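The same calculation can be laid out as a small table of priors and likelihoods. A sketch, with the vector names `prior`, `p_gold` and `posterior` introduced here for illustration:

```r
# Chance of selecting each box, and chance of drawing a gold coin from it
prior  <- c(I = 1/3, II = 1/3, III = 1/3)
p_gold <- c(I = 1/2, II = 0,   III = 1)
# Bayes' rule: posterior is proportional to prior times likelihood
posterior <- prior * p_gold / sum(prior * p_gold)
print(posterior)
```

The entry for Box III is the answer; the entry for Box I is the chance that the other coin is silver.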
PROBLEM 5
A coin is tossed $n$ times. There is about $95\%$ chance that the proportion of heads is in the range $.49$ to $.51$. The number of tosses $n$ is closest to:
a) 1,000
b) 5,000
c) 10,000
d) 50,000
Solution
Sample proportion of ones. $p=0.5$ and the interval $.49$ to $.51$ has to be $0.5\pm2SE$, thus $$2SE=0.01\Rightarrow SE=0.005$$ On the other hand $$SE=\sqrt{\frac{p\cdot(1-p)}{n}}=\sqrt{\frac{\frac{1}{4}}{n}}=0.005\Rightarrow n=10000$$ so c) is the closest choice.
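Solving the SE equation for $n$ is a one-line computation. A sketch, where `se_target` is a name introduced here for the required SE:

```r
p <- 0.5
se_target <- 0.01 / 2                 # half-width 0.01 corresponds to 2 SE
n <- p * (1 - p) / se_target^2        # from SE = sqrt(p * (1 - p) / n)
print(n)
```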
FINAL EXAM
PROBLEM 1
Suppose you are trying to estimate the percent of women in a city. Other things being equal, a simple random sample of 0.1% of the population of a city that has 2,000,000 people is ________ as a simple random sample of 0.1% of the population of a city that has 500,000 people. Fill in the blank with the best of the following choices.
a) about 1/4 times as accurate
b) about 1/2 times as accurate
c) about as accurate
d) about 2 times as accurate
e) about 4 times as accurate
Solution
Square Root Law. $$2\times10^6\times0.1\%=2000,\ 5\times10^5\times0.1\%=500$$ $$\Rightarrow\sqrt{\frac{2000}{500}}=2$$ Thus the former is about 2 times as accurate as the latter. d) is correct.
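The comparison only depends on the two sample sizes, since for a fixed population percent the SE of a sample percent is proportional to $\frac{1}{\sqrt{n}}$ and both populations are large relative to their samples. A sketch of the arithmetic, with variable names chosen here for illustration:

```r
frac <- 0.001                         # 0.1% sampling fraction
n_big   <- 2e6 * frac                 # sample size in the city of 2,000,000
n_small <- 5e5 * frac                 # sample size in the city of 500,000
ratio <- sqrt(n_big / n_small)        # how many times smaller the SE is for the larger sample
print(ratio)
```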
PROBLEM 2
A group of 30 people consists of 15 children, 10 men, and 5 women. Tom and Jerry are two of the men in the group. Five people are picked at random without replacement.
2A Find the chance the first person picked is a man, given that the fourth and fifth people picked are children.
2B Find the chance that more than two women are picked.
2C Find the chance that Tom and Jerry both get picked.
Solution
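2A) By symmetry, given that two of the draws are children, the first person picked is equally likely to be any of the other 28 people, 10 of whom are men: $$P(\text{1st person is a man | 4th and 5th are children})=\frac{10}{28}\doteq0.3571429$$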
2B) Hypergeometric distribution $$P(\text{more than 2 women})=\sum_{x=3}^{5}\frac{C_{5}^{x}\cdot C_{25}^{5-x}}{C_{30}^{5}}\doteq0.02193592$$ R code:
sum(dhyper(3:5, 5, 25, 5))
[1] 0.02193592
2C) If Tom and Jerry are both picked, the other 3 people must be chosen from the remaining 28: $$P(\text{both Tom and Jerry get picked})=\frac{C_{28}^{3}}{C_{30}^{5}}\doteq0.02298851$$ R code:
choose(28, 3) / choose(30, 5)
[1] 0.02298851
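Two alternative routes to 2C) give the same number and serve as a cross-check. The sketch below treats Tom and Jerry as the 2 "good" elements of a hypergeometric population, and also uses the chance that two particular people land among the 5 picked; the names `via_dhyper` and `via_product` are illustrative.

```r
# Hypergeometric: 2 "good" people (Tom, Jerry), 28 others, 5 picked, both good ones chosen
via_dhyper <- dhyper(2, 2, 28, 5)
# Tom is among the 5 picked with chance 5/30; given that, Jerry with chance 4/29
via_product <- (5 / 30) * (4 / 29)
print(c(via_dhyper, via_product))
```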
PROBLEM 3
A gambling game pays 4 to 1 and the chance of winning is 1 in 6. Suppose you bet $\$1$ on this game 600 times independently.
3A Find the expected number of times you win.
3B Find the $SE$ of the number of times you win.
3C Find the chance that you lose more than $\$50$ (that is, your net gain in the 600 bets is less than $-\$50$).
Solution
Zeros and Ones: Sum of a sample with replacement, $n=600, p=\frac{1}{6}$.
3A) $$E(\text{winning times})=n\cdot p=600\times\frac{1}{6}=100$$
3B) $$SE(\text{winning times})=\sqrt{n\cdot p\cdot(1-p)}\doteq9.128709$$
3C) Let $x$ be the number of winning times, $$4x+(-1)\cdot(600-x) < -50\Rightarrow x < 110\Rightarrow x\leq109$$ Binomial distribution $n=600, k=0:109, p=\frac{1}{6}$: $$P(\text{lose more than 50})=\sum_{k=0}^{109}C_{600}^{k}\times(\frac{1}{6})^k\times(\frac{5}{6})^{600-k}\doteq0.8508149$$ R code:
sum(dbinom(0:109, 600, 1/6))
[1] 0.8508149
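Since 3A) and 3B) already give the expected value and SE, the exact binomial chance in 3C) can be compared with the normal approximation. A sketch under the usual continuity correction (cutoff 109.5); the names `exact` and `normal_approx` are introduced here:

```r
# X = number of wins in 600 independent bets; losing more than $50 means X <= 109
n <- 600; p <- 1 / 6
exact <- pbinom(109, n, p)                           # exact binomial chance
se <- sqrt(n * p * (1 - p))                          # SE from 3B)
normal_approx <- pnorm((109.5 - n * p) / se)         # normal curve with continuity correction
print(c(exact = exact, normal_approx = normal_approx))
```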
PROBLEM 4
In a grocery store, butter is sold in “sticks” that are shaped like little bricks. The weights of these sticks are like draws at random with replacement from a population with average 4 ounces and SD 0.2 ounces. The grocery store receives the butter in boxes; each box consists of 100 sticks.
4A Find the chance that the average weight of the sticks in one box is less than 3.999 ounces.
4B The grocery store has received 6 boxes of butter. There is about ___________ chance that in at least one of the boxes, the average weight of sticks is less than 3.999 ounces.
Solution
4A) Sample mean with replacement, $$\mu=4, \sigma=0.2, n=100\Rightarrow SE=\frac{\sigma}{\sqrt{n}}=0.02$$ $$Z=\frac{3.999-\mu}{SE}$$ Calculating by R:
z = (3.999 - 4) / 0.02
pnorm(z)
[1] 0.4800612
4B) Following 4A), the number of boxes whose average weight is less than 3.999 ounces has a binomial distribution with $n=6, p=0.4800612$, and we need $k=1:6$: $$P(\text{at least 1 box is less than 3.999 ounces})$$ $$=\sum_{k=1}^{6}C_{6}^{k}\cdot p^k\cdot(1-p)^{6-k}=0.9802433$$ R code:
p = pnorm(z)
sum(dbinom(1:6, 6, p))
[1] 0.9802433
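"At least one" is often easier through the complement: the chance that none of the 6 boxes falls below 3.999 ounces is $(1-p)^6$. A short sketch of that shortcut, with `at_least_one` as an illustrative name:

```r
p <- pnorm((3.999 - 4) / 0.02)        # chance for a single box, from 4A)
at_least_one <- 1 - (1 - p)^6         # complement of "no box is below 3.999"
print(at_least_one)
```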
PROBLEM 5
In surveys about sensitive topics, respondents are sometimes given ways to “hide” their answers from the surveyor. In a survey of taxpayers, one of the questions is, “Did you cheat on your taxes?” To answer, the respondent is asked to toss a fair coin. If it lands heads, the respondent must answer “yes.” If it lands tails, the respondent must answer the question truthfully, either “yes” or “no” (the answer has to be the one that is true). Assume that all respondents follow this procedure, and that for 10% of the respondents the truthful answer is “yes.” Also assume that the result of a respondent’s coin toss is independent of whether or not the respondent cheated on his / her taxes. One of the respondents is picked at random.
5A Given that the respondent cheated on his / her taxes, what is the chance that he / she answered “yes”?
5B Given that the respondent did not cheat on his / her taxes, what is the chance that he / she answered “yes”?
5C Given that the respondent answered “yes,” what is the chance that the respondent cheated on his / her taxes?
Solution
According to the information, we have $$P(\text{cheated on taxes})=0.1,\ P(\text{did not cheat on taxes})=0.9$$
5A) $$P(\text{answered Yes | cheated on taxes})$$ $$=\frac{P(\text{cheated on taxes and answered Yes})}{P(\text{cheated on taxes})}$$ $$=\frac{P(\text{cheated and tossed head})+P(\text{cheated and tossed tail})}{P(\text{cheated on taxes})}$$ $$=\frac{0.1\times0.5+0.1\times0.5}{0.1}=1$$ That is, anyone who cheated on taxes must have answered "Yes": heads forces a "Yes", and tails requires the truthful answer, which is also "Yes".
5B) $$P(\text{answered Yes | did not cheat on taxes})$$ $$=\frac{P(\text{answered Yes but did not cheat on taxes})}{P(\text{did not cheat on taxes})}$$ $$=\frac{0.9\times0.5}{0.9}=0.5$$
5C) $$P(\text{cheated on taxes | answered Yes})=\frac{P(\text{cheated on taxes and answered Yes})}{P(\text{answered Yes})}$$ $$=\frac{P(\text{cheated on taxes and answered Yes})}{P(\text{cheated on taxes and answered Yes})+P(\text{did not cheat on taxes and answered Yes})}$$ $$=\frac{0.1\times0.5+0.1\times0.5}{(0.1\times0.5+0.1\times0.5)+0.9\times0.5}\doteq0.1818182$$
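The three parts fit together as one application of Bayes' rule. The sketch below recomputes them from the two inputs of the problem; all variable names are introduced here for illustration.

```r
p_cheat <- 0.1                         # truthful answer is "yes" for 10% of respondents
p_heads <- 0.5                         # fair coin
# Heads forces "yes"; tails gives the truthful answer
p_yes_given_cheat    <- p_heads * 1 + (1 - p_heads) * 1    # 5A
p_yes_given_no_cheat <- p_heads * 1 + (1 - p_heads) * 0    # 5B
p_yes <- p_cheat * p_yes_given_cheat + (1 - p_cheat) * p_yes_given_no_cheat
p_cheat_given_yes <- p_cheat * p_yes_given_cheat / p_yes   # 5C
print(c(p_yes_given_cheat, p_yes_given_no_cheat, p_cheat_given_yes))
```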
PROBLEM 6
In a population of 10,000 adults, $20\%$ are smokers. A simple random sample of 600 of the adults is drawn.
6A Find the expected number of smokers in the sample.
6B The $SE$ of the number of smokers in the sample is closest to
6C Find the chance that there are fewer than 115 smokers in the sample.
Solution
6A) $$E=n\cdot p=600\times0.2=120$$
6B) $$SE=\sqrt{n\cdot p\cdot(1-p)}\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\sqrt{600\times0.2\times0.8}\times\sqrt{\frac{10000-600}{10000-1}}\doteq9.499949$$
6C) Normal approximation (no continuity correction is applied here): $$Z=\frac{115-120}{SE}$$ Calculating by R:
n = 600; N = 10000; p = 0.2
se = sqrt(n * p * (1 - p)) * sqrt((N - n) / (N - 1))
z = (115 - n * p) / se
pnorm(z)
[1] 0.2993334
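"Fewer than 115" means at most 114 smokers, so the chance can also be computed exactly from the hypergeometric distribution, or approximated with the cutoff 114.5. The sketch below shows both; the names `exact` and `normal_cc` are illustrative, and the results are expected to be somewhat smaller than the value obtained with the cutoff 115.

```r
# Population: 2000 smokers and 8000 non-smokers; sample of 600 without replacement
n <- 600; N <- 10000; G <- 2000
p <- G / N
exact <- phyper(114, G, N - G, n)                    # P(at most 114 smokers), exact
se <- sqrt(n * p * (1 - p)) * sqrt((N - n) / (N - 1))
normal_cc <- pnorm((114.5 - n * p) / se)             # normal curve with continuity correction
print(c(exact = exact, normal_cc = normal_cc))
```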
PROBLEM 7
When a die is rolled, the face with six spots appears with chance $\frac{1}{6}$, independently of all other rolls. Rank the three events below in increasing order of probability. For example, if you choose “A B C”, you are saying that A has the smallest chance, B has more chance than A but less chance than C, and C has the biggest chance. [If you think that some of the events have the same chance, please think again.]
A: The face with six spots shows up on fewer than $16.7\%$ of the rolls when a die is rolled 60,000 times.
B: The face with six spots shows up on more than $16.7\%$ of the rolls when a die is rolled 30,000 times.
C: The face with six spots shows up on fewer than $16.7\%$ of the rolls when a die is rolled 30,000 times.
Solution
This is a binomial distribution. Let $m=n\cdot p$, where $n$ is the number of rolls and $p=\frac{1}{6}$: $$P(A)=\sum_{k=0}^{m-1}C_{n}^{k}\cdot p^k\cdot(1-p)^{n-k}$$ where $n=60000$. $$P(B)=\sum_{k=m+1}^{n}C_{n}^{k}\cdot p^k\cdot(1-p)^{n-k}$$ where $n=30000$. $$P(C)=\sum_{k=0}^{m-1}C_{n}^{k}\cdot p^k\cdot(1-p)^{n-k}$$ where $n=30000$. R code:
dieroll = function(n, p, id){ # id=0 means fewer than a fixed proportion
  m = n * p  # cutoff count
  if(id == 0){
    print(sum(dbinom(0:(m - 1), n, p)))  # P(X < m)
  } else{
    print(sum(dbinom((m + 1):n, n, p)))  # P(X > m)
  }
}
> dieroll(60000, 1/6, 0)
[1] 0.4983005
> dieroll(30000, 1/6, 1)
[1] 0.4962232
> dieroll(30000, 1/6, 0)
[1] 0.4975965
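Thus $$P(B) < P(C) < P(A)$$ Note that the computation above uses the cutoff $m=n\cdot\frac{1}{6}$, whereas $16.7\%=0.167$ is slightly larger than $\frac{1}{6}$. With the cutoff $0.167n$ the chances of A and C rise above $\frac{1}{2}$ and the chance of B drops below $\frac{1}{2}$, so the ranking stays the same.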
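The ranking can also be checked with the cutoff taken literally as $16.7\%$ of the rolls, that is 10020 of 60,000 rolls and 5010 of 30,000 rolls. The sketch below uses `pbinom` with these counts written out by hand; the names `pA`, `pB`, `pC` are illustrative.

```r
p <- 1 / 6
# 16.7% of the rolls: 0.167 * 60000 = 10020 and 0.167 * 30000 = 5010
pA <- pbinom(10020 - 1, 60000, p)                    # fewer than 10020 sixes in 60,000 rolls
pB <- pbinom(5010, 30000, p, lower.tail = FALSE)     # more than 5010 sixes in 30,000 rolls
pC <- pbinom(5010 - 1, 30000, p)                     # fewer than 5010 sixes in 30,000 rolls
print(c(A = pA, B = pB, C = pC))
```

The cutoff sits a larger number of SEs above the expected count when there are more rolls, which is why A comes out ahead of C.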
PROBLEM 8
A die has 2 red faces, 2 blue faces, and 2 green faces. It is rolled 240 times. Let $R$ be the number of times red faces appear, and $B$ the number of times blue faces appear.
8A The random variable $R$ is the sum of 240 draws at random with replacement from
8B Consider the random variable $D = R - B$. That’s $D$ for “difference.” If all 240 rolls show blue faces, then $D = -240$; if they all show red faces, then $D = 240$; otherwise $D$ is somewhere in between. The random variable $D$ is the sum of 240 draws at random with replacement from
8C Find $E(D)$
8D Find $SE(D)$
Solution
8A) $R$ counts the rolls that show red, so each roll adds 1 to $R$ if it is red and 0 otherwise, and $R$ ranges from 0 to 240. The box must therefore contain only 1s and 0s, with one third of the tickets being 1, such as $$1,1,0,0,0,0$$ or $$1,0,0$$
8B) Similar to 8A. The equivalent pool should contain 1(red), -1(blue), and 0(green), such as $$1, 0, -1$$ or $$1, 1, -1, -1, 0, 0$$
8C) & 8D) Sample sum with replacement: $$\mu=0, n=240$$ and $$\sigma=\sqrt{(1-0)^2\times\frac{1}{3}+(-1-0)^2\times\frac{1}{3}+(0-0)^2\times\frac{1}{3}}=\sqrt{\frac{2}{3}}$$ Thus $$E(D)=n\cdot\mu=0$$ $$SE(D)=\sqrt{n}\cdot\sigma=\sqrt{240}\times\sqrt{\frac{2}{3}}=\sqrt{160}\doteq12.64911$$
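The box average and box SD can be computed directly from the tickets. A sketch, with the vector name `box` introduced here; note that the population SD is computed by hand, because R's `sd()` divides by $n-1$ and would give a different value.

```r
box <- c(1, -1, 0)                     # tickets for red, blue, green (equally likely)
n <- 240
mu <- mean(box)                        # box average
sigma <- sqrt(mean((box - mu)^2))      # box SD (population SD, dividing by the number of tickets)
print(c(E = n * mu, SE = sqrt(n) * sigma))
```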