UC Berkeley Stat2.3x Inference study notes: Section 3 One-sample and two-sample tests
The Stat2.3x Inference course was taught by the University of California, Berkeley on the edX platform in 2014.
Summary
- One-sample $t$ test
  - Test for a population mean when the population SD is unknown; the sample size is $n$. That is, the sample mean and SD are known, but the population mean (the quantity being tested) and the population SD are not.
  - Same $H_0$ and $H_A$, and same calculation as the $z$ test, except:
    - Assume the population distribution is roughly normal, with unknown mean and SD.
    - Approximate the unknown SD of the population by the sample SD, computed with $n-1$ in the denominator; the $t$ distribution used has $n-1$ degrees of freedom.
- Two independent samples
  - Comparing the means
    - Known parameters: $\mu$, $n$ and $\sigma$ (SD) of each of the independent samples.
    - $H_0: \mu_1=\mu_2\Rightarrow$ under the null, the difference between the two sample means is expected to be $\mu_1-\mu_2=0$.
    - The SE of the difference is $$SE_1=\frac{\sigma_1}{\sqrt{n_1}},\ SE_2=\frac{\sigma_2}{\sqrt{n_2}}$$ $$\Rightarrow SE=\sqrt{SE_1^2+SE_2^2}$$
    - The above result holds because the samples are independent: $$SE(X-Y)=\sqrt{Var(X-Y)}$$ $$=\sqrt{1^2\cdot Var(X)+(-1)^2\cdot Var(Y)}=\sqrt{SE^2(X)+SE^2(Y)}$$
    - The $z$ statistic is $$z=\frac{\mu_1-\mu_2}{SE}$$ where $\mu_1$ and $\mu_2$ here denote the observed sample means.
  - Comparing the percents
    - Known parameters: $p$ and $n$ of each of the independent samples.
    - $H_0: p_1=p_2$
    - The same estimate of the percent should be used for both samples (i.e. the pooled estimate), since the P-value is computed under the null $H_0: p_1=p_2$.
    - Pooled estimate of $p$: $$\hat{p}=\frac{n_1\cdot p_1+n_2\cdot p_2}{n_1+n_2}$$
    - SE of the sample proportion, for the two samples: $$SE_1=\sqrt{\frac{\hat{p}\cdot(1-\hat{p})}{n_1}},\ SE_2=\sqrt{\frac{\hat{p}\cdot(1-\hat{p})}{n_2}}$$
    - As with the sample means, the SE of the difference between the sample percents is approximately $$SE=\sqrt{SE_1^2+SE_2^2}$$
    - The $z$ statistic is $$z=\frac{p_1-p_2}{SE}$$ where $p_1$ and $p_2$ are the observed sample percents.
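The exercises below do not work through a two-sample comparison of means, so here is a minimal R sketch of that case. The summary statistics are hypothetical numbers chosen only for illustration (they are not from the course), and a two-sided alternative is assumed.

```r
# Hypothetical summary statistics for two independent samples (illustration only)
m1 <- 68.2; sd1 <- 3.0; n1 <- 400   # sample 1: mean, SD, size
m2 <- 67.7; sd2 <- 2.8; n2 <- 350   # sample 2: mean, SD, size

# SE of each sample mean, then SE of the difference (samples are independent)
se1 <- sd1 / sqrt(n1)
se2 <- sd2 / sqrt(n2)
se_diff <- sqrt(se1^2 + se2^2)

# z statistic under H_0: the two population means are equal
z <- (m1 - m2) / se_diff

# Two-sided P-value from the standard normal curve
p_two_sided <- 2 * pnorm(-abs(z))
z; p_two_sided
```

With samples this large the normal approximation is reasonable; for small samples the notes' one-sample $t$ reasoning suggests that a $t$-based method would be needed instead.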
EXERCISE SET 3
If a problem asks for an approximation, please use the methods described in the video lecture segments. Unless the problem says otherwise, please give answers correct to one decimal place according to those methods. Some of the problems below are about simple random samples. If the population size is not given, you can assume that the correction factor for standard errors is close enough to 1 that it does not need to be computed. Please use the 5% cutoff for P-values unless otherwise instructed in the problem.
PROBLEM 1
A statistical test is performed, and its P-value turns out to be about 3%. Which of the following must be true? Pick ALL that are correct.
a. The null hypothesis is true.
b. There is about a 3% chance that the null hypothesis is true.
c. The alternative hypothesis is true.
d. There is about a 97% chance that the alternative hypothesis is true.
e. If the null hypothesis were true, there would be about a 3% chance of getting data that were like those that were observed in the sample or even further in the direction of the alternative.
f. The P-value of about 3% was computed assuming that the null hypothesis was true.
Solution
(e) and (f) are correct. The 3% is computed under the assumption that $H_0$ is true, so (f) is correct. The P-value is, assuming the null is true, the chance of getting data like those in the sample or even further in the direction of the alternative, so (e) is correct (it is the definition of the P-value). (a) and (c) need not be true, because a test can make a Type I or a Type II error. In this framework the hypotheses are not random, so there is no such thing as "the chance that the null / alternative is true"; (b) and (d) are therefore incorrect.
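Statement (e) can be illustrated with a small simulation. This is only a sketch under assumptions that are not part of the problem: the test is taken to be a right-tailed $z$ test, and the seed and number of replications are arbitrary.

```r
set.seed(1)                    # arbitrary seed, only for reproducibility

z_obs <- qnorm(0.97)           # observed z whose right-tail area is 3%
z_null <- rnorm(1e5)           # test statistics simulated assuming H_0 is true

# Fraction of null-generated statistics at least as extreme as the observed one
frac <- mean(z_null >= z_obs)
frac                           # should be close to 0.03
```

The fraction says nothing about how likely $H_0$ itself is; it only describes the data-generating process when $H_0$ holds.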
PROBLEM 2
The distribution of cholesterol levels of the residents of a state closely follows the normal curve. Investigators want to test whether the mean cholesterol level of the residents is 200 mg/dL or lower. A simple random sample of 12 residents has a mean cholesterol level of 185 mg/dL, with an SD of 20 mg/dL (computed as the ordinary SD of a list of 12 numbers, with 12 in the denominator). Let $m$ be the mean cholesterol level of the residents of the state, measured in mg/dL. In Problems 2A-2E, perform a $t$ test of the hypotheses $$\text{Null}: m = 200$$ $$\text{Alternative}: m < 200$$
2A You are using data from a simple random sample to test the given hypotheses. Which of the following sets of assumptions is further required to justify the use of a $t$ distribution to compute the P-value?
a. The distribution of the cholesterol levels of the residents of the state is close to normal.
b. The distribution of the cholesterol levels of the residents of the state is close to normal, with an unknown mean.
c. The distribution of the cholesterol levels of the residents of the state is close to normal, with an unknown mean and an unknown SD.
2B The t distribution that should be used has ( ) degrees of freedom.
2C The value of the t statistic is closest to?
2D The P-value of the test is closest to?
2E The conclusion of the test is to
a. reject the null hypothesis
b. not reject the null hypothesis
Solution
2A) (c) is correct. The sample is small, so unless the population distribution is bell-shaped, you might have trouble using a bell-shaped approximation to the probabilities for the sample mean. If the mean of the normally distributed population were already known, there would be no reason to perform the test. If the SD of the underlying normally distributed population (with unknown mean) were known, you would perform the $z$ test.
2B) The degrees of freedom are the sample size minus 1, that is, $12-1=11$.
2C) The given SD has 12 in the denominator, so the sample SD (with $n-1=11$ in the denominator) is $$\sigma=20\times\sqrt{\frac{12}{11}}$$ and $$SE=\frac{\sigma}{\sqrt{12}}$$ Thus the t-statistic is $$t=\frac{185-200}{SE}=-2.487469$$ R code:
sigma = 20 * sqrt(12 / 11)
se = sigma / sqrt(12)
t = (185 - 200) /se; t
[1] -2.487469
2D) This is a left-tailed $t$ test with 11 degrees of freedom; the P-value is $p=0.0150854$. R code:
pt(t, 11)
[1] 0.0150854
2E) The P-value is less than 5%, so reject $H_0$.
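The steps of Problem 2 can be collected into one function. The sketch below is a hypothetical helper (`t_test_summary` is not a built-in R function) for a left-tailed one-sample $t$ test from summary statistics, assuming the SD is supplied with $n$ in the denominator as in this problem.

```r
# Hypothetical helper: left-tailed one-sample t test from summary statistics.
# sd_n is the SD computed with n in the denominator, as given in Problem 2.
t_test_summary <- function(xbar, sd_n, n, mu0) {
  s  <- sd_n * sqrt(n / (n - 1))   # convert to the SD with n - 1 in the denominator
  se <- s / sqrt(n)                # standard error of the sample mean
  t  <- (xbar - mu0) / se          # t statistic
  list(t = t, df = n - 1, p_left = pt(t, df = n - 1))
}

res <- t_test_summary(xbar = 185, sd_n = 20, n = 12, mu0 = 200)
res$t        # t statistic
res$p_left   # left-tailed P-value

# Equivalent decision rule: compare t with the 5% critical value for 11 df
qt(0.05, df = 11)
```

Comparing `res$t` with the critical value gives the same conclusion as comparing the P-value with 5%.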
PROBLEM 3
In a simple random sample of 1000 people taken from City A, 13% are senior citizens. In an independent simple random sample of 600 people taken from City B, 17% are senior citizens. Is the percent of senior citizens different in the two cities? Or is this just chance variation? Answer in the steps described in Problems 3A-3C.
3A Under the null hypothesis, the percent of senior citizens in each of the two cities is estimated to be ( )%.
3B Under the null hypothesis, the estimated standard error of the difference between the percents of senior citizens in the two samples is closest to ( )%.
3C The P-value of the test is closest to ( )%, so the null hypothesis is rejected.
Solution
These are two independent simple random samples, with $$H_0: p_A=p_B$$ $$H_A: p_A\neq p_B$$
3A) Pooled estimate of $p$ is $$\hat{p}=\frac{1000\times13\%+600\times17\%}{1000+600}=14.5\%$$
3B) $${SE}_{A}=\sqrt{\frac{\hat{p}\cdot(1-\hat{p})}{1000}}$$ $${SE}_{B}=\sqrt{\frac{\hat{p}\cdot(1-\hat{p})}{600}}$$ Thus, SE of the difference between the sample percents is approximately $$SE=\sqrt{{SE}_{A}^2+{SE}_{B}^2}=1.818241\%$$ R code:
p = 0.145; n1 = 1000; n2 = 600
se.a = sqrt(p * (1 - p) / n1)
se.b = sqrt(p * (1 - p) / n2)
se = sqrt(se.a^2 + se.b^2); se
[1] 0.01818241
3C) The observed difference is 4 percentage points in absolute value, so $$z=\frac{0.04-0}{SE}$$ For a two-tailed $z$ test the P-value is 2.781197%, which is less than 5%, so we reject $H_0$. R code:
z = (0.04 - 0) / se
(1 - pnorm(z)) * 2
[1] 0.02781197
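As a cross-check, the same test can be run with R's built-in `prop.test`. This is a sketch rather than the method used in the lectures: the counts 130 and 102 are derived from the stated percents (13% of 1000 and 17% of 600), and `correct = FALSE` turns off the continuity correction so that the chi-squared statistic equals $z^2$ from the pooled calculation.

```r
# Counts of senior citizens implied by the sample percents
x <- c(130, 102)     # 13% of 1000, 17% of 600
n <- c(1000, 600)

# Two-sample test of equal proportions, without continuity correction
res <- prop.test(x = x, n = n, correct = FALSE)

z_abs <- sqrt(unname(res$statistic))  # square root of chi-squared equals |z|
z_abs
res$p.value                           # two-sided P-value
```

With the default `correct = TRUE`, the P-value would be slightly larger than the one computed by hand.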
PROBLEM 4
Last year, there were 30,000 students at a university; their GPA had a mean of 2.9 and an SD of 0.6. This year, in a simple random sample of 100 students taken from this university, the GPAs have a mean of 2.95 and an SD of 0.55. Has the mean GPA at the university gone up since last year, or is this just chance variation? Pick the correct calculation of the test statistic.
a. one-sample $z = (2.95 - 2.9)/0.06 = 0.833$; P large; conclude chance variation
b. one-sample $z = (2.95 - 2.9)/0.055 = 0.909$; P large; conclude chance variation
c. two-sample $z = (0.05 - 0)/\sqrt{0.055^2 + 0.0035^2} = 0.907$; P large; conclude chance variation
Solution
Firstly, the population is the students at the university THIS YEAR. $$H_0: \mu=2.9$$ $$H_A: \mu > 2.9$$ The third choice must be wrong, since last year's students were the entire population, not a sample. Thus, this is a one-sample $z$ test. The SD of this year's population is unknown and is estimated by the sample SD: $$n=100, \sigma=0.55$$ $$\Rightarrow SE=\frac{\sigma}{\sqrt{n}}=0.055, z=\frac{2.95-2.9}{SE}=0.909$$ So the second choice is correct. R code:
n = 100; sigma = 0.55
se = sigma / sqrt(n)
z = (2.95 - 2.9) /se; z
[1] 0.9090909
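The choices describe the P-value only as "large". A short sketch of the right-tailed P-value, assuming the normal approximation for a sample of 100, makes that concrete:

```r
n <- 100; s <- 0.55            # sample size and sample SD
se <- s / sqrt(n)              # standard error of the sample mean
z <- (2.95 - 2.9) / se         # one-sample z statistic

# Right-tailed P-value, since the alternative is that the mean has gone up
p_right <- 1 - pnorm(z)
p_right                        # about 18%, well above the 5% cutoff
```

Since the P-value is far above 5%, the data are consistent with chance variation.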