The Stat2.3x Inference course was taught by the University of California, Berkeley on the edX platform in 2014.

Download the notes as a PDF (Academia.edu)

Summary

  • One-sample $t$ test

    • Test for a population mean when the population SD is unknown; sample size $n$. That is, the sample mean and SD are known, but the population mean (the quantity being tested) and the population SD are not.
    • Same $H_0$ and $H_A$, same calculation as $z$ test, except:
      • Assume population distribution roughly normal, unknown mean and SD.
      • Approximate the unknown population SD by the sample SD computed with $n-1$ in the denominator; the $t$ distribution then has $n-1$ degrees of freedom.
  • Two independent samples
    1. Comparing the means

      • Known quantities: the mean $\mu$, size $n$ and SD $\sigma$ of each of the independent samples.
      • $H_0: \mu_1=\mu_2\Rightarrow$ under the null, the difference between the two sample means is expected to be $\mu_1-\mu_2=0$.
      • The SE of the difference is $$SE_1=\frac{\sigma_1}{\sqrt{n_1}},\ SE_2=\frac{\sigma_2}{\sqrt{n_2}}$$ $$\Rightarrow SE=\sqrt{SE_1^2+SE_2^2}$$
      • The above result derives from the independence of the two samples: $$SE(X-Y)=\sqrt{Var(X-Y)}$$ $$=\sqrt{1^2\cdot Var(X)+(-1)^2\cdot Var(Y)}=\sqrt{SE^2(X)+SE^2(Y)}$$
      • The $z$ statistic is $$z=\frac{\mu_1-\mu_2}{SE}$$
    2. Comparing the percents
      • Known parameters: $p$ and $n$ of the independent samples
      • $H_0: p_1=p_2$
      • Both samples should use the same estimate of the percent (i.e. the pooled estimate), since the P-value is computed under the null $H_0: p_1=p_2$.
      • Pooled estimate of $p$: $$\hat{p}=\frac{n_1\cdot p_1+n_2\cdot p_2}{n_1+n_2}$$
      • SE of the sample proportion, for the two samples: $$SE_1=\sqrt{\frac{\hat{p}\cdot(1-\hat{p})}{n_1}}, SE_2=\sqrt{\frac{\hat{p}\cdot(1-\hat{p})}{n_2}}$$
      • Similar to the sample mean, the SE of the difference between the sample percents is approximately $$SE=\sqrt{SE_1^2+SE_2^2}$$
      • $z$ statistic is $$z=\frac{p_1-p_2}{SE}$$
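
The formulas summarized above can be sketched as three small R helpers. This is only an illustration: the function names, argument names and the numbers in the usage lines are made up for this sketch and do not come from the course. The one-sample helper assumes the SD passed in already uses $n-1$ in the denominator.

```r
# One-sample t statistic. `sample.sd` is assumed to use n - 1 in the denominator.
one.sample.t <- function(sample.mean, sample.sd, n, mu0) {
  se <- sample.sd / sqrt(n)
  list(statistic = (sample.mean - mu0) / se, df = n - 1)
}

# Two-sample z statistic for means: SE = sqrt(SE1^2 + SE2^2).
two.sample.z.means <- function(m1, sd1, n1, m2, sd2, n2) {
  se <- sqrt((sd1 / sqrt(n1))^2 + (sd2 / sqrt(n2))^2)
  list(statistic = (m1 - m2) / se, se = se)
}

# Two-sample z statistic for percents, using the pooled estimate under H0.
two.sample.z.percents <- function(p1, n1, p2, n2) {
  pooled <- (n1 * p1 + n2 * p2) / (n1 + n2)
  se <- sqrt(pooled * (1 - pooled) / n1 + pooled * (1 - pooled) / n2)
  list(statistic = (p1 - p2) / se, se = se, pooled = pooled)
}

# Made-up inputs, for illustration only.
res.t <- one.sample.t(sample.mean = 52, sample.sd = 6, n = 16, mu0 = 50)
res.means <- two.sample.z.means(m1 = 10, sd1 = 3, n1 = 36, m2 = 9, sd2 = 4, n2 = 64)
res.pct <- two.sample.z.percents(p1 = 0.30, n1 = 200, p2 = 0.25, n2 = 300)
```

Each helper returns the test statistic only; the P-value then comes from `pt()` (for the $t$ statistic, with the returned degrees of freedom) or `pnorm()` (for the $z$ statistics), using one or two tails depending on the alternative.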

EXERCISE SET 3

If a problem asks for an approximation, please use the methods described in the video lecture segments. Unless the problem says otherwise, please give answers correct to one decimal place according to those methods. Some of the problems below are about simple random samples. If the population size is not given, you can assume that the correction factor for standard errors is close enough to 1 that it does not need to be computed. Please use the 5% cutoff for P-values unless otherwise instructed in the problem.

PROBLEM 1

A statistical test is performed, and its P-value turns out to be about 3%. Which of the following must be true? Pick ALL that are correct.

a. The null hypothesis is true.

b. There is about a 3% chance that the null hypothesis is true.

c. The alternative hypothesis is true.

d. There is about a 97% chance that the alternative hypothesis is true.

e. If the null hypothesis were true, there would be about a 3% chance of getting data that were like those that were observed in the sample or even further in the direction of the alternative.

f. The P-value of about 3% was computed assuming that the null hypothesis was true.

Solution

(e) and (f) are correct. The 3% is computed under the assumption that $H_0$ is true, so (f) is correct. By definition, the P-value is the chance, assuming the null is true, of getting data like those in the sample or even further in the direction of the alternative, so (e) is correct. (a) and (c) need not be true: a test can reach the wrong conclusion (Type I and Type II errors), so a small P-value does not prove either hypothesis. There is no such thing as "the chance that the null / alternative is true", so (b) and (d) are incorrect.
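
A small simulation can make statement (e) concrete. The sketch below is not from the course: it assumes a left-tailed one-sample $z$ test on data drawn from a standard normal population, so that the null (mean 0, SD 1) really is true, and the seed, sample size and number of repetitions are arbitrary choices. When the null holds, roughly 3% of samples should produce a P-value of 3% or less.

```r
set.seed(42)      # arbitrary seed so the sketch is reproducible
n.reps <- 10000   # number of simulated samples (arbitrary)
n <- 25           # size of each sample (arbitrary)

# Each replicate draws from N(0, 1), so H0 (mean = 0, known SD = 1) is true.
p.values <- replicate(n.reps, {
  x <- rnorm(n, mean = 0, sd = 1)
  z <- (mean(x) - 0) / (1 / sqrt(n))
  pnorm(z)        # left-tailed P-value
})

# Fraction of samples whose P-value is at most 3%.
frac.below <- mean(p.values <= 0.03)
frac.below
```

The fraction describes how often such data arise when the null is true; it says nothing about the chance that the null itself is true, which is why (b) and (d) fail.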

PROBLEM 2

The distribution of cholesterol levels of the residents of a state closely follows the normal curve. Investigators want to test whether the mean cholesterol level of the residents is 200 mg/dL or lower. A simple random sample of 12 residents has a mean cholesterol level of 185 mg/dL, with an SD of 20 mg/dL (computed as the ordinary SD of a list of 12 numbers, with 12 in the denominator). Let $m$ be the mean cholesterol level of the residents of the state, measured in mg/dL. In Problems 2A-2E, perform a $t$ test of the hypotheses $$\text{Null}: m = 200$$ $$\text{Alternative}: m < 200$$

2A You are using data from a simple random sample to test the given hypotheses. Which of the following sets of assumptions is further required to justify the use of a $t$ distribution to compute the P-value?

a. The distribution of the cholesterol levels of the residents of the state is close to normal.

b. The distribution of the cholesterol levels of the residents of the state is close to normal, with an unknown mean.

c. The distribution of the cholesterol levels of the residents of the state is close to normal, with an unknown mean and an unknown SD.

2B The t distribution that should be used has ( ) degrees of freedom.

2C The value of the t statistic is closest to?

2D The P-value of the test is closest to?

2E The conclusion of the test is to

a. reject the null hypothesis

b. not reject the null hypothesis

Solution

2A) (c) is correct. The sample is small, so unless the population distribution is bell-shaped, you might have trouble using a bell-shaped approximation to the probabilities for the sample mean. If the mean of the normally distributed population were already known, there would be no reason to perform the test. If the SD of the underlying normally distributed population (with unknown mean) were known, you would perform the $z$ test.
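
To see why the choice between the $t$ and the normal curve matters for a small sample, one can compare their left-tail 5% cutoffs. This is a hedged illustration rather than part of the course solution; the list of degrees of freedom beyond 11 is arbitrary.

```r
# Left-tail 5% cutoffs of the t distribution for several degrees of freedom.
dfs <- c(11, 30, 100, 1000)
t.cutoffs <- qt(0.05, dfs)

# The corresponding cutoff of the standard normal curve.
z.cutoff <- qnorm(0.05)

data.frame(df = dfs, t.cutoff = t.cutoffs, z.cutoff = z.cutoff)
```

With 11 degrees of freedom the $t$ cutoff lies further out in the tail than the normal cutoff, so using the normal curve would overstate the evidence against the null; the gap shrinks as the degrees of freedom grow.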

2B) The degrees of freedom equal the sample size minus 1, that is, $12-1=11$.

2C) The sample SD (with $n-1=11$ in the denominator) is $$\sigma=20\times\sqrt{\frac{12}{11}}$$ and $$SE=\frac{\sigma}{\sqrt{12}}$$ Thus the $t$ statistic is $$t=\frac{185-200}{SE}=-2.487469$$ R code:

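# convert the SD with 12 in the denominator to the sample SD (11 in the denominator)
sigma = 20 * sqrt(12 / 11)
se = sigma / sqrt(12)
# t statistic: (observed mean - null mean) / SE
t = (185 - 200) / se; t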
[1] -2.487469

2D) This is a left-tailed $t$ test with 11 degrees of freedom; the P-value is $p=0.0150854$. R code:

# left-tail area under the t curve with 11 degrees of freedom
pt(t, 11)
[1] 0.0150854

2E) Because the P-value (about 1.5%) is less than 5%, we reject $H_0$.
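
As a cross-check on 2B-2D, R's built-in `t.test()` can be run on a synthetic sample. The raw cholesterol readings are not given in the problem, so the sketch below fabricates 12 numbers purely for illustration and rescales them to have mean 185 and SD 20 (with 12 in the denominator); only those two summary values come from the problem.

```r
set.seed(1)                          # arbitrary seed; any 12 distinct numbers would do
raw <- rnorm(12)                     # placeholder values, NOT real data

# Rescale so the list has mean 185 and SD 20 with 12 in the denominator.
centered <- raw - mean(raw)
pop.style.sd <- sqrt(mean(centered^2))
x <- 185 + 20 * centered / pop.style.sd

# Left-tailed one-sample t test of H0: m = 200 against HA: m < 200.
res <- t.test(x, mu = 200, alternative = "less")
t.stat <- unname(res$statistic)
df <- unname(res$parameter)
p.value <- res$p.value
c(t = t.stat, df = df, p = p.value)
```

Because the $t$ statistic depends only on the sample mean, the sample SD and $n$, any list rescaled this way should reproduce the hand calculation above.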

PROBLEM 3

In a simple random sample of 1000 people taken from City A, 13% are senior citizens. In an independent simple random sample of 600 people taken from City B, 17% are senior citizens. Is the percent of senior citizens different in the two cities? Or is this just chance variation? Answer in the steps described in Problems 3A-3C.

3A Under the null hypothesis, the percent of senior citizens in each of the two cities is estimated to be ( )%.

3B Under the null hypothesis, the estimated standard error of the difference between the percents of senior citizens in the two samples is closest to ( )%.

3C The P-value of the test is closest to ( )%, so the null hypothesis is rejected.

Solution

These are two independent simple random samples, so we test $$H_0: p_A=p_B$$ $$H_A: p_A\neq p_B$$

3A) The pooled estimate of $p$ is $$\hat{p}=\frac{1000\times13\%+600\times17\%}{1000+600}=14.5\%$$

3B) $${SE}_{A}=\sqrt{\frac{\hat{p}\cdot(1-\hat{p})}{1000}}$$ $${SE}_{B}=\sqrt{\frac{\hat{p}\cdot(1-\hat{p})}{600}}$$ Thus, the SE of the difference between the sample percents is approximately $$SE=\sqrt{{SE}_{A}^2+{SE}_{B}^2}=1.818241\%$$ R code:

# pooled estimate and the two sample sizes
p = 0.145; n1 = 1000; n2 = 600
se.a = sqrt(p * (1 - p) / n1)
se.b = sqrt(p * (1 - p) / n2)
# SE of the difference between the two sample percents
se = sqrt(se.a^2 + se.b^2); se
[1] 0.01818241

3C) The observed difference is 4%, so $$z=\frac{0.04-0}{SE}$$ This is a two-tailed $z$ test; the P-value is 2.781197%, which is less than 5%, so we reject $H_0$. R code:

z = (0.04 - 0) / se
# two-tailed P-value: double the right-tail area beyond z
(1 - pnorm(z)) * 2
[1] 0.02781197
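
The same test can be cross-checked with R's built-in `prop.test()`. This is a sketch rather than the course's method: the counts 130 and 102 are derived here from the stated percents (13% of 1000 and 17% of 600), and `correct = FALSE` turns off the continuity correction so that the chi-squared statistic equals the square of the pooled $z$ statistic.

```r
counts <- c(130, 102)   # senior citizens in samples A and B (derived from 13% and 17%)
sizes <- c(1000, 600)   # sample sizes for City A and City B

# Two-sided test of equal proportions without continuity correction.
res <- prop.test(counts, sizes, alternative = "two.sided", correct = FALSE)

# The chi-squared statistic (1 degree of freedom) is the square of the pooled z.
z.from.chisq <- sqrt(unname(res$statistic))
p.value <- res$p.value
c(z = z.from.chisq, p = p.value)
```

The square root drops the sign of $z$, which does not matter for a two-tailed test.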

PROBLEM 4

Last year, there were 30,000 students at a university; their GPA had a mean of 2.9 and an SD of 0.6. This year, in a simple random sample of 100 students taken from this university, the GPAs have a mean of 2.95 and an SD of 0.55. Has the mean GPA at the university gone up since last year, or is this just chance variation? Pick the correct calculation of the test statistic.

a. one-sample $z = (2.95 - 2.9)/0.06 = 0.833$; P large; conclude chance variation

b. one-sample $z = (2.95 - 2.9)/0.055 = 0.909$; P large; conclude chance variation

c. two-sample $z = (0.05 - 0)/\sqrt{0.055^2 + 0.0035^2} = 0.907$; P large; conclude chance variation

Solution

Firstly, the population is the students at the university THIS YEAR. $$H_0: \mu=2.9$$ $$H_A: \mu > 2.9$$ The third choice must be wrong, since last year's figures describe all 30,000 students rather than a sample. Thus, this is a one-sample $z$ test, with the unknown SD of this year's population estimated by the sample SD. $$n=100, \sigma=0.55$$ $$\Rightarrow SE=\frac{\sigma}{\sqrt{n}}=0.055, z=\frac{2.95-2.9}{SE}=0.909$$ This matches the second choice. R code:

# sample size and sample SD of this year's GPAs
n = 100; sigma = 0.55
se = sigma / sqrt(n)
# z statistic: (observed mean - null mean) / SE
z = (2.95 - 2.9) / se; z
[1] 0.9090909
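
For comparison, the three candidate statistics from the problem can be computed side by side, together with the right-tailed P-value for the one-sample test with SD 0.55. This is only a sketch; the variable names are made up here, and the third calculation uses the unrounded $0.6/\sqrt{30000}$ where the problem text shows 0.0035.

```r
obs.diff <- 2.95 - 2.9

# Choice 1: one-sample z using last year's SD of 0.6.
z.a <- obs.diff / (0.6 / sqrt(100))

# Choice 2: one-sample z using this year's sample SD of 0.55.
z.b <- obs.diff / (0.55 / sqrt(100))

# Choice 3: two-sample z, treating last year's 30,000 students as a sample.
z.c <- obs.diff / sqrt((0.55 / sqrt(100))^2 + (0.6 / sqrt(30000))^2)

# Right-tailed P-value for the second choice (HA: mean > 2.9).
p.b <- 1 - pnorm(z.b)

round(c(z.a, z.b, z.c), 3)
p.b
```

The P-value is well above the 5% cutoff, which is consistent with the "P large; conclude chance variation" wording in the choices.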
