加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: Section 5 Window to a Wider World

Stat2.3x Inference（统计推断）课程由加州大学伯克利分校（University of California, Berkeley）于2014年在edX平台讲授。

Summary

Chi-square test

Random sample or not / Good or bad
- $$H_0: \text{Good model}$$ $$H_A: \text{Not good model}$$
- Based on the expected proportion to calculate the expected values
- $\chi^2$ statistic is $$\chi^2=\sum{\frac{(o-e)^2}{e}}$$ where $o$ is observed values, $e$ is expected values.
- The degree of freedom is the number of categories minus one
- Follows approximately the $\chi^2$ distribution, we can calculate its P-value by using R function:
```
1-pchisq(chi, df)
```
Independent or not
- $$H_0: \text{Independent}$$ $$H_A: \text{not Independent}$$
- Contingency table
- Under $H_0$, in each cell of the table $$\text{expected count}=\frac{\text{row total}\times\text{column total}}{\text{grand total}}$$ That is, $P(A\cap B)=P(A)\cdot P(B)$ under the independent assumption.
- $\chi^2$ statistic is $$\chi^2=\sum{\frac{(o-e)^2}{e}}$$ where $o$ is observed values, $e$ is expected values.
- The degree of freedom is $(\text{row}-1)\times(\text{column}-1)$
- Follows approximately the $\chi^2$ distribution, we can calculate its P-value by using R function:
```
1-pchisq(chi, df)
```

ADDITIONAL PRACTICE PROBLEMS FOR WEEK 5

The population is all patients at a large system of hospitals; each sampled patient was classified by the type of room he/she was in, and his/her level of satisfaction with the care received. The question is whether type of room is independent of level of satisfaction.

1. What are the null and alternative hypotheses?

2. Under the null, what is the estimated expected number of patients in the "shared room, somewhat satisfied" cell?

3. Degrees of freedom = ( )

4. The chi-square statistic is about 13.8. Roughly what is the P-value, and what is the conclusion of the test?

Solution

1. Null: The two variables are independent; Alternative: The two variables are not independent.

2. We need to expand the original table:

Thus the estimated expected number of patients in the shared room, somewhat satisfied is $$784\times\frac{322}{784}\times\frac{255}{784}=104.7321$$

3. Degree of freedom is $(3-1)\times(3-1)=4$

4. P-value is 0.007961505 which is smaller than 0.05, so we reject $H_0$. That is, the conclusion is the two variables are not independent. R code:

1 - pchisq(13.8, 4)

[1] 0.007961505

UNGRADED EXERCISE SET A PROBLEM 1

According to a genetics model, plants of a particular species occur in the categories A, B, C, and D, in the ratio 9:3:3:1. The categories of different plants are mutually independent. At a lab that grows these plants, 218 are in Category A, 69 in Category B, 84 in Category C, and 29 in Category D. Does the model look good? Follow the steps in Problems 1A-1F.

1A The null hypothesis is:

a. The model is good.

b. The model isn't good.

c. Too many of the plants are in Category C.

d. The proportion of plants in Category A is expected to be 9/16; the difference in the sample is due to chance.

1B The alternative hypothesis is:

a. The model is good.

b. The model isn't good.

c. Too many of the plants are in Category C.

d. The proportion of plants in Category A is expected to be 9/16; the difference in the sample is due to chance.

1C Under the null, the expected number of plants in Category D is( ).

1D The chi-square statistic is closest to

a. 1 b. 1.5 c. 2 d. 2.5 e. 3 f. 3.5 g. 4 h. 4.5

1E Degrees of freedom = ( ).

1F Based on this test, does the model look good? Yes No

Solution

1A) The null hypothesis is "the model is good". (a) is correct.

1B) The alternative hypothesis is "the model is not good". (b) is correct.

1C) The expected number of plants in Category D is $$(218+69+84+29)\times\frac{1}{9+3+3+1}=25$$

1D) (d) is correct. We can use the following table

R code:

o = c(218, 69, 84, 29)

e = c(225, 75, 75, 25)

chi = sum((o - e)^2 / e); chi

[1] 2.417778

1E) Degree of freedom is $4-1=3$.

1F) P-value is 0.4903339 which is larger than 0.05, so we reject $H_A$. The conclusion is "the model is good". R code:

1 - pchisq(chi, 3)

[1] 0.4903339

PROBLEM 2

A simple random sample of cars in a city was categorized according to fuel type and place of manufacture.

Are place of manufacture and fuel type independent? Follow the steps in Problems 2A-2D.

2A If the two variables were independent, the chance that a sampled car is a domestic gasoline fueled car would be estimated to be about

0.0362 0.0499 0.2775 0.3820 0.5

2B If the two variables were independent, the expected number of foreign gas/electric hybrids would be estimated to be ( ). (Please keep at least two decimal places; by now you should understand why you should not round off to an integer.)

2C Degrees of freedom =( )

1 2 3 4

2D The chi-square statistic is 0.6716. The test therefore concludes that the two variables are independent not independent

Solution

2A) Expand the table:

If the two variables were independent, then $$P(\text{domestic gasoline})=P(\text{domestic})\cdot P(\text{gasoline})=\frac{215}{511}\times\frac{337}{511}=0.2774767\doteq 0.2775$$

2B) If the two variables were independent, then $$511\times P(\text{foreign gasoline/electricity})=511\times\frac{296}{511}\times\frac{130}{511}=75.30333$$

2C) Degree of freedom is $(2-1)\times(3-1)=2$.

2D) The P-value is 0.714766 which is larger than 0.05, so we reject $H_A$. That is, the conclusion is independent. R code:

1 - pchisq(0.6716, 2)

[1] 0.714766

We can calculate $\chi^2$ statistic by using R built-in function

chisq.test()

data = matrix(c(146, 18, 51, 191, 26, 79), ncol = 2)

  chisq.test(data)

 Pearson's Chi-squared test

data:  data

X-squared = 0.6716, df = 2, p-value = 0.7148

加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: Section 5 Window to a Wider World的更多相关文章

加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: Section 4 Dependent Samples
Stat2.3x Inference(统计推断)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...
加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: Section 3 One-sample and two-sample tests
Stat2.3x Inference(统计推断)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...
加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: Section 2 Testing Statistical Hypotheses
Stat2.3x Inference(统计推断)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...
加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: Section 1 Estimating unknown parameters
Stat2.3x Inference(统计推断)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...
加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: FINAL
Stat2.3x Inference(统计推断)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...
加州大学伯克利分校Stat2.2x Probability 概率初步学习笔记: Final
Stat2.2x Probability(概率)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...
加州大学伯克利分校Stat2.2x Probability 概率初步学习笔记: Section 5 The accuracy of simple random samples
Stat2.2x Probability(概率)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...
加州大学伯克利分校Stat2.2x Probability 概率初步学习笔记: Section 4 The Central Limit Theorem
Stat2.2x Probability(概率)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...
加州大学伯克利分校Stat2.2x Probability 概率初步学习笔记: Section 3 The law of averages, and expected values
Stat2.2x Probability(概率)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...

随机推荐

在使用EF Code First开发时，遇到的“关系”问题，以及解决方法
Entity Framework Code First 简称 EF CF也行,就是在开发的时候,以代码先行的原则,开发人员无需考虑数据库端的一些问题(开发过程中基本不需要在数据库管理器上操作) 言归 ...
ASP.NT运行原理和页面生命周期详解及其应用
ASP.NT运行原理和页面生命周期详解及其应用 1. 下面是我画的一张关于asp.net运行原理和页面生命周期的一张详解图.如果你对具体不太了解,请参照博客园其他帖子.在这里我主要讲解它的实际应用. ...
[NOIP摸你赛]Hzwer的陨石（带权并查集）
题目描述: 经过不懈的努力,Hzwer召唤了很多陨石.已知Hzwer的地图上共有n个区域,且一开始的时候第i个陨石掉在了第i个区域.有电力喷射背包的ndsf很自豪,他认为搬陨石很容易,所以他将一些区域 ...
linux安装phpmyadmin
1 配置好MySQL 后启动mysql (service mysqld start); 2 下载phpmyadmin 包,解压只phpmyadmin (解压命令:zip -r abc.zip abc ...
springMvc的第一个demo
1.下载jar包 http://repo.spring.io/libs-release-local/org/springframework/spring/4.2.3.RELEASE/ 2.下载源码 j ...
[转]JAVA设计模式之单例模式
原文地址:http://blog.csdn.net/jason0539/article/details/23297037 概念: java中单例模式是一种常见的设计模式,单例模式的写法有好几种,这里主 ...
git 格式化输出版本信息
git log --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)&l ...
Maven-生命周期
Maven的生命周期是为了对所有的构建过程进行了抽象了,便于统一. clean(清理) 此生命周期旨在给工程做清理工作,它主要包含以下阶段: pre-clean - 执行项目清理前所需要的工作. cl ...
JACASCRIPT--的奇技技巧的44招
JavaScript是一个绝冠全球的编程语言,可用于Web开发.移动应用开发(PhoneGap.Appcelerator).服务器端开发(Node.js和Wakanda)等等.JavaScript还是 ...
java内存空间详解
Java内存分配与管理是Java的核心技术之一,之前我们曾介绍过Java的内存管理与内存泄露以及Java垃圾回收方面的知识,今天我们再次深入Java核心,详细介绍一下Java在内存分配方面的知识.一般 ...

加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: Section 5 Window to a Wider World

加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: Section 5 Window to a Wider World的更多相关文章

随机推荐

热门专题