加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: Section 5 Window to a Wider World
Stat2.3x Inference(统计推断)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授。
Summary
Chi-square test
- Random sample or not / Good or bad
- $$H_0: \text{Good model}$$ $$H_A: \text{Not good model}$$
- Based on the expected proportion to calculate the expected values
- $\chi^2$ statistic is $$\chi^2=\sum{\frac{(o-e)^2}{e}}$$ where $o$ is observed values, $e$ is expected values.
- The degree of freedom is the number of categories minus one
- Follows approximately the $\chi^2$ distribution, we can calculate its P-value by using R function:
1-pchisq(chi, df)
- Independent or not
- $$H_0: \text{Independent}$$ $$H_A: \text{not Independent}$$
- Contingency table
- Under $H_0$, in each cell of the table $$\text{expected count}=\frac{\text{row total}\times\text{column total}}{\text{grand total}}$$ That is, $P(A\cap B)=P(A)\cdot P(B)$ under the independent assumption.
- $\chi^2$ statistic is $$\chi^2=\sum{\frac{(o-e)^2}{e}}$$ where $o$ is observed values, $e$ is expected values.
- The degree of freedom is $(\text{row}-1)\times(\text{column}-1)$
- Follows approximately the $\chi^2$ distribution, we can calculate its P-value by using R function:
1-pchisq(chi, df)
ADDITIONAL PRACTICE PROBLEMS FOR WEEK 5
The population is all patients at a large system of hospitals; each sampled patient was classified by the type of room he/she was in, and his/her level of satisfaction with the care received. The question is whether type of room is independent of level of satisfaction.

1. What are the null and alternative hypotheses?
2. Under the null, what is the estimated expected number of patients in the "shared room, somewhat satisfied" cell?
3. Degrees of freedom = ( )
4. The chi-square statistic is about 13.8. Roughly what is the P-value, and what is the conclusion of the test?
Solution
1. Null: The two variables are independent; Alternative: The two variables are not independent.
2. We need to expand the original table:

Thus the estimated expected number of patients in the shared room, somewhat satisfied is $$784\times\frac{322}{784}\times\frac{255}{784}=104.7321$$
3. Degree of freedom is $(3-1)\times(3-1)=4$
4. P-value is 0.007961505 which is smaller than 0.05, so we reject $H_0$. That is, the conclusion is the two variables are not independent. R code:
1 - pchisq(13.8, 4)
[1] 0.007961505
UNGRADED EXERCISE SET A PROBLEM 1
According to a genetics model, plants of a particular species occur in the categories A, B, C, and D, in the ratio 9:3:3:1. The categories of different plants are mutually independent. At a lab that grows these plants, 218 are in Category A, 69 in Category B, 84 in Category C, and 29 in Category D. Does the model look good? Follow the steps in Problems 1A-1F.
1A The null hypothesis is:
a. The model is good.
b. The model isn't good.
c. Too many of the plants are in Category C.
d. The proportion of plants in Category A is expected to be 9/16; the difference in the sample is due to chance.
1B The alternative hypothesis is:
a. The model is good.
b. The model isn't good.
c. Too many of the plants are in Category C.
d. The proportion of plants in Category A is expected to be 9/16; the difference in the sample is due to chance.
1C Under the null, the expected number of plants in Category D is( ).
1D The chi-square statistic is closest to
a. 1 b. 1.5 c. 2 d. 2.5 e. 3 f. 3.5 g. 4 h. 4.5
1E Degrees of freedom = ( ).
1F Based on this test, does the model look good? Yes No
Solution
1A) The null hypothesis is "the model is good". (a) is correct.
1B) The alternative hypothesis is "the model is not good". (b) is correct.
1C) The expected number of plants in Category D is $$(218+69+84+29)\times\frac{1}{9+3+3+1}=25$$
1D) (d) is correct. We can use the following table

R code:
o = c(218, 69, 84, 29)
e = c(225, 75, 75, 25)
chi = sum((o - e)^2 / e); chi
[1] 2.417778
1E) Degree of freedom is $4-1=3$.
1F) P-value is 0.4903339 which is larger than 0.05, so we reject $H_A$. The conclusion is "the model is good". R code:
1 - pchisq(chi, 3)
[1] 0.4903339
PROBLEM 2
A simple random sample of cars in a city was categorized according to fuel type and place of manufacture.

Are place of manufacture and fuel type independent? Follow the steps in Problems 2A-2D.
2A If the two variables were independent, the chance that a sampled car is a domestic gasoline fueled car would be estimated to be about
0.0362 0.0499 0.2775 0.3820 0.5
2B If the two variables were independent, the expected number of foreign gas/electric hybrids would be estimated to be ( ). (Please keep at least two decimal places; by now you should understand why you should not round off to an integer.)
2C Degrees of freedom =( )
1 2 3 4
2D The chi-square statistic is 0.6716. The test therefore concludes that the two variables are independent not independent
Solution
2A) Expand the table:

If the two variables were independent, then $$P(\text{domestic gasoline})=P(\text{domestic})\cdot P(\text{gasoline})=\frac{215}{511}\times\frac{337}{511}=0.2774767\doteq 0.2775$$
2B) If the two variables were independent, then $$511\times P(\text{foreign gasoline/electricity})=511\times\frac{296}{511}\times\frac{130}{511}=75.30333$$
2C) Degree of freedom is $(2-1)\times(3-1)=2$.
2D) The P-value is 0.714766 which is larger than 0.05, so we reject $H_A$. That is, the conclusion is independent. R code:
1 - pchisq(0.6716, 2)
[1] 0.714766
We can calculate $\chi^2$ statistic by using R built-in function
chisq.test()
data = matrix(c(146, 18, 51, 191, 26, 79), ncol = 2)
chisq.test(data) Pearson's Chi-squared test data: data
X-squared = 0.6716, df = 2, p-value = 0.7148
加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: Section 5 Window to a Wider World的更多相关文章
- 加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: Section 4 Dependent Samples
Stat2.3x Inference(统计推断)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...
- 加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: Section 3 One-sample and two-sample tests
Stat2.3x Inference(统计推断)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...
- 加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: Section 2 Testing Statistical Hypotheses
Stat2.3x Inference(统计推断)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...
- 加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: Section 1 Estimating unknown parameters
Stat2.3x Inference(统计推断)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...
- 加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: FINAL
Stat2.3x Inference(统计推断)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...
- 加州大学伯克利分校Stat2.2x Probability 概率初步学习笔记: Final
Stat2.2x Probability(概率)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...
- 加州大学伯克利分校Stat2.2x Probability 概率初步学习笔记: Section 5 The accuracy of simple random samples
Stat2.2x Probability(概率)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...
- 加州大学伯克利分校Stat2.2x Probability 概率初步学习笔记: Section 4 The Central Limit Theorem
Stat2.2x Probability(概率)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...
- 加州大学伯克利分校Stat2.2x Probability 概率初步学习笔记: Section 3 The law of averages, and expected values
Stat2.2x Probability(概率)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...
随机推荐
- Python2.5-原理之模块
此部分来自于<Python学习手册>第五部分 一.模块(21章) 模块是最高级别的程序组织单元,它将程序代码和数据封装起来以便重用..模块往往对应于python程序文件.每个文件就是一个模 ...
- Scala入门之Array
/** * 大数据技术是数据的集合以及对数据集合的操作技术的统称,具体来说: * 1,数据集合:会涉及数据的搜集.存储等,搜集会有很多技术,存储现在比较经典的是使用Hadoop,也有很多情况使用Kaf ...
- 【有人@我】Android中高亮变色显示文本中的关键字
应该是好久没有写有关技术类的文章了,前天还有人在群里问我,说群主很长时间没有分享干货了,今天分享一篇Android中TextView在大段的文字内容中如何让关键字高亮变色的文章 ,希望对大家有所帮助, ...
- 提高Visual Studio开发性能的几款插件
通过打开Visual Studio,单机TOOLS—Extensions and Updates-Online-Visual Studio Gallery(工具-扩展和更新-联网-Visual Stu ...
- Recommending branded products from social media -RecSys 2013-20160422
1.Information publication:RecSys 2013 author:zhengyong zhang 2.What 是对上一篇论文的拓展:利用社交媒体中用户信息 对用户购买的类别排 ...
- pdo知识总结
PDO 用了这么久了这里抽时间总结下: pdo (php data object) 是php5 新出来的支持 mysql 操作的一个功能.用其可代替mysqli扩展.因为是php自带的.所以我觉得效率 ...
- android 资讯阅读器
最近找申请到了一个不错的接口 , 非常适合拿来写一个资讯类的app. 现在着手写,随写随更.也算是抛砖引玉.烂尾请勿喷.╭(╯^╰)╮ android 资讯阅读器 第一阶段目标样式(滑动切换标签 , ...
- Java--剑指offer(8)
36.输入两个链表,找出它们的第一个公共结点. 解题思路:这里主要是把两个链表的节点都放入两个栈中,这样就可以按照出栈的方式来比较节点,因为单链表只要是有相同的节点,那么之后的节点也都是一样的,所以如 ...
- git组成结构
1. blob对象(blob) 2. 目录树(tree) 3. 提交(commit) 4. 标签(tag) git 文件按照状态分为3类: 1. 已追踪的(tracked) 2. 被忽略的(Ignor ...
- 【UOJ #29】【IOI 2014】holiday
http://uoj.ac/problem/29 cdq四次处理出一直向左, 一直向右, 向左后回到起点, 向右后回到起点的dp数组,最后统计答案. 举例:\(fi\)表示一直向右走i天能参观的最多景 ...