加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: Section 5 Window to a Wider World
Stat2.3x Inference(统计推断)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授。
Summary
Chi-square test
- Random sample or not / Good or bad
- $$H_0: \text{Good model}$$ $$H_A: \text{Not good model}$$
- Based on the expected proportion to calculate the expected values
- $\chi^2$ statistic is $$\chi^2=\sum{\frac{(o-e)^2}{e}}$$ where $o$ is observed values, $e$ is expected values.
- The degree of freedom is the number of categories minus one
- Follows approximately the $\chi^2$ distribution, we can calculate its P-value by using R function:
1-pchisq(chi, df)
- Independent or not
- $$H_0: \text{Independent}$$ $$H_A: \text{not Independent}$$
- Contingency table
- Under $H_0$, in each cell of the table $$\text{expected count}=\frac{\text{row total}\times\text{column total}}{\text{grand total}}$$ That is, $P(A\cap B)=P(A)\cdot P(B)$ under the independent assumption.
- $\chi^2$ statistic is $$\chi^2=\sum{\frac{(o-e)^2}{e}}$$ where $o$ is observed values, $e$ is expected values.
- The degree of freedom is $(\text{row}-1)\times(\text{column}-1)$
- Follows approximately the $\chi^2$ distribution, we can calculate its P-value by using R function:
1-pchisq(chi, df)
ADDITIONAL PRACTICE PROBLEMS FOR WEEK 5
The population is all patients at a large system of hospitals; each sampled patient was classified by the type of room he/she was in, and his/her level of satisfaction with the care received. The question is whether type of room is independent of level of satisfaction.

1. What are the null and alternative hypotheses?
2. Under the null, what is the estimated expected number of patients in the "shared room, somewhat satisfied" cell?
3. Degrees of freedom = ( )
4. The chi-square statistic is about 13.8. Roughly what is the P-value, and what is the conclusion of the test?
Solution
1. Null: The two variables are independent; Alternative: The two variables are not independent.
2. We need to expand the original table:

Thus the estimated expected number of patients in the shared room, somewhat satisfied is $$784\times\frac{322}{784}\times\frac{255}{784}=104.7321$$
3. Degree of freedom is $(3-1)\times(3-1)=4$
4. P-value is 0.007961505 which is smaller than 0.05, so we reject $H_0$. That is, the conclusion is the two variables are not independent. R code:
1 - pchisq(13.8, 4)
[1] 0.007961505
UNGRADED EXERCISE SET A PROBLEM 1
According to a genetics model, plants of a particular species occur in the categories A, B, C, and D, in the ratio 9:3:3:1. The categories of different plants are mutually independent. At a lab that grows these plants, 218 are in Category A, 69 in Category B, 84 in Category C, and 29 in Category D. Does the model look good? Follow the steps in Problems 1A-1F.
1A The null hypothesis is:
a. The model is good.
b. The model isn't good.
c. Too many of the plants are in Category C.
d. The proportion of plants in Category A is expected to be 9/16; the difference in the sample is due to chance.
1B The alternative hypothesis is:
a. The model is good.
b. The model isn't good.
c. Too many of the plants are in Category C.
d. The proportion of plants in Category A is expected to be 9/16; the difference in the sample is due to chance.
1C Under the null, the expected number of plants in Category D is( ).
1D The chi-square statistic is closest to
a. 1 b. 1.5 c. 2 d. 2.5 e. 3 f. 3.5 g. 4 h. 4.5
1E Degrees of freedom = ( ).
1F Based on this test, does the model look good? Yes No
Solution
1A) The null hypothesis is "the model is good". (a) is correct.
1B) The alternative hypothesis is "the model is not good". (b) is correct.
1C) The expected number of plants in Category D is $$(218+69+84+29)\times\frac{1}{9+3+3+1}=25$$
1D) (d) is correct. We can use the following table

R code:
o = c(218, 69, 84, 29)
e = c(225, 75, 75, 25)
chi = sum((o - e)^2 / e); chi
[1] 2.417778
1E) Degree of freedom is $4-1=3$.
1F) P-value is 0.4903339 which is larger than 0.05, so we reject $H_A$. The conclusion is "the model is good". R code:
1 - pchisq(chi, 3)
[1] 0.4903339
PROBLEM 2
A simple random sample of cars in a city was categorized according to fuel type and place of manufacture.

Are place of manufacture and fuel type independent? Follow the steps in Problems 2A-2D.
2A If the two variables were independent, the chance that a sampled car is a domestic gasoline fueled car would be estimated to be about
0.0362 0.0499 0.2775 0.3820 0.5
2B If the two variables were independent, the expected number of foreign gas/electric hybrids would be estimated to be ( ). (Please keep at least two decimal places; by now you should understand why you should not round off to an integer.)
2C Degrees of freedom =( )
1 2 3 4
2D The chi-square statistic is 0.6716. The test therefore concludes that the two variables are independent not independent
Solution
2A) Expand the table:

If the two variables were independent, then $$P(\text{domestic gasoline})=P(\text{domestic})\cdot P(\text{gasoline})=\frac{215}{511}\times\frac{337}{511}=0.2774767\doteq 0.2775$$
2B) If the two variables were independent, then $$511\times P(\text{foreign gasoline/electricity})=511\times\frac{296}{511}\times\frac{130}{511}=75.30333$$
2C) Degree of freedom is $(2-1)\times(3-1)=2$.
2D) The P-value is 0.714766 which is larger than 0.05, so we reject $H_A$. That is, the conclusion is independent. R code:
1 - pchisq(0.6716, 2)
[1] 0.714766
We can calculate $\chi^2$ statistic by using R built-in function
chisq.test()
data = matrix(c(146, 18, 51, 191, 26, 79), ncol = 2)
chisq.test(data) Pearson's Chi-squared test data: data
X-squared = 0.6716, df = 2, p-value = 0.7148
加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: Section 5 Window to a Wider World的更多相关文章
- 加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: Section 4 Dependent Samples
Stat2.3x Inference(统计推断)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...
- 加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: Section 3 One-sample and two-sample tests
Stat2.3x Inference(统计推断)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...
- 加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: Section 2 Testing Statistical Hypotheses
Stat2.3x Inference(统计推断)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...
- 加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: Section 1 Estimating unknown parameters
Stat2.3x Inference(统计推断)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...
- 加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: FINAL
Stat2.3x Inference(统计推断)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...
- 加州大学伯克利分校Stat2.2x Probability 概率初步学习笔记: Final
Stat2.2x Probability(概率)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...
- 加州大学伯克利分校Stat2.2x Probability 概率初步学习笔记: Section 5 The accuracy of simple random samples
Stat2.2x Probability(概率)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...
- 加州大学伯克利分校Stat2.2x Probability 概率初步学习笔记: Section 4 The Central Limit Theorem
Stat2.2x Probability(概率)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...
- 加州大学伯克利分校Stat2.2x Probability 概率初步学习笔记: Section 3 The law of averages, and expected values
Stat2.2x Probability(概率)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...
随机推荐
- 发布了Android的App,我要开源几个组件!
做了一款App,本来是毕业设计但是毕业的时候还没有做完,因为大部分时间都改论文去了,你们都懂的.现在毕业了在工作之余把App基本上做完了.为什么说基本上呢,因为我觉得还有很多功能还没实现,还要很多bu ...
- K-means算法及文本聚类实践
K-Means是常用的聚类算法,与其他聚类算法相比,其时间复杂度低,聚类的效果也还不错,这里简单介绍一下k-means算法,下图是一个手写体数据集聚类的结果. 基本思想 k-means算法需要事先指定 ...
- JavaScript高级程序设计笔记 事件冒泡和事件捕获
1.事件冒泡 要理解事件冒泡,就得先知道事件流.事件流描述的是从页面接收事件的顺序,比如如下的代码: <body> <div> click me! </div> & ...
- 深入理解计算机系统(1.2)---hello world的程序是如何运行的
在写本章的内容之前,LZ先做个小广告.其实也不算是什么广告,就是LZ为了和各位猿友交流方便,另外也确实有个别猿友留言或者在博客里发短消息给LZ要联系方式.因此LZ斗胆建立了一个有关<深入理解计算 ...
- 硬件升级win8.1重新安装系统
上次重装系统一年后升级了硬件配置,双11后,再次折腾系统. 配置硬件的选择 部分配件已经升级过了,之前的一直是AMD平台,发热大功耗高,08年我配的AMD 8450+映泰GX790 128M待机在10 ...
- 生成短链(网址) ShortUrlLink
建表 CREATE TABLE [dbo].[ShortUrl]( [Id] [,) NOT NULL, [LongUrl] [nvarchar]() NOT NULL, [BaseUri] [int ...
- 关于HTML+CSS设置图片居中的方法
有的时候我们会遇到这样一个问题,就是当我们按下F12进行使用firebug调试的时候,我们发现图像没有居中,页面底下有横向的滑动条出现,图片没能够居中,默认状态下只是紧靠在页面最左侧,而我们对图像缩小 ...
- C++命名规范
1.1 类型名 首字母大写,末尾加_T.如: class TnppCoverageArea_T{…}; 1.2 1.2 变量和函数名 变量和函数名中首字母小写,其后每个英文单词的第一个字母 ...
- Project Serve 2013部署方法
在线版Project2013部署手册 服务器环境要求 系统:windows server 2008r2.windows server2012x64 Sharepoint 2013 内存至少16GB,最 ...
- ASP.NET 使用Ajax
转载: http://www.cnblogs.com/dolphinX/p/3242408.html $.ajax向普通页面发送get请求 这是最简单的一种方式了,先简单了解jQuery ajax的语 ...