Stat2.3x Inference was taught by the University of California, Berkeley on the edX platform in 2014.

PDF notes download (Academia.edu)

Summary

Chi-square test

  • Random sample or not / Good or bad

    • $$H_0: \text{Good model}$$ $$H_A: \text{Not good model}$$
    • Compute the expected counts from the expected proportions under $H_0$
    • The $\chi^2$ statistic is $$\chi^2=\sum{\frac{(o-e)^2}{e}}$$ where $o$ denotes the observed counts and $e$ the expected counts.
    • The degrees of freedom equal the number of categories minus one
    • Under $H_0$ the statistic approximately follows the $\chi^2$ distribution, so its P-value can be calculated with the R function:
      1-pchisq(chi, df)
  • Independent or not
    • $$H_0: \text{Independent}$$ $$H_A: \text{not Independent}$$
    • Contingency table
    • Under $H_0$, in each cell of the table $$\text{expected count}=\frac{\text{row total}\times\text{column total}}{\text{grand total}}$$ This follows from $P(A\cap B)=P(A)\cdot P(B)$ under the independence assumption.
    • The $\chi^2$ statistic is $$\chi^2=\sum{\frac{(o-e)^2}{e}}$$ where $o$ denotes the observed counts and $e$ the expected counts.
    • The degrees of freedom are $(\text{number of rows}-1)\times(\text{number of columns}-1)$
    • Under $H_0$ the statistic approximately follows the $\chi^2$ distribution, so its P-value can be calculated with the R function:
      1-pchisq(chi, df)
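
The goodness-of-fit recipe in the first bullet can be sketched as a small R helper. This is only an illustration: the function name `gof_chisq` and its arguments are hypothetical and not part of the course material. The call uses the plant counts and the 9:3:3:1 ratio from Exercise Set A, Problem 1 in these notes.

```r
# Hypothetical helper (not from the course): chi-square goodness-of-fit test.
# observed: vector of observed counts
# props:    expected proportions or ratios under H_0 (rescaled to sum to 1)
gof_chisq <- function(observed, props) {
  props <- props / sum(props)               # accept ratios such as 9:3:3:1
  expected <- sum(observed) * props         # expected counts under H_0
  chi <- sum((observed - expected)^2 / expected)
  df <- length(observed) - 1                # number of categories minus one
  p_value <- 1 - pchisq(chi, df)            # upper-tail area of the chi-square curve
  list(expected = expected, chi = chi, df = df, p_value = p_value)
}

# Plant counts and model ratio from Exercise Set A, Problem 1
res <- gof_chisq(c(218, 69, 84, 29), c(9, 3, 3, 1))
res$chi
res$df
res$p_value
```

Returning a list keeps the intermediate quantities (expected counts, statistic, degrees of freedom) available for checking each step by hand.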

ADDITIONAL PRACTICE PROBLEMS FOR WEEK 5

The population is all patients at a large system of hospitals; each sampled patient was classified by the type of room he/she was in, and his/her level of satisfaction with the care received. The question is whether type of room is independent of level of satisfaction.

1. What are the null and alternative hypotheses?

2. Under the null, what is the estimated expected number of patients in the "shared room, somewhat satisfied" cell?

3. Degrees of freedom = ( )

4. The chi-square statistic is about 13.8. Roughly what is the P-value, and what is the conclusion of the test?

Solution

1. Null: The two variables are independent; Alternative: The two variables are not independent.

2. We need to expand the original table:

Thus the estimated expected number of patients in the shared room, somewhat satisfied is $$784\times\frac{322}{784}\times\frac{255}{784}=104.7321$$
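
As a quick check, the same number can be computed from the three totals that appear in the formula above. The variable names are illustrative, and which of 322 and 255 is the row total and which the column total is an assumption here (the product is the same either way).

```r
# Totals taken from the formula above; names are illustrative.
# Assumption: 322 is the row total and 255 the column total (the order does not matter).
row_total <- 322
col_total <- 255
grand_total <- 784

# expected count = row total * column total / grand total
expected <- row_total * col_total / grand_total
expected
```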

3. The degrees of freedom are $(3-1)\times(3-1)=4$

4. The P-value is 0.007961505, which is smaller than 0.05, so we reject $H_0$ and conclude that the two variables are not independent. R code:

1 - pchisq(13.8, 4)
[1] 0.007961505
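
An equivalent way to get the upper-tail area is the `lower.tail = FALSE` argument of `pchisq`, and the decision can also be phrased in terms of a critical value from `qchisq`. The sketch below assumes a 5% significance level, as in the solution above; the variable names are illustrative.

```r
chi <- 13.8   # chi-square statistic given in the problem
df <- 4       # (3 - 1) * (3 - 1)

# Upper-tail area; the same quantity as 1 - pchisq(chi, df)
p_upper <- pchisq(chi, df, lower.tail = FALSE)

# 5% critical value: reject H_0 when the statistic exceeds it
critical <- qchisq(0.95, df)

p_upper
chi > critical   # TRUE means H_0 is rejected at the 5% level
```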

UNGRADED EXERCISE SET A PROBLEM 1

According to a genetics model, plants of a particular species occur in the categories A, B, C, and D, in the ratio 9:3:3:1. The categories of different plants are mutually independent. At a lab that grows these plants, 218 are in Category A, 69 in Category B, 84 in Category C, and 29 in Category D. Does the model look good? Follow the steps in Problems 1A-1F.

1A The null hypothesis is:

a. The model is good.

b. The model isn't good.

c. Too many of the plants are in Category C.

d. The proportion of plants in Category A is expected to be 9/16; the difference in the sample is due to chance.

1B The alternative hypothesis is:

a. The model is good.

b. The model isn't good.

c. Too many of the plants are in Category C.

d. The proportion of plants in Category A is expected to be 9/16; the difference in the sample is due to chance.

1C Under the null, the expected number of plants in Category D is( ).

1D The chi-square statistic is closest to

a. 1 b. 1.5 c. 2 d. 2.5 e. 3 f. 3.5 g. 4 h. 4.5

1E Degrees of freedom = ( ).

1F Based on this test, does the model look good? (Yes / No)

Solution

1A) The null hypothesis is "the model is good". (a) is correct.

1B) The alternative hypothesis is "the model is not good". (b) is correct.

1C) The expected number of plants in Category D is $$(218+69+84+29)\times\frac{1}{9+3+3+1}=25$$
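
The same calculation gives the expected counts for all four categories at once. A minimal sketch in R, with illustrative variable names:

```r
o <- c(A = 218, B = 69, C = 84, D = 29)   # observed counts from the problem
ratio <- c(A = 9, B = 3, C = 3, D = 1)    # model ratio 9:3:3:1

# Expected counts under H_0: total number of plants times each model proportion
e <- sum(o) * ratio / sum(ratio)
e
```

These are the values used as `e` when computing the chi-square statistic.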

1D) (d) is correct. We can use the following table

R code:

o = c(218, 69, 84, 29)
e = c(225, 75, 75, 25)
chi = sum((o - e)^2 / e); chi
[1] 2.417778

1E) The degrees of freedom are $4-1=3$.

1F) The P-value is 0.4903339, which is larger than 0.05, so we fail to reject $H_0$. The data are consistent with the model, so the answer is "Yes, the model looks good". R code:

1 - pchisq(chi, 3)
[1] 0.4903339
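
The manual steps in 1C through 1F can be cross-checked with R's built-in `chisq.test`, passing the model proportions through its `p` argument. This is a sketch for verification only; the name `fit` is illustrative.

```r
o <- c(218, 69, 84, 29)                        # observed counts
fit <- chisq.test(o, p = c(9, 3, 3, 1) / 16)   # goodness-of-fit test against 9:3:3:1

fit$statistic   # chi-square statistic
fit$parameter   # degrees of freedom
fit$p.value     # P-value
fit$expected    # expected counts under the model
```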

PROBLEM 2

A simple random sample of cars in a city was categorized according to fuel type and place of manufacture.

Are place of manufacture and fuel type independent? Follow the steps in Problems 2A-2D.

2A If the two variables were independent, the chance that a sampled car is a domestic gasoline fueled car would be estimated to be about

0.0362 0.0499 0.2775 0.3820 0.5

2B If the two variables were independent, the expected number of foreign gas/electric hybrids would be estimated to be ( ). (Please keep at least two decimal places; by now you should understand why you should not round off to an integer.)

2C Degrees of freedom =( )

1 2 3 4

2D The chi-square statistic is 0.6716. The test therefore concludes that the two variables are (independent / not independent)

Solution

2A) Expand the table:

If the two variables were independent, then $$P(\text{domestic gasoline})=P(\text{domestic})\cdot P(\text{gasoline})=\frac{215}{511}\times\frac{337}{511}=0.2774767\doteq 0.2775$$

2B) If the two variables were independent, then the expected number of foreign gas/electric hybrids would be $$511\times P(\text{foreign})\cdot P(\text{gas/electric hybrid})=511\times\frac{296}{511}\times\frac{130}{511}=75.30333$$
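
The expected counts for every cell can be computed in one step from the row and column totals. The sketch below uses the counts entered in the `chisq.test()` call at the end of these notes. The row and column labels are assumptions: the totals 337 and 130 match the gasoline and hybrid totals used in 2A and 2B, while "other" is only a placeholder for the middle row, whose label is not recoverable here.

```r
# Counts from the notes; columns are domestic and foreign.
# Labels are assumed for illustration; "other" is a placeholder name.
cars <- matrix(c(146, 18, 51, 191, 26, 79), ncol = 2,
               dimnames = list(fuel = c("gasoline", "other", "hybrid"),
                               origin = c("domestic", "foreign")))

# expected count = row total * column total / grand total, for every cell
expected <- outer(rowSums(cars), colSums(cars)) / sum(cars)

expected["gasoline", "domestic"] / sum(cars)   # estimated chance in 2A
expected["hybrid", "foreign"]                  # expected count in 2B

chi <- sum((cars - expected)^2 / expected)     # chi-square statistic
df <- (nrow(cars) - 1) * (ncol(cars) - 1)      # degrees of freedom
chi
df
```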

2C) The degrees of freedom are $(2-1)\times(3-1)=2$.

2D) The P-value is 0.714766, which is larger than 0.05, so we fail to reject $H_0$. That is, the test concludes that the two variables are independent. R code:

1 - pchisq(0.6716, 2)
[1] 0.714766

We can also calculate the $\chi^2$ statistic with R's built-in function chisq.test():

data = matrix(c(146, 18, 51, 191, 26, 79), ncol = 2)
chisq.test(data)

	Pearson's Chi-squared test

data:  data
X-squared = 0.6716, df = 2, p-value = 0.7148
