This part uses the Arthritis dataset from the vcd package.

> library(vcd)
Loading required package: MASS
Loading required package: grid
Loading required package: colorspace
> head(Arthritis)
  ID Treatment  Sex Age Improved
1 57   Treated Male  27     Some
2 46   Treated Male  29     None
3 77   Treated Male  30     None
4 17   Treated Male  32   Marked
5 36   Treated Male  46   Marked
6 23   Treated Male  58   Marked

1. Generating frequency tables

(1) ONE-WAY TABLES

Example 01:

> mytable<-with(Arthritis,table(Improved))
> mytable
Improved
  None   Some Marked
    42     14     28
>
> prop.table(mytable)
Improved
     None      Some    Marked
0.5000000 0.1666667 0.3333333
>
> prop.table(mytable)*100
Improved
    None     Some   Marked
50.00000 16.66667 33.33333

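table(): produces simple frequency counts;

· By default, table() ignores missing values (NAs). To include NA as a category in the counts, use the option useNA="ifany".

prop.table(): expresses the counts as proportions;

prop.table()*100: expresses the counts as percentages.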
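The NA behaviour described above is easy to see on a small vector. The sketch below uses a hypothetical factor `improved` (toy data, not the Arthritis dataset) with one missing value, and compares table() with and without useNA = "ifany".

```r
# Hypothetical toy data (not from Arthritis): five responses, one of them missing
improved <- factor(c("None", "Some", NA, "Marked", "None"),
                   levels = c("None", "Some", "Marked"))

counts_default <- table(improved)                   # the NA is dropped silently
counts_with_na <- table(improved, useNA = "ifany")  # the NA is kept as an extra <NA> cell

print(counts_default)
print(counts_with_na)

# Proportions and percentages here are based on the non-missing counts only
print(prop.table(counts_default))
print(prop.table(counts_default) * 100)
```

Because prop.table() simply divides by the sum of whatever table it is given, the percentages change if the NA cell is included.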

(2) TWO-WAY TABLES

Example 02:

> mytable<-xtabs(~Treatment+Improved,data=Arthritis)
> mytable
         Improved
Treatment None Some Marked
  Placebo   29    7      7
  Treated   13    7     21

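(1) mytable<-table(A,B)

· A is the row variable and B is the column variable.

(2) xtabs(): creates a contingency table from formula-style input. The variables to be cross-classified go on the right of ~; here A becomes the rows and B the columns.

mytable<-xtabs(~A+B,data=mydata)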
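To check that the two forms build the same table, the sketch below rebuilds a stand-in for vcd's Arthritis data from the Treatment x Improved counts printed in Example 02 (an assumption made so the snippet runs in base R; ID, Sex and Age are not reconstructed). The data frame name `arth` is hypothetical.

```r
# Stand-in for vcd::Arthritis, rebuilt from the Treatment x Improved counts above
lv <- c("None", "Some", "Marked")
arth <- data.frame(
  Treatment = factor(rep(c("Placebo", "Treated"), times = c(43, 41)),
                     levels = c("Placebo", "Treated")),
  Improved  = factor(c(rep(lv, times = c(29, 7, 7)),    # Placebo rows
                       rep(lv, times = c(13, 7, 21))),  # Treated rows
                     levels = lv)
)

t_pos <- with(arth, table(Treatment, Improved))     # positional: rows first, then columns
t_fml <- xtabs(~ Treatment + Improved, data = arth) # formula: order taken from the formula

print(t_fml)
```

The formula interface is handy when the variables live in a data frame, since `data =` saves repeating the data frame name for each variable.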

Example 03:

> margin.table(mytable,1)
Treatment
Placebo Treated
     43      41
> prop.table(mytable,1)
         Improved
Treatment      None      Some    Marked
  Placebo 0.6744186 0.1627907 0.1627907
  Treated 0.3170732 0.1707317 0.5121951
> margin.table(mytable,2)
Improved
  None   Some Marked
    42     14     28
> prop.table(mytable,2)
         Improved
Treatment      None      Some    Marked
  Placebo 0.6904762 0.5000000 0.2500000
  Treated 0.3095238 0.5000000 0.7500000
> prop.table(mytable)
         Improved
Treatment       None       Some     Marked
  Placebo 0.34523810 0.08333333 0.08333333
  Treated 0.15476190 0.08333333 0.25000000

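margin.table(): produces marginal frequencies;

prop.table(): produces proportions.

· index 1: refers to the first variable in the table (the row variable, here Treatment), so prop.table(mytable,1) gives row proportions;

· index 2: refers to the second variable in the table (the column variable, here Improved), so prop.table(mytable,2) gives column proportions;

· with no index, prop.table(mytable) divides every cell by the grand total.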
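The meaning of the index can be checked directly: row proportions sum to 1 across each row, column proportions sum to 1 down each column. This sketch rebuilds the Treatment x Improved table from the counts in Example 02, so it does not need the vcd package.

```r
# Treatment x Improved counts copied from the output above
mytable <- as.table(matrix(c(29, 7,  7,
                             13, 7, 21),
                           nrow = 2, byrow = TRUE,
                           dimnames = list(Treatment = c("Placebo", "Treated"),
                                           Improved  = c("None", "Some", "Marked"))))

row_tot   <- margin.table(mytable, 1)  # index 1 = rows (Treatment); same as rowSums()
col_tot   <- margin.table(mytable, 2)  # index 2 = columns (Improved); same as colSums()
row_prop  <- prop.table(mytable, 1)    # each row divided by its row total
col_prop  <- prop.table(mytable, 2)    # each column divided by its column total
cell_prop <- prop.table(mytable)       # every cell divided by the grand total (84)

print(round(row_prop, 3))
print(rowSums(row_prop))  # each row sums to 1
print(colSums(col_prop))  # each column sums to 1
```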

Example 04:

> addmargins(mytable)
         Improved
Treatment None Some Marked Sum
  Placebo   29    7      7  43
  Treated   13    7     21  41
  Sum       42   14     28  84
> addmargins(prop.table(mytable))
         Improved
Treatment       None       Some     Marked        Sum
  Placebo 0.34523810 0.08333333 0.08333333 0.51190476
  Treated 0.15476190 0.08333333 0.25000000 0.48809524
  Sum     0.50000000 0.16666667 0.33333333 1.00000000

addmargins(): adds marginal sums to these tables;

· by default, it creates sum margins for all variables in the table;

Example 04 (variant 1): add only a sum column

> addmargins(prop.table(mytable,1),2)
         Improved
Treatment      None      Some    Marked       Sum
  Placebo 0.6744186 0.1627907 0.1627907 1.0000000
  Treated 0.3170732 0.1707317 0.5121951 1.0000000

Example 04 (variant 2): add only a sum row

> addmargins(prop.table(mytable,2),1)
         Improved
Treatment      None      Some    Marked
  Placebo 0.6904762 0.5000000 0.2500000
  Treated 0.3095238 0.5000000 0.7500000
  Sum     1.0000000 1.0000000 1.0000000

(3) MULTIDIMENSIONAL TABLES

Example 05:

> install.packages("gmodels")

> library(vcd)
Loading required package: MASS
Loading required package: grid
Loading required package: colorspace
> library(gmodels)

> CrossTable(Arthritis$Treatment,Arthritis$Improved)

   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

Total Observations in Table:  84

                    | Arthritis$Improved
Arthritis$Treatment |      None |      Some |    Marked | Row Total |
--------------------|-----------|-----------|-----------|-----------|
            Placebo |        29 |         7 |         7 |        43 |
                    |     2.616 |     0.004 |     3.752 |           |
                    |     0.674 |     0.163 |     0.163 |     0.512 |
                    |     0.690 |     0.500 |     0.250 |           |
                    |     0.345 |     0.083 |     0.083 |           |
--------------------|-----------|-----------|-----------|-----------|
            Treated |        13 |         7 |        21 |        41 |
                    |     2.744 |     0.004 |     3.935 |           |
                    |     0.317 |     0.171 |     0.512 |     0.488 |
                    |     0.310 |     0.500 |     0.750 |           |
                    |     0.155 |     0.083 |     0.250 |           |
--------------------|-----------|-----------|-----------|-----------|
       Column Total |        42 |        14 |        28 |        84 |
                    |     0.500 |     0.167 |     0.333 |           |
--------------------|-----------|-----------|-----------|-----------|

CrossTable() in the gmodels package: creates two-way tables modeled after PROC FREQ in SAS or CROSSTABS in SPSS.
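The "Chi-square contribution" line in each CrossTable() cell can be reproduced by hand with the standard Pearson formula: (observed - expected)^2 / expected, where expected = row total x column total / n. The base-R sketch below (counts copied from the output above; this is the textbook formula, not gmodels' own code) recomputes those values and checks that they add up to the X-squared that chisq.test() reports for the same table.

```r
mytable <- as.table(matrix(c(29, 7,  7,
                             13, 7, 21),
                           nrow = 2, byrow = TRUE,
                           dimnames = list(Treatment = c("Placebo", "Treated"),
                                           Improved  = c("None", "Some", "Marked"))))

n <- sum(mytable)
# Expected count under independence: row total * column total / n
expected <- outer(rowSums(mytable), colSums(mytable)) / n
# Per-cell chi-square contribution (second line of each CrossTable cell)
contrib <- (mytable - expected)^2 / expected

print(round(contrib, 3))
print(sum(contrib))  # Pearson X^2 for the whole table

# chisq.test() applies no continuity correction to a 2 x 3 table,
# so its statistic should equal the sum of the contributions
x2 <- unname(chisq.test(mytable)$statistic)
```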

Example 06:

> mytable<-xtabs(~Treatment+Sex+Improved,data=Arthritis)
> mytable
, , Improved = None

         Sex
Treatment Female Male
  Placebo     19   10
  Treated      6    7

, , Improved = Some

         Sex
Treatment Female Male
  Placebo      7    0
  Treated      5    2

, , Improved = Marked

         Sex
Treatment Female Male
  Placebo      6    1
  Treated     16    5

> ftable(mytable)
                 Improved None Some Marked
Treatment Sex
Placebo   Female            19    7      6
          Male              10    0      1
Treated   Female             6    5     16
          Male               7    2      5
> margin.table(mytable,1)
Treatment
Placebo Treated
     43      41
> margin.table(mytable,2)
Sex
Female   Male
    59     25
> margin.table(mytable,3)
Improved
  None   Some Marked
    42     14     28
> margin.table(mytable,c(1,3))
         Improved
Treatment None Some Marked
  Placebo   29    7      7
  Treated   13    7     21
> ftable(prop.table(mytable,c(1,2)))
                 Improved       None       Some     Marked
Treatment Sex
Placebo   Female          0.59375000 0.21875000 0.18750000
          Male            0.90909091 0.00000000 0.09090909
Treated   Female          0.22222222 0.18518519 0.59259259
          Male            0.50000000 0.14285714 0.35714286

ftable(): prints a multidimensional table in a compact, "flat" form;

· margin.table(mytable,c(1,3)) collapses over Sex, giving the Treatment × Improved marginal table;

· prop.table(mytable,c(1,2)) gives the proportions of each Improved level within every Treatment × Sex combination.
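The same operations can be tried without vcd by rebuilding the three-way table from the counts shown above. Note that array() fills column-major, so within each Improved slice the order is (Placebo, Female), (Treated, Female), (Placebo, Male), (Treated, Male); the variable names in the sketch are hypothetical.

```r
# Treatment x Sex x Improved counts copied from the xtabs() output above
mytable <- as.table(array(c(19,  6, 10, 7,   # Improved = None
                             7,  5,  0, 2,   # Improved = Some
                             6, 16,  1, 5),  # Improved = Marked
                          dim = c(2, 2, 3),
                          dimnames = list(Treatment = c("Placebo", "Treated"),
                                          Sex       = c("Female", "Male"),
                                          Improved  = c("None", "Some", "Marked"))))

tx_by_imp   <- margin.table(mytable, c(1, 3))  # collapse over Sex
within_cell <- prop.table(mytable, c(1, 2))    # Improved proportions per Treatment x Sex

print(ftable(mytable))
print(tx_by_imp)
print(ftable(round(within_cell, 3)))
```

Each Treatment x Sex combination in `within_cell` sums to 1 over the three Improved levels, which is what the last ftable() above displays.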

2. Tests of independence

Example 07: CHI-SQUARE TEST OF INDEPENDENCE

> library(vcd)
> mytable<-xtabs(~Treatment+Improved,data=Arthritis)
> chisq.test(mytable)

        Pearson's Chi-squared test

data:  mytable
X-squared = 13.055, df = 2, p-value = 0.001463

> mytable<-xtabs(~Improved+Sex,data=Arthritis)
> chisq.test(mytable)

        Pearson's Chi-squared test

data:  mytable
X-squared = 4.8407, df = 2, p-value = 0.08889

Warning message:
In chisq.test(mytable) : Chi-squared approximation may be incorrect

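chisq.test(): performs a chi-square test of independence of the row and column variables.

· Treatment and Improved: p = 0.001463 < 0.01, so the hypothesis that they are independent is rejected.

· Improved and Sex: p = 0.08889 > 0.05, so there is no evidence that they are related. The warning appears because one expected cell count (Some × Male, about 4.17) is below 5, so the chi-square approximation may be unreliable.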
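The warning can be traced to the expected counts, which chisq.test() returns in its `expected` component. The sketch below rebuilds the Improved x Sex table by collapsing the Example 06 counts over Treatment (an assumption made so it runs without vcd), captures the warning with withCallingHandlers() instead of printing it, and looks at the smallest expected cell.

```r
# Improved x Sex counts, obtained by collapsing the Example 06 counts over Treatment
imp_sex <- as.table(matrix(c(25, 17,
                             12,  2,
                             22,  6),
                           nrow = 3, byrow = TRUE,
                           dimnames = list(Improved = c("None", "Some", "Marked"),
                                           Sex      = c("Female", "Male"))))

# Record whether chisq.test() warns, and suppress the printed warning
warned <- FALSE
res <- withCallingHandlers(
  chisq.test(imp_sex),
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)

print(res)
print(round(res$expected, 2))  # expected counts under independence
print(min(res$expected))       # Some x Male: 14 * 25 / 84, below the usual threshold of 5
```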

Example 08: FISHER'S EXACT TEST

> mytable<-xtabs(~Treatment+Improved,data=Arthritis)
> fisher.test(mytable)

        Fisher's Exact Test for Count Data

data:  mytable
p-value = 0.001393
alternative hypothesis: two.sided

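fisher.test(): performs Fisher's exact test.

· Fisher's exact test evaluates the hypothesis of independence of rows and columns in a contingency table with fixed marginals.

· In R, fisher.test() can be applied to any two-way table with two or more rows and columns, not only to 2 × 2 tables.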
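As a quick comparison, the sketch below runs both the exact test and the chi-square approximation on the same 2 x 3 Treatment x Improved table (counts copied from Example 02, so vcd is not needed). With 84 observations and no small expected counts, the two p-values should be close to the 0.001393 and 0.001463 shown above.

```r
mytable <- as.table(matrix(c(29, 7,  7,
                             13, 7, 21),
                           nrow = 2, byrow = TRUE,
                           dimnames = list(Treatment = c("Placebo", "Treated"),
                                           Improved  = c("None", "Some", "Marked"))))

ft <- fisher.test(mytable)  # exact test; works on this 2 x 3 table
cs <- chisq.test(mytable)   # large-sample chi-square approximation, for comparison

print(ft$p.value)
print(cs$p.value)
```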

Example 09: COCHRAN-MANTEL-HAENSZEL TEST

> mytable<-xtabs(~Treatment+Improved+Sex,data=Arthritis)
> mantelhaen.test(mytable)

        Cochran-Mantel-Haenszel test

data:  mytable
Cochran-Mantel-Haenszel M^2 = 14.6323, df = 2, p-value = 0.0006647

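mantelhaen.test(): provides a Cochran-Mantel-Haenszel chi-square test of the null hypothesis that two nominal variables are conditionally independent in each stratum of a third variable.

· Here the test asks whether Treatment and Improved are independent within each level of Sex; the small p-value (0.0006647) suggests that they are not.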
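mantelhaen.test() treats the last dimension of a three-way table as the stratifying variable, which is why Sex comes last in the xtabs() formula above. The sketch below rebuilds that Treatment x Improved x Sex table from the Example 06 counts (so it runs without vcd) and prints the two partial tables that the test combines. array() fills column-major, so Placebo/Treated vary fastest, then Improved, then Sex.

```r
# Treatment x Improved x Sex counts copied from the Example 06 output
mytable <- as.table(array(c(19, 6, 7, 5,  6, 16,   # Sex = Female
                            10, 7, 0, 2,  1,  5),  # Sex = Male
                          dim = c(2, 3, 2),
                          dimnames = list(Treatment = c("Placebo", "Treated"),
                                          Improved  = c("None", "Some", "Marked"),
                                          Sex       = c("Female", "Male"))))

# The last dimension (Sex) defines the strata
cmh <- mantelhaen.test(mytable)
print(cmh)

# The partial Treatment x Improved table in each Sex stratum
print(mytable[, , "Female"])
print(mytable[, , "Male"])
```

For tables larger than 2 x 2 x K, mantelhaen.test() reports the generalized statistic with (rows - 1) x (columns - 1) degrees of freedom, which matches df = 2 here.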

3. Measures of association

Example 10:

> library(vcd)
> mytable<-xtabs(~Treatment+Improved,data=Arthritis)
> assocstats(mytable)
                    X^2 df  P(> X^2)
Likelihood Ratio 13.530  2 0.0011536
Pearson          13.055  2 0.0014626

Phi-Coefficient   : 0.394
Contingency Coeff.: 0.367
Cramer's V        : 0.394

assocstats() in the vcd package: computes the phi coefficient, the contingency coefficient, and Cramér's V for a two-way table. Larger values indicate a stronger association.
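These measures are all functions of the Pearson X^2 and the sample size n. The base-R sketch below applies the standard textbook formulas (not vcd's source code) to the Treatment x Improved counts from Example 02; the results appear to match the assocstats() output above. For a 2 x k table, phi and Cramér's V coincide because min(rows, columns) - 1 = 1.

```r
mytable <- as.table(matrix(c(29, 7,  7,
                             13, 7, 21),
                           nrow = 2, byrow = TRUE,
                           dimnames = list(Treatment = c("Placebo", "Treated"),
                                           Improved  = c("None", "Some", "Marked"))))

n <- sum(mytable)
expected <- outer(rowSums(mytable), colSums(mytable)) / n
x2 <- sum((mytable - expected)^2 / expected)      # Pearson X^2
g2 <- 2 * sum(mytable * log(mytable / expected))  # likelihood-ratio G^2 (no zero cells here)

phi       <- sqrt(x2 / n)                                # phi coefficient
cont_coef <- sqrt(x2 / (x2 + n))                         # contingency coefficient
cramer_v  <- sqrt(x2 / (n * (min(dim(mytable)) - 1)))    # Cramér's V

print(round(c(LR = g2, Pearson = x2), 3))
print(round(c(phi = phi, contingency = cont_coef, cramer_v = cramer_v), 3))
```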
