Chapter 07 - Basic statistics (Part 2: Frequency and contingency tables)
This part uses the Arthritis dataset from the vcd package.
> library(vcd)
Loading required package: MASS
Loading required package: grid
Loading required package: colorspace
> head(Arthritis)
  ID Treatment  Sex Age Improved
1 57   Treated Male  27     Some
2 46   Treated Male  29     None
3 77   Treated Male  30     None
4 17   Treated Male  32   Marked
5 36   Treated Male  46   Marked
6 23   Treated Male  58   Marked
1. Generating frequency tables

(1) ONE-WAY TABLE
Example 01:
> mytable<-with(Arthritis,table(Improved))
> mytable
Improved
  None   Some Marked
    42     14     28
>
> prop.table(mytable)
Improved
     None      Some    Marked
0.5000000 0.1666667 0.3333333
>
> prop.table(mytable)*100
Improved
    None     Some   Marked
50.00000 16.66667 33.33333
table(): gives simple frequency counts;
· By default, table() ignores missing values (NAs); to count NA as its own category, use the option useNA="ifany".
prop.table(): expresses the counts as proportions;
prop.table()*100: expresses the counts as percentages.
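The NA behaviour described above can be sketched with a small made-up vector. The name `improved` and its values are hypothetical and are not taken from Arthritis; only base R is used.

```r
# Hypothetical data: five responses, one of them missing
improved <- factor(c("None", "Some", NA, "Marked", "None"),
                   levels = c("None", "Some", "Marked"))

# Default: the NA is dropped from the counts
counts_default <- table(improved)

# useNA = "ifany" adds an <NA> category when missing values are present
counts_with_na <- table(improved, useNA = "ifany")

print(counts_default)
print(counts_with_na)

# Percentages of the non-missing responses, rounded to one decimal place
print(round(prop.table(counts_default) * 100, 1))
```

Note that the proportions from the default table are relative to the non-missing responses only, so they can differ from proportions computed on the table that includes NA.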
(2) TWO-WAY TABLES
Example 02:
> mytable<-xtabs(~Treatment+Improved,data=Arthritis)
> mytable
         Improved
Treatment None Some Marked
  Placebo   29    7      7
  Treated   13    7     21
(1) mytable<-table(A,B)
· A is the row variable and B is the column variable.
(2) xtabs(): creates a contingency table from formula-style input.
mytable<-xtabs(~A+B,data=mydata)
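As a rough illustration that the two forms above produce the same counts, here is a minimal sketch on a tiny made-up data frame. The name `mydata` and the columns `A` and `B` are the placeholders used above; the values are hypothetical.

```r
# Hypothetical data frame with two categorical variables
mydata <- data.frame(
  A = c("x", "x", "y", "y", "y"),
  B = c("u", "v", "u", "u", "v")
)

# table(): the first argument gives the rows, the second the columns
t1 <- table(mydata$A, mydata$B)

# xtabs(): the same cross-classification written as a formula
t2 <- xtabs(~ A + B, data = mydata)

print(t1)
print(t2)

# The counts agree cell by cell
print(all(t1 == t2))
```

One practical difference is that xtabs() labels the dimensions with the variable names (A and B), while table() called on `mydata$A` and `mydata$B` leaves them unlabeled.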
Example 03:
> margin.table(mytable,1)
Treatment
Placebo Treated
     43      41
> prop.table(mytable,1)
         Improved
Treatment      None      Some    Marked
  Placebo 0.6744186 0.1627907 0.1627907
  Treated 0.3170732 0.1707317 0.5121951
> margin.table(mytable,2)
Improved
  None   Some Marked
    42     14     28
> prop.table(mytable,2)
         Improved
Treatment      None      Some    Marked
  Placebo 0.6904762 0.5000000 0.2500000
  Treated 0.3095238 0.5000000 0.7500000
> prop.table(mytable)
         Improved
Treatment       None       Some     Marked
  Placebo 0.34523810 0.08333333 0.08333333
  Treated 0.15476190 0.08333333 0.25000000
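margin.table(): produces marginal frequencies;
prop.table(): produces proportions.
· index 1: refers to the first variable in the table() statement (here the rows, Treatment);
· index 2: refers to the second variable in the table() statement (here the columns, Improved).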
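To make the meaning of the index concrete, the sketch below rebuilds the two-way table by hand from the counts printed in Example 02 (so it does not need the vcd package) and checks which direction sums to 1.

```r
# Counts copied from Example 02 (rows: Treatment, columns: Improved)
mytable <- as.table(matrix(
  c(29, 7,  7,
    13, 7, 21),
  nrow = 2, byrow = TRUE,
  dimnames = list(Treatment = c("Placebo", "Treated"),
                  Improved  = c("None", "Some", "Marked"))
))

row_props <- prop.table(mytable, 1)  # index 1: proportions within each row
col_props <- prop.table(mytable, 2)  # index 2: proportions within each column

print(rowSums(row_props))            # every row sums to 1
print(colSums(col_props))            # every column sums to 1
print(margin.table(mytable, 1))      # row totals: 43 and 41
```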
Example 04:
> addmargins(mytable)
         Improved
Treatment None Some Marked Sum
  Placebo   29    7      7  43
  Treated   13    7     21  41
  Sum       42   14     28  84
> addmargins(prop.table(mytable))
         Improved
Treatment       None       Some     Marked        Sum
  Placebo 0.34523810 0.08333333 0.08333333 0.51190476
  Treated 0.15476190 0.08333333 0.25000000 0.48809524
  Sum     0.50000000 0.16666667 0.33333333 1.00000000
addmargins(): adds marginal sums to these tables;
· By default, sum margins are created for all variables in the table.
Example 04 (variant 1): add only a Sum column
> addmargins(prop.table(mytable,1),2)
         Improved
Treatment      None      Some    Marked       Sum
  Placebo 0.6744186 0.1627907 0.1627907 1.0000000
  Treated 0.3170732 0.1707317 0.5121951 1.0000000
Example 04 (variant 2): add only a Sum row
> addmargins(prop.table(mytable,2),1)
         Improved
Treatment      None      Some    Marked
  Placebo 0.6904762 0.5000000 0.2500000
  Treated 0.3095238 0.5000000 0.7500000
  Sum     1.0000000 1.0000000 1.0000000
(3) MULTIDIMENSIONAL TABLES
Example 05:
> install.packages("gmodels")
--- Please select a CRAN mirror for use in this session ---
also installing the dependencies ‘gtools’, ‘gdata’
trying URL 'http://ftp.ctex.org/mirrors/CRAN/bin/windows/contrib/3.0/gtools_3.0.0.zip'
Content type 'application/zip' length 112950 bytes (110 Kb)
opened URL
downloaded 110 Kb
trying URL 'http://ftp.ctex.org/mirrors/CRAN/bin/windows/contrib/3.0/gdata_2.13.2.zip'
Content type 'application/zip' length 850387 bytes (830 Kb)
opened URL
downloaded 830 Kb
trying URL 'http://ftp.ctex.org/mirrors/CRAN/bin/windows/contrib/3.0/gmodels_2.15.4.zip'
Content type 'application/zip' length 76708 bytes (74 Kb)
opened URL
downloaded 74 Kb
package ‘gtools’ successfully unpacked and MD5 sums checked
package ‘gdata’ successfully unpacked and MD5 sums checked
package ‘gmodels’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\seven-wang\AppData\Local\Temp\RtmpIlHLxM\downloaded_packages
> library(vcd)
Loading required package: MASS
Loading required package: grid
Loading required package: colorspace
> library(gmodels)
> CrossTable(Arthritis$Treatment,Arthritis$Improved)
   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

Total Observations in Table:  84

                    | Arthritis$Improved
Arthritis$Treatment |      None |      Some |    Marked | Row Total |
--------------------|-----------|-----------|-----------|-----------|
            Placebo |        29 |         7 |         7 |        43 |
                    |     2.616 |     0.004 |     3.752 |           |
                    |     0.674 |     0.163 |     0.163 |     0.512 |
                    |     0.690 |     0.500 |     0.250 |           |
                    |     0.345 |     0.083 |     0.083 |           |
--------------------|-----------|-----------|-----------|-----------|
            Treated |        13 |         7 |        21 |        41 |
                    |     2.744 |     0.004 |     3.935 |           |
                    |     0.317 |     0.171 |     0.512 |     0.488 |
                    |     0.310 |     0.500 |     0.750 |           |
                    |     0.155 |     0.083 |     0.250 |           |
--------------------|-----------|-----------|-----------|-----------|
       Column Total |        42 |        14 |        28 |        84 |
                    |     0.500 |     0.167 |     0.333 |           |
--------------------|-----------|-----------|-----------|-----------|
CrossTable() in the gmodels package: creates two-way tables modeled after PROC FREQ in SAS or CROSSTABS in SPSS.
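The five numbers in each CrossTable() cell can be reproduced with base R. The sketch below is not how gmodels computes them internally; it is a by-hand check that uses the counts from the output above and the usual formula (observed - expected)^2 / expected for the chi-square contribution.

```r
# Observed counts copied from the CrossTable() output above
observed <- matrix(
  c(29, 7,  7,
    13, 7, 21),
  nrow = 2, byrow = TRUE,
  dimnames = list(Treatment = c("Placebo", "Treated"),
                  Improved  = c("None", "Some", "Marked"))
)
n <- sum(observed)

# Expected counts under independence: row total * column total / n
expected <- outer(rowSums(observed), colSums(observed)) / n

# Per-cell chi-square contribution
contrib <- (observed - expected)^2 / expected

print(round(contrib, 3))                                     # Chi-square contribution
print(round(observed / rowSums(observed), 3))                # N / Row Total
print(round(sweep(observed, 2, colSums(observed), "/"), 3))  # N / Col Total
print(round(observed / n, 3))                                # N / Table Total
print(round(sum(contrib), 3))                                # Pearson X-squared
```

The six contributions add up to the Pearson statistic, so the cells with the largest contributions (here the Marked column) are the ones driving any departure from independence.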
Example 06:
> mytable<-xtabs(~Treatment+Sex+Improved,data=Arthritis)
> mytable
, , Improved = None

         Sex
Treatment Female Male
  Placebo     19   10
  Treated      6    7

, , Improved = Some

         Sex
Treatment Female Male
  Placebo      7    0
  Treated      5    2

, , Improved = Marked

         Sex
Treatment Female Male
  Placebo      6    1
  Treated     16    5

> ftable(mytable)
                 Improved None Some Marked
Treatment Sex
Placebo   Female            19    7      6
          Male              10    0      1
Treated   Female             6    5     16
          Male               7    2      5
> margin.table(mytable,1)
Treatment
Placebo Treated
     43      41
> margin.table(mytable,2)
Sex
Female   Male
    59     25
> margin.table(mytable,3)
Improved
  None   Some Marked
    42     14     28
> margin.table(mytable,c(1,3))
         Improved
Treatment None Some Marked
  Placebo   29    7      7
  Treated   13    7     21
> ftable(prop.table(mytable,c(1,2)))
                 Improved       None       Some     Marked
Treatment Sex
Placebo   Female          0.59375000 0.21875000 0.18750000
          Male            0.90909091 0.00000000 0.09090909
Treated   Female          0.22222222 0.18518519 0.59259259
          Male            0.50000000 0.14285714 0.35714286
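The three-way table can also be rebuilt by hand to see how the index vectors work. This sketch uses the counts from Example 06 and base R only, so it does not need the vcd package; the helper names `treatment_by_improved` and `within_group` are illustrative.

```r
# Counts from Example 06; array() fills Treatment fastest, then Sex, then Improved
mytable <- as.table(array(
  c(19,  6, 10, 7,    # Improved = None:   Placebo/F, Treated/F, Placebo/M, Treated/M
     7,  5,  0, 2,    # Improved = Some
     6, 16,  1, 5),   # Improved = Marked
  dim = c(2, 2, 3),
  dimnames = list(Treatment = c("Placebo", "Treated"),
                  Sex       = c("Female", "Male"),
                  Improved  = c("None", "Some", "Marked"))
))

# c(1, 3) keeps dimensions 1 and 3 and sums over Sex
treatment_by_improved <- margin.table(mytable, c(1, 3))
print(treatment_by_improved)

# c(1, 2) makes the proportions sum to 1 within each Treatment x Sex group
within_group <- prop.table(mytable, c(1, 2))
print(ftable(round(within_group, 3)))
```

The index vector names the dimensions that are kept: the remaining dimensions are summed over by margin.table(), and proportions from prop.table() sum to 1 across them.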
2. Tests of independence
Example 07: CHI-SQUARE TEST OF INDEPENDENCE
> library(vcd)
> mytable<-xtabs(~Treatment+Improved,data=Arthritis)
> chisq.test(mytable)
        Pearson's Chi-squared test
data:  mytable
X-squared = 13.055, df = 2, p-value = 0.001463
> mytable<-xtabs(~Improved+Sex,data=Arthritis)
> chisq.test(mytable)
        Pearson's Chi-squared test
data:  mytable
X-squared = 4.8407, df = 2, p-value = 0.08889
Warning message:
In chisq.test(mytable) : Chi-squared approximation may be incorrect
chisq.test(): performs a chi-square test of independence of the row and column variables.
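The warning in the second test is usually a sign of small expected counts. The sketch below rebuilds the Improved by Sex table from the counts in Example 06 (summed over Treatment) and inspects the expected frequencies that chisq.test() returns; the common rule of thumb of at least 5 per cell is an assumption here, not something stated in the output.

```r
# Improved x Sex counts, obtained by summing Example 06 over Treatment
improved_by_sex <- matrix(
  c(25, 17,
    12,  2,
    22,  6),
  nrow = 3, byrow = TRUE,
  dimnames = list(Improved = c("None", "Some", "Marked"),
                  Sex      = c("Female", "Male"))
)

# suppressWarnings() only hides the message; the result is unchanged
res <- suppressWarnings(chisq.test(improved_by_sex))

print(round(res$expected, 2))   # expected counts under independence
print(sum(res$expected < 5))    # number of cells with expected count below 5
print(res$statistic)
print(res$p.value)
```

Here a single cell (Some, Male) has an expected count of about 4.17, which is what makes the chi-square approximation questionable for this table.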
Example 08: FISHER'S EXACT TEST
> mytable<-xtabs(~Treatment+Improved,data=Arthritis)
> fisher.test(mytable)
        Fisher's Exact Test for Count Data
data:  mytable
p-value = 0.001393
alternative hypothesis: two.sided
fisher.test(): performs Fisher's exact test.
· Fisher's exact test: evaluates the null hypothesis of independence of rows and columns in a contingency table with fixed marginals.
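For comparison, the sketch below runs both the chi-square test and Fisher's exact test on the same Treatment by Improved counts, rebuilt by hand from Example 02 so that it runs without the vcd package.

```r
# Treatment x Improved counts copied from Example 02
mytable <- matrix(
  c(29, 7,  7,
    13, 7, 21),
  nrow = 2, byrow = TRUE,
  dimnames = list(Treatment = c("Placebo", "Treated"),
                  Improved  = c("None", "Some", "Marked"))
)

chi_p    <- chisq.test(mytable)$p.value    # large-sample approximation
fisher_p <- fisher.test(mytable)$p.value   # exact, conditional on the margins

print(c(chi_square = chi_p, fisher = fisher_p))
```

In R, fisher.test() accepts any two-way table with at least two rows and two columns, not only 2 x 2 tables. For this table the two p-values are close and lead to the same conclusion.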
Example 09: COCHRAN-MANTEL-HAENSZEL TEST
> mytable<-xtabs(~Treatment+Improved+Sex,data=Arthritis)
> mantelhaen.test(mytable)
        Cochran-Mantel-Haenszel test
data:  mytable
Cochran-Mantel-Haenszel M^2 = 14.6323, df = 2, p-value = 0.0006647
mantelhaen.test(): performs a Cochran-Mantel-Haenszel chi-square test of the null hypothesis that two nominal variables are conditionally independent in each stratum of a third variable.
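mantelhaen.test() expects a three-dimensional table whose last dimension is the stratifying variable. The sketch below builds such an array by hand from the Example 06 counts, with Sex placed last, so it can be run with base R alone.

```r
# Treatment x Improved x Sex; array() fills Treatment fastest, then Improved, then Sex
strata <- array(
  c(19, 6,  7, 5,  6, 16,    # Sex = Female: None, Some, Marked (Placebo, Treated)
    10, 7,  0, 2,  1,  5),   # Sex = Male
  dim = c(2, 3, 2),
  dimnames = list(Treatment = c("Placebo", "Treated"),
                  Improved  = c("None", "Some", "Marked"),
                  Sex       = c("Female", "Male"))
)

res <- mantelhaen.test(strata)

print(apply(strata, 3, sum))   # observations per stratum: 59 and 25
print(res$statistic)
print(res$p.value)
```

Changing the order of the dimensions changes the question being asked: with Sex last, the test asks whether Treatment and Improved are independent within each level of Sex.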
3. Measures of association
Example 10:
> library(vcd)
> mytable<-xtabs(~Treatment+Improved,data=Arthritis)
> assocstats(mytable)
                    X^2 df  P(> X^2)
Likelihood Ratio 13.530  2 0.0011536
Pearson          13.055  2 0.0014626
Phi-Coefficient   : 0.394
Contingency Coeff.: 0.367
Cramer's V        : 0.394
assocstats() in the vcd package: computes the phi coefficient, contingency coefficient, and Cramer's V.
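Two of these measures can be checked by hand from the Pearson statistic. The sketch below uses the textbook definitions, Cramer's V = sqrt(X^2 / (n * min(r - 1, c - 1))) and contingency coefficient = sqrt(X^2 / (X^2 + n)); it is a by-hand check in base R, not the vcd implementation.

```r
# Treatment x Improved counts copied from Example 02
mytable <- matrix(
  c(29, 7,  7,
    13, 7, 21),
  nrow = 2, byrow = TRUE,
  dimnames = list(Treatment = c("Placebo", "Treated"),
                  Improved  = c("None", "Some", "Marked"))
)

chi2 <- unname(chisq.test(mytable)$statistic)  # Pearson X-squared
n    <- sum(mytable)                           # total observations
k    <- min(dim(mytable)) - 1                  # min(rows - 1, columns - 1)

cramers_v         <- sqrt(chi2 / (n * k))
contingency_coeff <- sqrt(chi2 / (chi2 + n))

print(round(cramers_v, 3))
print(round(contingency_coeff, 3))
```

Because this table has only two rows, k is 1, and Cramer's V reduces to sqrt(X^2 / n); that is why it coincides with the phi coefficient reported above.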