Objects

R has five basic or “atomic” classes of objects:

character

numeric (real numbers)

integer

complex

logical (TRUE/FALSE)

The most basic object is a vector

A vector can only contain objects of the same class

BUT: The one exception is a list, which is represented as a vector but can contain objects of different classes (indeed, that’s usually why we use them)

Empty vectors can be created with the vector() function.

Numbers

Numbers in R are generally treated as numeric objects (i.e. double precision real numbers)

If you explicitly want an integer, you need to specify the L suffix

Ex: Entering 1 gives you a numeric object; entering 1L explicitly gives you an integer.

There is also a special number Inf, which represents infinity; e.g. 1 / 0 evaluates to Inf. Inf can be used in ordinary calculations; e.g. 1 / Inf is 0

The value NaN represents an undefined value (“not a number”); e.g. 0 / 0 evaluates to NaN. NaN can also be thought of as a missing value (more on that later)
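
The points above can be checked directly at the console. The following is a minimal sketch; the variable names (x, y, pos_inf, back, undef) are made up for illustration.

```r
x <- 1       # no suffix: stored as a numeric (double) value
y <- 1L      # L suffix: stored as an integer

class(x)     # "numeric"
class(y)     # "integer"

pos_inf <- 1 / 0      # Inf
back    <- 1 / Inf    # 0
undef   <- 0 / 0      # NaN

is.infinite(pos_inf)  # TRUE
is.nan(undef)         # TRUE
```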

Attributes

R objects can have attributes

names, dimnames

dimensions (e.g. matrices, arrays)

class

length

other user-defined attributes/metadata

Attributes of an object can be accessed using the attributes() function.
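
As a rough illustration of attributes(), the sketch below inspects a matrix, a named vector with a user-defined attribute, and a bare vector. The attribute name "units" and the variable names are assumptions made up for this example. Note that length and class are usually queried with length() and class() and do not appear in the attributes() output of a plain vector.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)
attributes(m)              # a list with one element, $dim, equal to 2 3

x <- c(a = 1, b = 2)
attr(x, "units") <- "cm"   # "units" is a made-up, user-defined attribute
attributes(x)              # now holds $names and $units

plain <- 1:3
attributes(plain)          # NULL: a bare vector carries no attributes
length(plain)              # 3, queried with its own function
class(plain)               # "integer", likewise
```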

Creating Vectors

The c() function can be used to create vectors of objects.

Using the vector() function

> x <- vector("numeric", length = 10)

> x

[1] 0 0 0 0 0 0 0 0 0 0
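
For comparison, here is a short sketch of both approaches: c() builds a vector from given values, while vector() builds one of a given mode and length filled with that mode's default value. The variable names are illustrative only.

```r
v  <- c(0.5, 0.6)                       # numeric
w  <- c(TRUE, FALSE)                    # logical
s  <- c("a", "b", "c")                  # character

n  <- vector("numeric", length = 3)     # 0 0 0
ch <- vector("character", length = 2)   # "" ""
lg <- vector("logical", length = 2)     # FALSE FALSE
l  <- vector("list", length = 2)        # a list holding two NULLs
```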

Mixing Objects

> y <- c(1.7, "a") ## character

> y <- c(TRUE, 2) ## numeric

> y <- c("a", TRUE) ## character

When different objects are mixed in a vector, coercion occurs so that every element in the vector is of the same class.
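
A quick way to see the result of implicit coercion is to ask for the class of each mixed vector. The sketch below repeats the three cases above under illustrative names; R picks the most general class present, in the order logical, integer, numeric, complex, character.

```r
y1 <- c(1.7, "a")    # the number becomes the string "1.7"
y2 <- c(TRUE, 2)     # TRUE becomes 1
y3 <- c("a", TRUE)   # TRUE becomes the string "TRUE"

class(y1)   # "character"
class(y2)   # "numeric"
class(y3)   # "character"

y2          # 1 2
y3          # "a" "TRUE"
```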

Explicit Coercion

Objects can be explicitly coerced from one class to another using the as.* functions, if available.

> x <- 0:6

> class(x)

[1] "integer"

> as.numeric(x)

[1] 0 1 2 3 4 5 6

> as.logical(x)

[1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE

> as.character(x)

[1] "0" "1" "2" "3" "4" "5" "6"

Nonsensical coercion results in NAs.

> x <- c("a", "b", "c")

> as.numeric(x)

[1] NA NA NA

Warning message:

NAs introduced by coercion

> as.logical(x)

[1] NA NA NA

> as.complex(x)

[1] NA NA NA

Warning message:

NAs introduced by coercion

Lists

Lists are a special type of vector that can contain elements of different classes. Lists are a very important data type in R and you should get to know them well.

> x <- list(1, "a", TRUE, 1 + 4i)

> x

[[1]]

[1] 1

[[2]]

[1] "a"

[[3]]

[1] TRUE

[[4]]

[1] 1+4i
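
To confirm that each element of the list keeps its own class, one option is to apply class() to every element. This sketch also shows the difference between double and single brackets, which is not covered above: [[ ]] extracts the element itself, while [ ] returns a shorter list.

```r
x <- list(1, "a", TRUE, 1 + 4i)

classes <- sapply(x, class)
classes            # "numeric" "character" "logical" "complex"

second <- x[[2]]   # the element itself: "a"
sub    <- x[2]     # a list of length 1 containing "a"

class(second)      # "character"
class(sub)         # "list"
```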

Matrices

Matrices are vectors with a dimension attribute. The dimension attribute is itself an integer vector of length 2 (nrow, ncol)

> m <- matrix(nrow = 2, ncol = 3)

> m

     [,1] [,2] [,3]

[1,]   NA   NA   NA

[2,]   NA   NA   NA

> dim(m)

[1] 2 3

> attributes(m)

$dim

[1] 2 3

Matrices (cont’d)

Matrices are constructed column-wise, so entries can be thought of starting in the “upper left” corner and running down the columns.

> m <- matrix(1:6, nrow = 2, ncol = 3)

> m

     [,1] [,2] [,3]

[1,]    1    3    5

[2,]    2    4    6

Matrices can also be created directly from vectors by adding a dimension attribute.

> m <- 1:10

> m

[1] 1 2 3 4 5 6 7 8 9 10

> dim(m) <- c(2, 5)

> m

     [,1] [,2] [,3] [,4] [,5]

[1,]    1    3    5    7    9

[2,]    2    4    6    8   10
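
The two construction routes can be compared directly. This sketch, with illustrative variable names, builds the same 2 x 5 matrix with matrix() and by setting dim() on a vector, then indexes one element to show the column-wise fill.

```r
a <- matrix(1:10, nrow = 2, ncol = 5)   # built with matrix()

b <- 1:10
dim(b) <- c(2, 5)                       # built by adding a dim attribute

same <- identical(a, b)                 # TRUE: both are the same object
elem <- a[2, 3]                         # row 2, column 3: 6
```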

cbind-ing and rbind-ing

Matrices can be created by column-binding or row-binding with cbind() and rbind().

> x <- 1:3

> y <- 10:12

> cbind(x, y)

     x  y

[1,] 1 10

[2,] 2 11

[3,] 3 12

> rbind(x, y)

  [,1] [,2] [,3]

x    1    2    3

y   10   11   12

Factors

Factors are used to represent categorical data. Factors can be unordered or ordered. One can think of a factor as an integer vector where each integer has a label.

Factors are treated specially by modelling functions like lm() and glm()

Using factors with labels is better than using integers because factors are self-describing; having a variable that has values “Male” and “Female” is better than a variable that has values 1 and 2.

> x <- factor(c("yes", "yes", "no", "yes", "no"))

> x

[1] yes yes no yes no

Levels: no yes

> table(x)

x

 no yes

  2   3

> unclass(x)

[1] 2 2 1 2 1

attr(,"levels")

[1] "no" "yes"

The order of the levels can be set using the levels argument to factor(). This can be important in linear modelling because the first level is used as the baseline level.

> x <- factor(c("yes", "yes", "no", "yes", "no"),

levels = c("yes", "no"))

> x

[1] yes yes no yes no

Levels: yes no
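
The effect of the levels argument shows up in the underlying integer codes. A small sketch, with illustrative variable names: by default the levels are sorted alphabetically, so "no" is level 1; with explicit levels, "yes" becomes level 1 and so would serve as the baseline.

```r
answers <- c("yes", "yes", "no", "yes", "no")

f_default <- factor(answers)
levels(f_default)       # "no" "yes"
as.integer(f_default)   # 2 2 1 2 1

f_ordered <- factor(answers, levels = c("yes", "no"))
levels(f_ordered)       # "yes" "no"
as.integer(f_ordered)   # 1 1 2 1 2
```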

Missing Values

Missing values are denoted by NA, or by NaN for undefined mathematical operations.

is.na() is used to test whether objects are NA

is.nan() is used to test for NaN

NA values have a class also, so there are integer NA, character NA, etc.

A NaN value is also NA but the converse is not true

> x <- c(1, 2, NA, 10, 3)

> is.na(x)

[1] FALSE FALSE TRUE FALSE FALSE

> is.nan(x)

[1] FALSE FALSE FALSE FALSE FALSE

> x <- c(1, 2, NaN, NA, 4)

> is.na(x)

[1] FALSE FALSE TRUE TRUE FALSE

> is.nan(x)

[1] FALSE FALSE TRUE FALSE FALSE
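
The statement that NA values have a class can be checked with R's typed NA constants. The sketch below also confirms the one-way relationship between NaN and NA; the variable names are illustrative.

```r
class(NA)              # "logical" (the default NA)
class(NA_integer_)     # "integer"
class(NA_real_)        # "numeric"
class(NA_character_)   # "character"

nan_is_na <- is.na(NaN)    # TRUE: NaN counts as missing
na_is_nan <- is.nan(NA)    # FALSE: NA is not NaN
```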

Data Frames

Data frames are used to store tabular data

They are represented as a special type of list where every element of the list has to have the same length

Each element of the list can be thought of as a column and the length of each element of the list is the number of rows

Unlike matrices, data frames can store different classes of objects in each column (just like lists); matrices must have every element be the same class

Data frames also have a special attribute called row.names

Data frames are usually created by calling read.table() or read.csv()

Data frames can be converted to a matrix by calling data.matrix()

> x <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))

> x

  foo   bar

1   1  TRUE

2   2  TRUE

3   3 FALSE

4   4 FALSE

> nrow(x)

[1] 4

> ncol(x)

[1] 2
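
A few of the properties listed above can be verified on the same data frame. This is a sketch with illustrative variable names; it checks the per-column classes, the row.names attribute, and the conversion with data.matrix(), which turns the logical column into numbers so that every element of the matrix has one class.

```r
x <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))

is.list(x)                   # TRUE: a data frame is a special list
col_classes <- sapply(x, class)
col_classes                  # foo is "integer", bar is "logical"

rn <- row.names(x)           # "1" "2" "3" "4"

m <- data.matrix(x)          # TRUE/FALSE become 1/0
dim(m)                       # 4 2
m[1, "bar"]                  # 1
```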

Names

R objects can also have names, which is very useful for writing readable code and self-describing objects.

> x <- 1:3

> names(x)

NULL

> names(x) <- c("foo", "bar", "norf")

> x

 foo  bar norf

   1    2    3

> names(x)

[1] "foo" "bar" "norf"
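
Names are not limited to atomic vectors. As a sketch, lists accept names in the same way, and matrices carry row and column names through dimnames() (one of the attributes listed earlier). The labels used here are arbitrary.

```r
lst <- list(a = 1, b = 2, c = 3)
names(lst)      # "a" "b" "c"
lst$b           # 2, elements can be looked up by name

m <- matrix(1:4, nrow = 2, ncol = 2)
dimnames(m) <- list(c("a", "b"), c("c", "d"))   # row names, then column names
val <- m["a", "d"]                              # row "a", column "d": 3
```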

Summary

Data Types

atomic classes: numeric, logical, character, integer, complex

vectors, lists

factors

missing values

data frames

names
