Looping on the Command Line

Writing for and while loops is useful when programming, but it is not particularly easy when working interactively on the command line. R provides several functions that implement looping to make life easier:

lapply: Loop over a list and evaluate a function on each element

sapply: Same as lapply but try to simplify the result

apply: Apply a function over the margins of an array

tapply: Apply a function over subsets of a vector

mapply: Multivariate version of lapply

An auxiliary function split is also useful, particularly in conjunction with lapply.

lapply

lapply takes three arguments: (1) a list X; (2) a function (or the name of a function) FUN; (3) other arguments via its ... argument. If X is not a list, it will be coerced to a list using as.list.

## function (X, FUN, ...)

## {

## FUN <- match.fun(FUN)

## if (!is.vector(X) || is.object(X))

## X <- as.list(X)

## .Internal(lapply(X, FUN))

## }

## <bytecode: 0x7ff7a1951c00>

## <environment: namespace:base>

The actual looping is done internally in C code.

lapply always returns a list, regardless of the class of the input.

> x <- list(a = 1:5, b = rnorm(10))

> lapply(x, mean)

> x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))

> lapply(x, mean)

> x <- 1:4

> lapply(x, runif)
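Arguments placed after FUN are forwarded to every call through the ... argument. A minimal sketch, assuming we want uniform draws on [0, 10] instead of the default [0, 1]; the seed and the name draws are only for illustration:

```r
set.seed(1)  # fixed seed only so the sketch is reproducible

x <- 1:4

# min and max are not arguments of lapply; they travel through `...` to runif
draws <- lapply(x, runif, min = 0, max = 10)

# lapply returns a list even though the input was an integer vector
is.list(draws)

# element i holds x[i] draws
lengths(draws)
```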

lapply and friends make heavy use of anonymous functions.

> x <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))

> x

$a

[,1] [,2]

[1,] 1 3

[2,] 2 4

$b

[,1] [,2]

[1,] 1 4

[2,] 2 5

[3,] 3 6

An anonymous function for extracting the first column of each matrix.

> lapply(x, function(elt) elt[,1])

$a

[1] 1 2

$b

[1] 1 2 3

sapply

> x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))

> lapply(x, mean)
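sapply calls lapply and then tries to simplify the result: a list whose elements all have length 1 becomes a vector, a list of equal-length vectors becomes a matrix, and anything else stays a list. A short sketch of the three cases, with illustrative variable names:

```r
set.seed(1)
x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))

means_list <- lapply(x, mean)  # a list of four length-1 numerics
means_vec  <- sapply(x, mean)  # simplified to a named numeric vector

# every element has length 2, so the result is a 2 x 4 matrix
ranges <- sapply(x, range)

# lengths differ (1, 2, 3), so no simplification is possible
ragged <- sapply(1:3, function(n) seq_len(n))
```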

apply

apply is used to evaluate a function (often an anonymous one) over the margins of an array.

It is most often used to apply a function to the rows or columns of a matrix.

It can be used with general arrays, e.g. taking the average of an array of matrices.

It is not really faster than writing a loop, but it works in one line!

> str(apply)

function (X, MARGIN, FUN, ...)

X is an array

MARGIN is an integer vector indicating which margins should be “retained”.

FUN is a function to be applied

... is for other arguments to be passed to FUN

> x <- matrix(rnorm(200), 20, 10)

> apply(x, 2, mean)

[1] 0.04868268 0.35743615 -0.09104379

[4] -0.05381370 -0.16552070 -0.18192493

[7] 0.10285727 0.36519270 0.14898850

[10] 0.26767260

col/row sums and means

For sums and means of matrix dimensions, we have some shortcuts.

rowSums = apply(x, 1, sum)

rowMeans = apply(x, 1, mean)

colSums = apply(x, 2, sum)

colMeans = apply(x, 2, mean)

The shortcut functions are much faster, but you won’t notice unless you’re using a large matrix.
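The equivalences above can be checked directly. This is a small sketch rather than a benchmark; the names row_totals and col_avgs are made up for the example:

```r
set.seed(1)
x <- matrix(rnorm(200), 20, 10)

# MARGIN = 1 keeps the rows: one result per row (20 values)
row_totals <- apply(x, 1, sum)

# MARGIN = 2 keeps the columns: one result per column (10 values)
col_avgs <- apply(x, 2, mean)

# the shortcuts should agree with apply up to floating-point rounding
all.equal(row_totals, rowSums(x))
all.equal(col_avgs, colMeans(x))
```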

Other Ways to Apply

Quantiles of the rows of a matrix.

> x <- matrix(rnorm(200), 20, 10)

> apply(x, 1, quantile, probs = c(0.25, 0.75))
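Because quantile returns two values per row here, the result is a matrix with one column per row of x. The same mechanism covers the "average of an array of matrices" case mentioned earlier: keep margins 1 and 2 and average over the third. A sketch, assuming a 2 x 2 x 10 array as the input:

```r
set.seed(1)
a <- array(rnorm(2 * 2 * 10), c(2, 2, 10))  # ten 2 x 2 matrices

# keep margins 1 and 2 (rows and columns), average over the third
avg <- apply(a, c(1, 2), mean)

# rowMeans with dims = 2 computes the same 2 x 2 matrix
avg_fast <- rowMeans(a, dims = 2)

# quantile returns two values per row, so apply gives a 2 x 20 matrix
x <- matrix(rnorm(200), 20, 10)
q <- apply(x, 1, quantile, probs = c(0.25, 0.75))
dim(q)
```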

mapply

mapply is a multivariate apply of sorts which applies a function in parallel over a set of arguments.

> str(mapply)

function (FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)

FUN is a function to apply

... contains arguments to apply over

MoreArgs is a list of other arguments to FUN

SIMPLIFY indicates whether the result should be simplified

The following is tedious to type

list(rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1))

Instead we can do

> mapply(rep, 1:4, 4:1)

Vectorizing a Function

> noise <- function(n, mean, sd) {

+ rnorm(n, mean, sd)

+ }

> noise(5, 1, 2)

[1] 2.4831198 2.4790100 0.4855190 -1.2117759

[5] -0.2743532

> noise(1:5, 1:5, 2)

[1] -4.2128648 -0.3989266 4.2507057 1.1572738

[5] 3.7413584

Instant Vectorization

> mapply(noise, 1:5, 1:5, 2)

Which is the same as

list(noise(1, 1, 2), noise(2, 2, 2), noise(3, 3, 2), noise(4, 4, 2), noise(5, 5, 2))
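The direct call noise(1:5, 1:5, 2) returns only five values because rnorm, given a vector for n, uses its length as the number of draws. mapply instead calls noise once per position. A sketch that checks the equivalence under a fixed seed; the MoreArgs variant at the end is an alternative way to pass the constant sd, not something the original example uses:

```r
noise <- function(n, mean, sd) {
  rnorm(n, mean, sd)
}

set.seed(42)
via_mapply <- mapply(noise, 1:5, 1:5, 2)

set.seed(42)
by_hand <- list(noise(1, 1, 2), noise(2, 2, 2), noise(3, 3, 2),
                noise(4, 4, 2), noise(5, 5, 2))

# element i holds i draws
lengths(via_mapply)

# with the same seed the two constructions should match
all.equal(via_mapply, by_hand)

# constant arguments can also be passed once through MoreArgs
set.seed(42)
via_moreargs <- mapply(noise, n = 1:5, mean = 1:5, MoreArgs = list(sd = 2))
```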

tapply

tapply is used to apply a function over subsets of a vector. I don’t know why it’s called tapply.

> str(tapply)

function (X, INDEX, FUN = NULL, ..., simplify = TRUE)

X is a vector

INDEX is a factor or a list of factors (or else they are coerced to factors)

FUN is a function to be applied

... contains other arguments to be passed to FUN

simplify indicates whether the result should be simplified

Take group means.

> x <- c(rnorm(10), runif(10), rnorm(10, 1))

> f <- gl(3, 10)

> f

[1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3

[24] 3 3 3 3 3 3 3

Levels: 1 2 3

> tapply(x, f, mean)

1 2 3

0.1144464 0.5163468 1.2463678

Take group means without simplification.

> tapply(x, f, mean, simplify = FALSE)

$`1`

[1] 0.1144464

$`2`

[1] 0.5163468

$`3`

[1] 1.246368

Find group ranges.

> tapply(x, f, range)

$`1`

[1] -1.097309 2.694970

$`2`

[1] 0.09479023 0.79107293

$`3`

[1] 0.4717443 2.5887025

split

split takes a vector or other object and splits it into groups determined by a factor or list of factors.

> str(split)
function (x, f, drop = FALSE, ...)

x is a vector (or list) or data frame

f is a factor (or coerced to one) or a list of factors

drop indicates whether empty factor levels should be dropped

A common idiom is split followed by an lapply.

> lapply(split(x, f), mean)
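For a single factor this idiom produces the same group means as tapply, only returned as a list. A sketch comparing the two, reusing the x and f construction from the tapply example with a fixed seed added for reproducibility:

```r
set.seed(1)
x <- c(rnorm(10), runif(10), rnorm(10, 1))
f <- gl(3, 10)

# a list of three numeric vectors named after the factor levels
groups <- split(x, f)
means_list <- lapply(groups, mean)

# sapply simplifies the list to a named vector
means_vec <- sapply(groups, mean)

# the values should match tapply(x, f, mean)
all.equal(unname(means_vec), as.vector(tapply(x, f, mean)))
```

The advantage of split over tapply is that the pieces are ordinary list elements, so the function applied to each piece can return anything, not just a summary of one vector.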

Splitting a Data Frame

> library(datasets)

> head(airquality)

> s <- split(airquality, airquality$Month)

> lapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")]))

> sapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")]))

> sapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")], na.rm = TRUE))
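The last call differs from the one before it only in na.rm = TRUE. The airquality data contain missing values, so without it colMeans returns NA for any column that has one. A sketch that makes the difference visible; the names cols, with_na and without_na are illustrative:

```r
library(datasets)

# one data frame per month, named "5" to "9"
s <- split(airquality, airquality$Month)
cols <- c("Ozone", "Solar.R", "Wind")

# without na.rm, any column containing an NA gets a mean of NA
with_na <- sapply(s, function(d) colMeans(d[, cols]))

# na.rm = TRUE is passed on to colMeans and drops missing values first
without_na <- sapply(s, function(d) colMeans(d[, cols], na.rm = TRUE))

# rows are the three variables, columns are the months
dim(without_na)
```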

Splitting on More than One Level

> x <- rnorm(10)

> f1 <- gl(2, 5)

> f2 <- gl(5, 2)

Interactions can create empty levels.

> str(split(x, list(f1, f2)))

Empty levels can be dropped

> str(split(x, list(f1, f2), drop = TRUE))

List of 6

$ 1.1: num [1:2] -0.378 0.445

$ 1.2: num [1:2] 1.4066 0.0166

$ 1.3: num -0.355

$ 2.3: num 0.315

$ 2.4: num [1:2] -0.907 0.723

$ 2.5: num [1:2] 0.732 0.360
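Passing a list of factors makes split group by their interaction, which has one level for every combination whether or not it occurs in the data. A sketch that counts the empty groups; the seed is an addition for reproducibility:

```r
set.seed(1)
x <- rnorm(10)
f1 <- gl(2, 5)
f2 <- gl(5, 2)

# split(x, list(f1, f2)) groups by the interaction of the two factors
combined <- interaction(f1, f2)
nlevels(combined)  # 2 * 5 = 10 possible combinations

all_groups  <- split(x, list(f1, f2))
kept_groups <- split(x, list(f1, f2), drop = TRUE)

# groups with no observations have length 0 and disappear under drop = TRUE
sum(lengths(all_groups) == 0)
names(kept_groups)
```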
