Looping on the Command Line

Writing for and while loops is useful when programming, but not particularly easy when working interactively on the command line. R has several functions that implement looping to make life easier:

lapply: Loop over a list and evaluate a function on each element

sapply: Same as lapply but try to simplify the result

apply: Apply a function over the margins of an array

tapply: Apply a function over subsets of a vector

mapply: Multivariate version of lapply

An auxiliary function, split, is also useful, particularly in conjunction with lapply.

lapply

lapply takes three arguments: (1) a list X; (2) a function (or the name of a function) FUN; (3) other arguments via its ... argument. If X is not a list, it will be coerced to a list using as.list.

## function (X, FUN, ...)

## {

## FUN <- match.fun(FUN)

## if (!is.vector(X) || is.object(X))

## X <- as.list(X)

## .Internal(lapply(X, FUN))

## }

## <bytecode: 0x7ff7a1951c00>

## <environment: namespace:base>

The actual looping is done internally in C code.

lapply always returns a list, regardless of the class of the input.

x <- list(a = 1:5, b = rnorm(10))

lapply(x, mean)

x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))

lapply(x, mean)

> x <- 1:4

> lapply(x, runif)
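The ... argument of lapply forwards any extra arguments to FUN. A minimal sketch, assuming we want draws on [0, 10] rather than runif's default [0, 1]; the name draws and the call to set.seed are only there for illustration and reproducibility.

```r
# Arguments given after FUN are passed on to FUN through `...`.
set.seed(1)                      # only so the sketch is reproducible
x <- 1:4
draws <- lapply(x, runif, min = 0, max = 10)

# Element i holds i uniform draws between 0 and 10.
lengths(draws)
```

Because min and max are named, they cannot be confused with the first argument of runif, which lapply fills with each element of x in turn.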

lapply and friends make heavy use of anonymous functions.

> x <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))

> x

$a

[,1] [,2]

[1,] 1 3

[2,] 2 4

$b

[,1] [,2]

[1,] 1 4

[2,] 2 5

[3,] 3 6

An anonymous function for extracting the first column of each matrix.

> lapply(x, function(elt) elt[,1])

$a

[1] 1 2

$b

[1] 1 2 3

sapply

> x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))

> lapply(x, mean)
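sapply calls lapply and then tries to simplify the result: if every element has length 1 it returns a vector, if every element is a vector of the same length greater than 1 it returns a matrix, and otherwise it returns a list. A small sketch of the contrast with lapply; the names res_list, res_vec and res_mat are illustrative only.

```r
set.seed(2)                    # only so the sketch is reproducible
x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))

res_list <- lapply(x, mean)    # always a list, one element per input element
res_vec  <- sapply(x, mean)    # every mean has length 1, so a named numeric vector

# range() returns two numbers per element, so sapply builds a matrix:
# one row each for min and max, one column per list element.
res_mat <- sapply(x, range)

class(res_list)
class(res_vec)
dim(res_mat)
```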

apply

apply is used to evaluate a function (often an anonymous one) over the margins of an array.

It is most often used to apply a function to the rows or columns of a matrix.

It can be used with general arrays, e.g. taking the average of an array of matrices.

It is not really faster than writing a loop, but it works in one line!

> str(apply)

function (X, MARGIN, FUN, ...)

X is an array

MARGIN is an integer vector indicating which margins should be “retained”.

FUN is a function to be applied

... is for other arguments to be passed to FUN

> x <- matrix(rnorm(200), 20, 10)

> apply(x, 2, mean)

[1] 0.04868268 0.35743615 -0.09104379

[4] -0.05381370 -0.16552070 -0.18192493

[7] 0.10285727 0.36519270 0.14898850

[10] 0.26767260

col/row sums and means

For sums and means of matrix dimensions, we have some shortcuts.

rowSums = apply(x, 1, sum)

rowMeans = apply(x, 1, mean)

colSums = apply(x, 2, sum)

colMeans = apply(x, 2, mean)

The shortcut functions are much faster, but you won’t notice unless you’re using a large matrix.
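The equivalences listed above can be checked directly. This is a sketch on a small random matrix; all.equal is used rather than identical because the two routes may differ by floating-point rounding.

```r
set.seed(3)                    # only so the sketch is reproducible
x <- matrix(rnorm(200), 20, 10)

# Each shortcut should agree with the corresponding apply() call.
same_row_sums  <- isTRUE(all.equal(rowSums(x),  apply(x, 1, sum)))
same_row_means <- isTRUE(all.equal(rowMeans(x), apply(x, 1, mean)))
same_col_sums  <- isTRUE(all.equal(colSums(x),  apply(x, 2, sum)))
same_col_means <- isTRUE(all.equal(colMeans(x), apply(x, 2, mean)))

c(same_row_sums, same_row_means, same_col_sums, same_col_means)
```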

Other Ways to Apply

Quantiles of the rows of a matrix.

> x <- matrix(rnorm(200), 20, 10)

> apply(x, 1, quantile, probs = c(0.25, 0.75))
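Two details are worth seeing in code. When FUN returns more than one value, apply stacks the results as columns, so the row quantiles come back as a 2 x 20 matrix. MARGIN can also name more than one dimension, which is how the "average of an array of matrices" mentioned earlier can be computed. A sketch with made-up data; the names q, a, avg and avg2 are illustrative.

```r
set.seed(4)                    # only so the sketch is reproducible
x <- matrix(rnorm(200), 20, 10)

# Two quantiles per row: rows of q are the 25% and 75% quantiles,
# columns of q correspond to the 20 rows of x.
q <- apply(x, 1, quantile, probs = c(0.25, 0.75))
dim(q)

# Ten 2 x 2 matrices stacked along the third dimension.
a <- array(rnorm(2 * 2 * 10), c(2, 2, 10))

# Keep dimensions 1 and 2, collapse the third by averaging.
avg <- apply(a, c(1, 2), mean)

# rowMeans with dims = 2 treats the first two dimensions as "rows".
avg2 <- rowMeans(a, dims = 2)
all.equal(avg, avg2)
```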

mapply

mapply is a multivariate apply of sorts which applies a function in parallel over a set of arguments.

> str(mapply)

function (FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)

FUN is a function to apply

... contains arguments to apply over

MoreArgs is a list of other arguments to FUN

SIMPLIFY indicates whether the result should be simplified

The following is tedious to type:

list(rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1))

Instead we can do

> mapply(rep, 1:4, 4:1)

Vectorizing a Function

> noise <- function(n, mean, sd) {

+ rnorm(n, mean, sd)

+ }

> noise(5, 1, 2)

[1] 2.4831198 2.4790100 0.4855190 -1.2117759

[5] -0.2743532

> noise(1:5, 1:5, 2)

[1] -4.2128648 -0.3989266 4.2507057 1.1572738

[5] 3.7413584

Instant Vectorization

> mapply(noise, 1:5, 1:5, 2)

Which is the same as

list(noise(1, 1, 2), noise(2, 2, 2), noise(3, 3, 2), noise(4, 4, 2), noise(5, 5, 2))
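The equivalence can be confirmed by resetting the random seed before each version, so that both consume the same random numbers in the same order. This is a sketch; the seed value and the names via_mapply, by_hand and via_moreargs are arbitrary choices for illustration.

```r
noise <- function(n, mean, sd) {
  rnorm(n, mean, sd)
}

# mapply calls noise(1, 1, 2), noise(2, 2, 2), ... in order;
# the single value 2 is recycled for sd.
set.seed(10)
via_mapply <- mapply(noise, 1:5, 1:5, 2)

# The same five calls written out by hand.
set.seed(10)
by_hand <- list(noise(1, 1, 2), noise(2, 2, 2), noise(3, 3, 2),
                noise(4, 4, 2), noise(5, 5, 2))

identical(via_mapply, by_hand)

# A constant argument can also be supplied once through MoreArgs.
set.seed(10)
via_moreargs <- mapply(noise, 1:5, 1:5, MoreArgs = list(sd = 2))
identical(via_moreargs, by_hand)
```

Since the five results have different lengths, mapply cannot simplify them and returns a list even though SIMPLIFY defaults to TRUE.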

tapply

tapply is used to apply a function over subsets of a vector. I don’t know why it’s called tapply.

> str(tapply)

function (X, INDEX, FUN = NULL, ..., simplify = TRUE)

X is a vector

INDEX is a factor or a list of factors (or else they are coerced to factors)

FUN is a function to be applied

... contains other arguments to be passed to FUN

simplify indicates whether the result should be simplified

Take group means.

> x <- c(rnorm(10), runif(10), rnorm(10, 1))

> f <- gl(3, 10)

> f

[1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3

[24] 3 3 3 3 3 3 3

Levels: 1 2 3

> tapply(x, f, mean)

1 2 3

0.1144464 0.5163468 1.2463678

Take group means without simplification.

> tapply(x, f, mean, simplify = FALSE)

$`1`

[1] 0.1144464

$`2`

[1] 0.5163468

$`3`

[1] 1.246368

Find group ranges.

> tapply(x, f, range)

$`1`

[1] -1.097309 2.694970

$`2`

[1] 0.09479023 0.79107293

$`3`

[1] 0.4717443 2.5887025

split

split takes a vector or other object and splits it into groups determined by a factor or list of factors.

> str(split)
function (x, f, drop = FALSE, ...)

x is a vector (or list) or data frame

f is a factor (or coerced to one) or a list of factors

drop indicates whether empty factor levels should be dropped

A common idiom is split followed by an lapply.

> lapply(split(x, f), mean)
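For a simple summary such as the mean, split followed by lapply (or sapply) gives the same numbers as tapply. The sketch below rebuilds x and f with an arbitrary seed so it can run on its own; by_split and by_tapply are illustrative names.

```r
set.seed(5)                    # only so the sketch is reproducible
x <- c(rnorm(10), runif(10), rnorm(10, 1))
f <- gl(3, 10)

groups <- split(x, f)                 # a list with one numeric vector per level of f
by_split  <- sapply(groups, mean)     # named numeric vector
by_tapply <- tapply(x, f, mean)       # 1-d array with the same names

# Same values and same names; only the container type differs.
all.equal(unname(by_split), as.vector(by_tapply))
```

The advantage of split is that the intermediate list is available, so more complicated functions can be applied to each group.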

Splitting a Data Frame

> library(datasets)

> head(airquality)

> s <- split(airquality, airquality$Month)

> lapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")]))

> sapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")]))

> sapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")], na.rm = TRUE))
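The difference between the last two calls comes from missing values: Ozone and Solar.R in airquality contain NAs, so colMeans returns NA for those columns unless na.rm = TRUE is passed. A sketch that makes this visible; cols, with_na and without_na are illustrative names.

```r
library(datasets)

s <- split(airquality, airquality$Month)     # one data frame per month
cols <- c("Ozone", "Solar.R", "Wind")

# Without na.rm, any column containing an NA yields NA for that month.
with_na <- sapply(s, function(d) colMeans(d[, cols]))

# With na.rm = TRUE, the means are taken over the observed values only.
without_na <- sapply(s, function(d) colMeans(d[, cols], na.rm = TRUE))

dim(without_na)        # one row per variable, one column per month
anyNA(with_na)
anyNA(without_na)
```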

Splitting on More than One Level

> x <- rnorm(10)

> f1 <- gl(2, 5)

> f2 <- gl(5, 2)

Interactions can create empty levels.

> str(split(x, list(f1, f2)))


Empty levels can be dropped.

> str(split(x, list(f1, f2), drop = TRUE))

List of 6

$ 1.1: num [1:2] -0.378 0.445

$ 1.2: num [1:2] 1.4066 0.0166

$ 1.3: num -0.355

$ 2.3: num 0.315

$ 2.4: num [1:2] -0.907 0.723

$ 2.5: num [1:2] 0.732 0.360
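When split is given a list of factors, it groups by their interaction. With 2 levels in f1 and 5 in f2 there are 10 possible combinations, but only 6 of them occur in the data. A sketch that counts the groups with and without drop = TRUE; the seed and the names all_groups, kept and combined are illustrative.

```r
set.seed(6)                    # only so the sketch is reproducible
x  <- rnorm(10)
f1 <- gl(2, 5)                 # 1 1 1 1 1 2 2 2 2 2
f2 <- gl(5, 2)                 # 1 1 2 2 3 3 4 4 5 5

# interaction() builds the combined factor used for a list of factors.
combined <- interaction(f1, f2)
nlevels(combined)

all_groups <- split(x, list(f1, f2))               # includes empty groups
kept       <- split(x, list(f1, f2), drop = TRUE)  # only groups that contain data

length(all_groups)
names(kept)
```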
