R Programming week 3-Loop functions
Looping on the Command Line
Writing for and while loops is useful when programming, but it is not particularly easy when working interactively on the command line. R provides several functions that implement looping to make life easier:
lapply: Loop over a list and evaluate a function on each element
sapply: Same as lapply but try to simplify the result
apply: Apply a function over the margins of an array
tapply: Apply a function over subsets of a vector
mapply: Multivariate version of lapply
An auxiliary function, split, is also useful, particularly in conjunction with lapply.
lapply
lapply takes three arguments: (1) a list X; (2) a function (or the name of a function) FUN; (3) other arguments via its ... argument. If X is not a list, it will be coerced to a list using as.list.
## function (X, FUN, ...)
## {
## FUN <- match.fun(FUN)
## if (!is.vector(X) || is.object(X))
## X <- as.list(X)
## .Internal(lapply(X, FUN))
## }
## <bytecode: 0x7ff7a1951c00>
## <environment: namespace:base>
The actual looping is done internally in C code.
lapply always returns a list, regardless of the class of the input.
x <- list(a = 1:5, b = rnorm(10))
lapply(x, mean)
x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))
lapply(x, mean)
> x <- 1:4
> lapply(x, runif)
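The third argument of lapply, the ... slot, is easiest to see with runif. The sketch below assumes we want uniform draws between 0 and 10 instead of the default 0 to 1; the seed and the name res are arbitrary choices made for illustration.

```r
set.seed(42)  # arbitrary seed, only so that repeated runs give the same draws

x <- 1:4

# lapply does not use min and max itself; they travel through ... to runif,
# so each call is effectively runif(n, min = 0, max = 10) for n = 1, 2, 3, 4.
res <- lapply(x, runif, min = 0, max = 10)

str(res)  # a list of four numeric vectors, of lengths 1 to 4
```

The input here is an integer vector, not a list, yet the result is still a list: lapply coerces X with as.list and always returns a list.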
lapply and friends make heavy use of anonymous functions.
> x <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))
> x
$a
[,1] [,2]
[1,] 1 3
[2,] 2 4
$b
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
An anonymous function for extracting the first column of each matrix.
> lapply(x, function(elt) elt[,1])
$a
[1] 1 2
$b
[1] 1 2 3
sapply
> x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))
> lapply(x, mean)
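For comparison, here is a small sketch of what sapply does with the same input. It relies on the documented behaviour of sapply (simplify to a vector or matrix when the results allow it); the object names and the seed are made up for this example.

```r
set.seed(1)  # arbitrary seed for reproducibility
x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))

as_list   <- lapply(x, mean)  # a list of four length-1 numerics
as_vector <- sapply(x, mean)  # same values, simplified to a named numeric vector

# If every result has the same length greater than 1, sapply builds a matrix
as_matrix <- sapply(x, range)  # 2 rows (min, max), one column per list element

# If the results have different lengths, no simplification is possible
# and sapply falls back to returning a list, just like lapply
still_list <- sapply(1:3, seq_len)

as_vector
dim(as_matrix)
```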
apply
apply is used to evaluate a function (often an anonymous one) over the margins of an array.
It is most often used to apply a function to the rows or columns of a matrix.
It can be used with general arrays, e.g. taking the average of an array of matrices.
It is not really faster than writing a loop, but it works in one line!
> str(apply)
function (X, MARGIN, FUN, ...)
X is an array
MARGIN is an integer vector indicating which margins should be “retained”.
FUN is a function to be applied
... is for other arguments to be passed to FUN
> x <- matrix(rnorm(200), 20, 10)
> apply(x, 2, mean)
[1] 0.04868268 0.35743615 -0.09104379
[4] -0.05381370 -0.16552070 -0.18192493
[7] 0.10285727 0.36519270 0.14898850
[10] 0.26767260
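The "average of an array of matrices" case mentioned earlier can be sketched as follows. The array a is a made-up example (ten 2 x 2 matrices filled with 1:40) so that the result can be checked by hand.

```r
# Ten 2 x 2 matrices stacked along the third dimension
a <- array(1:40, dim = c(2, 2, 10))

# Retain margins 1 and 2 (rows and columns) and collapse the third,
# which gives the element-wise mean of the ten matrices
avg <- apply(a, c(1, 2), mean)

# rowMeans with dims = 2 treats the first two dimensions as "rows"
# and computes the same quantity
alt <- rowMeans(a, dims = 2)

avg
all.equal(avg, alt)
```

Element [1, 1] of each matrix is 1, 5, 9, ..., 37, so avg[1, 1] is 19.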
col/row sums and means
For sums and means of matrix dimensions, we have some shortcuts.
rowSums = apply(x, 1, sum)
rowMeans = apply(x, 1, mean)
colSums = apply(x, 2, sum)
colMeans = apply(x, 2, mean)
The shortcut functions are much faster, but you won’t notice unless you’re using a large matrix.
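A quick way to convince yourself that the shortcuts and the apply calls agree is to compare them on a random matrix. This is only a sketch: the seed and the name same are arbitrary, and no timing is attempted here.

```r
set.seed(1)  # arbitrary seed for reproducibility
x <- matrix(rnorm(200), 20, 10)

# Compare each shortcut with its apply equivalent, up to floating-point tolerance
same <- c(
  rowSums  = isTRUE(all.equal(rowSums(x),  apply(x, 1, sum))),
  rowMeans = isTRUE(all.equal(rowMeans(x), apply(x, 1, mean))),
  colSums  = isTRUE(all.equal(colSums(x),  apply(x, 2, sum))),
  colMeans = isTRUE(all.equal(colMeans(x), apply(x, 2, mean)))
)

same
```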
Other Ways to Apply
Quantiles of the rows of a matrix.
> x <- matrix(rnorm(200), 20, 10)
> apply(x, 1, quantile, probs = c(0.25, 0.75))
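When FUN returns more than one value, apply stacks the results as columns. A small sketch of the shape you get from the call above (the seed and the name q are arbitrary):

```r
set.seed(1)  # arbitrary seed for reproducibility
x <- matrix(rnorm(200), 20, 10)

# probs is not an argument of apply; it is passed through ... to quantile
q <- apply(x, 1, quantile, probs = c(0.25, 0.75))

dim(q)       # 2 rows (one per quantile) and 20 columns (one per row of x)
rownames(q)  # "25%" "75%"
```

So each column of q holds the 25th and 75th percentiles of the corresponding row of x.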
mapply
mapply is a multivariate apply of sorts which applies a function in parallel over a set of arguments.
> str(mapply)
function (FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)
FUN is a function to apply
... contains arguments to apply over
MoreArgs is a list of other arguments to FUN.
SIMPLIFY indicates whether the result should be simplified
The following is tedious to type
list(rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1))
Instead we can do
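The call itself is missing from these notes. One mapply call that builds the same list, offered here as a likely reconstruction and not as the original text, pairs the values 1:4 with the repeat counts 4:1:

```r
# mapply walks over both vectors in parallel:
# rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1)
res <- mapply(rep, 1:4, 4:1)

res

# Same content as the hand-typed list; all.equal is used because
# 1:4 is integer while rep(1, 4) and friends are double
all.equal(res, list(rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1)))
```

Because the four results have different lengths, mapply cannot simplify them and returns a list.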
Vectorizing a Function
> noise <- function(n, mean, sd) {
+ rnorm(n, mean, sd)
+ }
> noise(5, 1, 2)
[1] 2.4831198 2.4790100 0.4855190 -1.2117759
[5] -0.2743532
> noise(1:5, 1:5, 2)
[1] -4.2128648 -0.3989266 4.2507057 1.1572738
[5] 3.7413584
Instant Vectorization
> mapply(noise, 1:5, 1:5, 2)
Which is the same as
list(noise(1, 1, 2), noise(2, 2, 2), noise(3, 3, 2), noise(4, 4, 2), noise(5, 5, 2))
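The difference between calling noise directly with vectors and going through mapply shows up in the lengths of the results. The sketch below assumes the noise function defined above; the seed and the names direct, via_mapply and via_more are arbitrary.

```r
noise <- function(n, mean, sd) {
  rnorm(n, mean, sd)
}

set.seed(1)  # arbitrary seed for reproducibility

# Passing a vector as n does not give 1, 2, ..., 5 draws: rnorm takes
# length(n) as the number of values, so this returns just 5 numbers
direct <- noise(1:5, 1:5, 2)

# mapply calls noise once per pair (n, mean), recycling sd = 2
via_mapply <- mapply(noise, 1:5, 1:5, 2)

# Constant arguments can also be supplied once through MoreArgs
via_more <- mapply(noise, n = 1:5, mean = 1:5, MoreArgs = list(sd = 2))

length(direct)       # 5
lengths(via_mapply)  # 1 2 3 4 5
```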
tapply
tapply is used to apply a function over subsets of a vector. I don’t know why it’s called tapply.
> str(tapply)
function (X, INDEX, FUN = NULL, ..., simplify = TRUE)
X is a vector
INDEX is a factor or a list of factors (or else they are coerced to factors)
FUN is a function to be applied
... contains other arguments to be passed to FUN
simplify indicates whether the result should be simplified
Take group means.
> x <- c(rnorm(10), runif(10), rnorm(10, 1))
> f <- gl(3, 10)
> f
[1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3
[24] 3 3 3 3 3 3 3
Levels: 1 2 3
> tapply(x, f, mean)
1 2 3
0.1144464 0.5163468 1.2463678
Take group means without simplification.
> tapply(x, f, mean, simplify = FALSE)
$`1`
[1] 0.1144464
$`2`
[1] 0.5163468
$`3`
[1] 1.246368
Find group ranges.
> tapply(x, f, range)
$`1`
[1] -1.097309 2.694970
$`2`
[1] 0.09479023 0.79107293
$`3`
[1] 0.4717443 2.5887025
split
split takes a vector or other object and splits it into groups determined by a factor or list of factors.
> str(split)
function (x, f, drop = FALSE, ...)
x is a vector (or list) or data frame
f is a factor (or coerced to one) or a list of factors
drop indicates whether empty factor levels should be dropped
A common idiom is split followed by an lapply.
> lapply(split(x, f), mean)
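For a simple summary like the mean, split followed by sapply gives the same numbers as tapply. A minimal check, reusing the x and f from the tapply section with an arbitrary seed (the names by_split and by_tapply are made up):

```r
set.seed(1)  # arbitrary seed for reproducibility
x <- c(rnorm(10), runif(10), rnorm(10, 1))
f <- gl(3, 10)

groups <- split(x, f)  # a list with one numeric vector per level of f

by_split  <- sapply(groups, mean)  # named numeric vector
by_tapply <- tapply(x, f, mean)    # one-dimensional array with the same names

# Strip the differing attributes before comparing the values
all.equal(unname(by_split), as.vector(by_tapply))
```

The advantage of split is that the pieces are ordinary list elements, so any function can be applied to them afterwards, not only ones that return a single number.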
Splitting a Data Frame
> library(datasets)
> head(airquality)
> s <- split(airquality, airquality$Month)
> lapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")]))
> sapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")]))
> sapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")], na.rm = TRUE))
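The three calls differ in shape and in how they treat missing values. The sketch below uses the airquality data from the datasets package, as above; the names vars, raw and clean are introduced only for this example.

```r
library(datasets)

s <- split(airquality, airquality$Month)  # one data frame per month
vars <- c("Ozone", "Solar.R", "Wind")

# Without na.rm, any column containing an NA within a month yields NA
raw <- sapply(s, function(d) colMeans(d[, vars]))

# na.rm = TRUE is passed to colMeans, which then averages the observed values
clean <- sapply(s, function(d) colMeans(d[, vars], na.rm = TRUE))

dim(clean)         # one row per variable, one column per month
anyNA(raw)         # TRUE: some monthly means are NA
anyNA(clean)       # FALSE
```

sapply returns a matrix here because every month produces a vector of the same length (three means), whereas the lapply version returns a list with one such vector per month.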
Splitting on More than One Level
> x <- rnorm(10)
> f1 <- gl(2, 5)
> f2 <- gl(5, 2)
Interactions can create empty levels.
> str(split(x, list(f1, f2)))
Empty levels can be dropped
> str(split(x, list(f1, f2), drop = TRUE))
List of 6
$ 1.1: num [1:2] -0.378 0.445
$ 1.2: num [1:2] 1.4066 0.0166
$ 1.3: num -0.355
$ 2.3: num 0.315
$ 2.4: num [1:2] -0.907 0.723
$ 2.5: num [1:2] 0.732 0.360
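The number of groups before and after dropping can be counted directly. This sketch assumes the same f1 and f2 as above; the seed and the names all_groups and kept are arbitrary.

```r
set.seed(1)  # arbitrary seed; the values of x do not affect the grouping
x  <- rnorm(10)
f1 <- gl(2, 5)
f2 <- gl(5, 2)

# Passing a list of factors makes split group by their interaction,
# which has 2 * 5 = 10 levels
levels(interaction(f1, f2))

all_groups <- split(x, list(f1, f2))
lengths(all_groups)  # four of the ten groups are empty

kept <- split(x, list(f1, f2), drop = TRUE)
names(kept)          # only the six combinations that actually occur
```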