Non-standard evaluation, how tidy eval builds on base R
As with many aspects of the tidyverse, its non-standard evaluation (NSE) implementation is not something entirely new, but is built on top of base R. What makes it so challenging to get your mind around is that the Honorable Doctor Sir Lord General and friends brought concepts to the realm of the mortals that many of us had no, or only a vague, understanding of. Earlier, I gave an overview of the most common actions in tidy eval. Although appreciated by many, it left me unsatisfied, because it made clear to me that I did not really understand NSE, neither in base R nor in tidy eval. Therefore, I bit the bullet and really studied it for a few evenings, starting with base R NSE and later learning what tidy eval actually adds to it. I decided to share the things I learned in this rather lengthy blog post. I think it captures the essentials of NSE, although it surely is incomplete and might even be erroneous in places. Still, I hope you find it worthwhile and that it will help you understand NSE better and apply it with more confidence.
My approach was to list a number of terms and study them one by one, mainly consulting Advanced R and the R Language Definition. For tidy eval I leaned heavily on the Programming with dplyr vignette and the function documentation. This is also how this blog post is built: we hop from term to term to see how they relate. You will find references to the sources in the text, in case you want to read more about a topic.
Base R, non-standard evaluation
expression
In standard evaluation R is like a child that receives candy from his grandmother and puts it in his mouth immediately. Every input is evaluated right away. This can be retrieving the value of an object or letting a function do a calculation. An expression is some R code that is ready to be evaluated, but is not evaluated yet. Rather, it is captured and saved for later. Think of it as the child’s father telling him he can’t have the candy until they get home. The base R way of creating an expression from a string is parse(text = "<string input>").
library(tidyverse)
expr <- parse(text = "5 + 5")
expr
## expression(5 + 5)
Note that the text argument is not the first argument of parse() and thus must be named. To evaluate the expression, we run eval() on it.
eval(expr)
## [1] 10
When giving multiple lines to parse() it will create a list-like object of multiple expressions.
multi_exp <- parse(text = c(
"x
3 + 3"
))
multi_exp %>% class()
## [1] "expression"
multi_exp %>% length()
## [1] 2
multi_exp[[1]]
## x
multi_exp[[1]] %>% class()
## [1] "name"
Strange, the first element of the expression is not an expression, but a name. What’s up with that?
name
When creating an object in R, you are binding the name of the object to a value. This binding of value and name is done in an environment. In the following, the name x gets associated with the value 50. Since we did not create a specific environment, this is a binding in the global environment.
x <- 50
Normally, we give R the object name to retrieve the corresponding value. However, when we capture the object's name as an expression, it is not evaluated but stored as a name object. (Confusingly, parse() always returns an object of class expression, even when it holds just one element; only the element itself is of class name.) So a name is one kind of expression: it is what you get when the unevaluated R code will retrieve the value of an object once evaluated.
exp_x <- parse(text = "x")
eval(exp_x)
## [1] 50
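To see where the two classes come from, here is a small sketch that compares the container returned by parse() with the element it holds. The objects x and exp_x are re-created so the block runs on its own.

```r
x <- 50
exp_x <- parse(text = "x")

# parse() returns an expression vector, even for a single element
class(exp_x)
## [1] "expression"
length(exp_x)
## [1] 1

# The element inside the container is the name itself
class(exp_x[[1]])
## [1] "name"

# Evaluating the container or the element gives the same value
eval(exp_x)
## [1] 50
eval(exp_x[[1]])
## [1] 50
```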
This way we can build a request for later. The requested variable doesn’t even have to exist at creation time. (Like granny having no candy herself, but telling the kid that he can have candy when he gets back to his parents’ place.)
eval_me <- parse(text = "y")
eval(eval_me)
## Error in eval(eval_me): object 'y' not found
y <- "I am ready"
eval(eval_me)
## [1] "I am ready"
Now you might wonder: in the tidyverse packages I can conveniently pass bare object names, without providing strings. This is possible in base R too, with the function quote(). It quotes its input, that is, it captures the R code exactly as provided.
quote(x)
## x
quote(x) %>% class()
## [1] "name"
If we want to quickly create a name from a string, we can also use as.name() instead of parse().
as.name("x")
## x
And finally, to make matters nice and unclear, a name is also called a symbol, and the function as.symbol() does the same as as.name(). Perfect, we have a good idea about quoting variable names and how to retrieve their value later. Now, let’s call some functions.
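Before moving on, a quick check that the three routes to a name really give the same object. This is a minimal sketch in base R; the object names from_quote, from_name and from_symbol are only illustrative.

```r
from_quote  <- quote(x)
from_name   <- as.name("x")
from_symbol <- as.symbol("x")

identical(from_quote, from_name)
## [1] TRUE
identical(from_name, from_symbol)
## [1] TRUE

# class() calls it a "name", typeof() calls it a "symbol"
class(from_symbol)
## [1] "name"
typeof(from_symbol)
## [1] "symbol"
```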
call
When we delay the evaluation of a function call, we arrive at the second subcategory of expressions: the call. The function to be called, together with the names of the objects used as arguments, is stored until further notice.
wait_for_it <- quote(x + y)
class(wait_for_it)
## [1] "call"
x <- 3; y <- 8
eval(wait_for_it)
## [1] 11
Note that + is a function, like every action that happens in R. We have already seen that from a string we can get to an expression with parse(). Not surprisingly, deparse() returns the expression as a string. This allows us to do stuff like:
print_func <- function(expr) {
  paste("The value of", deparse(expr), "is", eval(expr))
}
print_func(wait_for_it)
## [1] "The value of x + y is 11"
print_func(quote(log(42) %>% round(1)))
## [1] "The value of log(42) %>% round(1) is 3.7"
print_func(quote(x))
## [1] "The value of x is 3"
When the expression is a name, we print the name and the value of the object associated with the name. When it is a call, we print the function call and the evaluation of it.
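The remark that + is a function can be made concrete by taking a call apart. The following sketch, using only base R, shows that a call stores the function first and its arguments after it, and that it can be changed before it is evaluated.

```r
wait_for_it <- quote(x + y)

# A call can be indexed like a list: function first, then the arguments
wait_for_it[[1]]
## `+`
wait_for_it[[2]]
## x

# The operator can be called like any other function
`+`(3, 8)
## [1] 11

# A call can be modified before it is evaluated
wait_for_it[[1]] <- as.name("*")
wait_for_it
## x * y
x <- 3; y <- 8
eval(wait_for_it)
## [1] 24
```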
environment and closure
In the last block we used a function in which we applied NSE. No coincidence, NSE and functions are a strong and natural pair. With NSE we can create powerful and user-friendly functions, like the ones in ggplot2 and dplyr. We need to elaborate on environments and closures here. I told you that an object is the binding of a name and a value in an environment. When starting an R session, you are in the global environment (Adv-R). All objects created there live happily in the global environment.
z <- 25
A function creates a new environment, objects of the same name as objects in the global can live here with different values bound to them.
z_func <- function() {
  z <- 12
  z
}
z_func()
## [1] 12
z
## [1] 25
The z_func did not change the global environment, but created an object in its own environment. Now functions are of a type called a closure.
typeof(z_func)
## [1] "closure"
They are called this way because they enclose their environment. At creation they keep a reference to the environment in which they are created, which gives them access to all the names and values available there. They don’t just know the names of the objects in their own environment, but also those in the environment in which they were created (Adv-R).
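A small sketch of this enclosing behaviour: a function created inside another function keeps access to the variables of the call that created it. The names make_adder and add_two are made up for this illustration.

```r
make_adder <- function(increment) {
  # The returned function encloses the environment of this call,
  # so it can still reach `increment` after make_adder() has finished
  function(value) value + increment
}

add_two <- make_adder(2)
add_two(40)
## [1] 42

# The enclosed environment still holds the binding
get("increment", envir = environment(add_two))
## [1] 2
```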
Keep the concept of a closure in mind, we will revisit it.
substitute and promise
With the knowledge gained above we can start writing our own NSE functions. Let’s make a function that adds a column to a data frame that is the square of a column that is already in it.
add_squared <- function(x, col_name) {
  new_colname <- paste0(deparse(col_name), "_sq")
  x[, new_colname] <- x[, deparse(col_name)]^2
  x
}
add_squared(mtcars, quote(cyl)) %>% head(1)
## mpg cyl disp hp drat wt qsec vs am gear carb cyl_sq
## Mazda RX4 21 6 160 110 3.9 2.62 16.46 0 1 4 4 36
You might say, “that is not too convenient, I still need to quote the col_name myself”. Well, you are very right, it would be more helpful if the function did the quoting for you. Unfortunately, placing quote(col_name) inside the function body is of no use. quote() makes a literal quote of its input, so it would return the name col_name each time the function was called, no matter what was passed to the argument, rather than quoting the code that was provided to this argument.
Here we need substitute(). Called inside a function, it looks up the names in the code given to it, and when a name is a function argument, it replaces that name with the expression that was supplied for the argument (Adv-R). Let’s write a filter function to demonstrate.
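The problem with quote() inside a function body is easy to see in a short sketch. The helper quote_inside() is hypothetical and exists only to show the behaviour.

```r
# Hypothetical helper: quote() captures the argument's own name literally,
# not the code the caller supplied
quote_inside <- function(col_name) {
  quote(col_name)
}

quote_inside(cyl)
## col_name
quote_inside(mpg)
## col_name
```

Whatever we pass in, the result is always the name col_name.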
my_filt <- function(x, filt_cond) {
  filt_cond_q <- substitute(filt_cond)
  rows_to_keep <- eval(filt_cond_q, x)
  x[rows_to_keep, ]
}
my_filt(mtcars, mpg == 21)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
Yeah, that works. But wait a minute. How does eval() know that mpg is a column in x? We provided x to eval(), but how does this work? Well, the data frame x was provided to the envir argument of eval(). A data frame, thus, can serve as the environment in which an expression is evaluated. mpg lives in x, so the evaluation of filt_cond_q here gives the desired result.
When you think about it a little longer, NSE is only possible because function arguments are not evaluated right away. If the function were the impatient kid that wanted to put filt_cond in its mouth right away, it would have failed to find an object with the name mpg in the global environment. When the function is called, each provided argument is stored in a promise. The promise holds the expression of the argument and, once it has been evaluated, its value. The function does not bother about the value of the promise until the argument is actually used in the function body. The substitute() function only accesses the expression part of the promise. In the my_filt() example, the promise associated with the x argument will have the actual data frame belonging to the object mtcars as its value, and the name mtcars as its expression. In the second and third line of the function, the value of this argument is accessed. The promise associated with the filt_cond argument, however, never gets a value, but it does have a call as its expression. As soon as we used this argument directly, the function would fail. But we don’t. With substitute() we only access the expression of the promise (R lang).
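Here is a stripped-down sketch of evaluating an expression inside a data frame, without a wrapper function around it. It also shows what happens with a name that is not a column: eval() then falls back on the environment given by its enclos argument, which by default is the calling environment. The variable threshold is made up for the example.

```r
cond <- quote(mpg > threshold)
threshold <- 33

# `mpg` is found in the data frame; `threshold` is not a column,
# so it is looked up in the environment eval() was called from
rows <- eval(cond, mtcars)
rownames(mtcars)[rows]
## [1] "Toyota Corolla"
```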
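That an argument is only evaluated when it is used can be checked directly. In this sketch the helpers capture_only() and force_it() are hypothetical. The first only looks at the expression of the promise, the second actually uses the argument.

```r
# Only accesses the expression of the promise, never its value
capture_only <- function(arg) {
  substitute(arg)
}

# stop() is never run, because the argument is never evaluated
capture_only(stop("this is never evaluated"))
## stop("this is never evaluated")

# Using the argument forces the promise, and only then does stop() run
force_it <- function(arg) {
  arg
}
result <- tryCatch(
  force_it(stop("now it is evaluated")),
  error = function(e) conditionMessage(e)
)
result
## [1] "now it is evaluated"
```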
formula
Before we move to tidy eval there is one more concept we have to elaborate on, the formula. Probably you have used formulas a lot, but did you ever think about how odd they are? Take the following example
mod <- lm(vs ~ mpg + cyl, data = mtcars)
No R user would have trouble reading the above, but picture yourself coming from another programming language and stumbling upon it. It is an example of a domain specific language (DSL). DSLs exploit R’s NSE possibilities by giving alternative meaning to the language in specific contexts. Other examples are ggplot2 and dplyr. Just like functions, formulas enclose the environment they are created in. This means that when the formula is evaluated later in a different environment, it can still access all the objects that lived in its original environment.
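The environment a formula carries around can be inspected with environment(). A minimal sketch, in which make_formula() and local_var are illustrative names:

```r
make_formula <- function() {
  local_var <- 99
  # The formula records the environment of this function call
  y ~ local_var
}

f <- make_formula()
class(f)
## [1] "formula"

# The variable lives on in the environment attached to the formula
f_env <- environment(f)
get("local_var", envir = f_env)
## [1] 99

# The right-hand side can be evaluated in that remembered environment
eval(f[[3]], f_env)
## [1] 99
```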
These are, to my understanding, the core elements of NSE in base R. If you don’t care about tidy eval you can stop reading here and try to build your own NSE functions. Thanks for making it this far.
tidy evaluation
There are two key additions of tidy eval to base R NSE. It uses quasiquotation and it introduces a new type of quoted object, called a quosure. Let’s find out about them one by one.
quasiquotation
We now know that in normal quotation the expression is captured to be evaluated later, rather than swallowed right away. Quasiquotation enables the user to swallow parts of the expression right away, while quoting the rest. Let’s find out with an example. We can quote the following simple expression.
quote(z - x + 4)
## z - x + 4
Say we already know the value of x at the moment of quoting. How can we have the second part evaluated right away and quote z minus the result of this evaluation? In other words, how do we unquote the x + 4 part? quote() cannot do this (base R does offer bquote() for the purpose), but in tidy eval it is built in.
x <- 4
rlang::expr(z - !!(x + 4))
## z - 8
rlang::expr(z - !!(x + 4)) %>% class()
## [1] "call"
The expression directly after the !! (bang bang) is unquoted. Note that in current versions of rlang !! binds tightly, like unary minus, so x + 4 needs parentheses to be unquoted as a whole; without them only x would be unquoted. If we do not use unquoting, there is no reason to use rlang::expr() instead of quote(). They have the exact same result. There is also a tidy eval equivalent for substitute(), namely enexpr().
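For comparison, base R ships a helper built on a similar idea, bquote(), which uses .() where rlang uses !!. The following is a sketch of the base R counterpart only, not part of tidy eval.

```r
x <- 4

# .() marks the part to evaluate immediately; the rest stays quoted
partly_quoted <- bquote(z - .(x + 4))
partly_quoted
## z - 8
class(partly_quoted)
## [1] "call"

# Once z exists, the call evaluates as usual
z <- 10
eval(partly_quoted)
## [1] 2
```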
Now the appeal of functions that implement quasiquotation is that all the advantages of easy-to-use NSE interfaces remain. At the same time they enable the user to wrap functions that already quote in custom-made wrappers. Example please! Something I do often is creating a frequency table of the values of a variable in a data frame. I want this in a function with the data frame and column name as arguments, wrapping dplyr functions in the following way:
freq_table <- function(x, col) {
  col_q <- rlang::enexpr(col)
  total_n <- x %>% nrow()
  x %>% group_by(!!col_q) %>% summarise(freq = n() / total_n)
}
mtcars %>% freq_table(cyl)
## # A tibble: 3 x 2
## cyl freq
## <dbl> <dbl>
## 1 4 0.34375
## 2 6 0.21875
## 3 8 0.43750
mtcars %>% freq_table(vs)
## # A tibble: 2 x 2
## vs freq
## <dbl> <dbl>
## 1 0 0.5625
## 2 1 0.4375
So the functions that use tidy eval, like those in dplyr, automatically quote their input. That is what enables you to type away and get results as quickly as you can when doing data analysis. However if you want to write programs around them you have to take care of two steps. First, quote the argument that is going to be evaluated by the functions used. If we don’t do this our wrapper function would fail because we have provided a name or call that cannot be found in the environment the function is called from. Second, since the dplyr functions quote their input themselves, we have to unquote the quoted arguments in these functions. If we don’t do this the dplyr function will quote the variable name rather than its content.
quosure
Very nice, that quasiquoting. Now what’s up with quosures? From their name you might guess they are hybrids of quotes and closures. We have seen that combination before when we looked at formulas. But formulas are not expressions, they are a DSL that is created through NSE. If we look at quosures, we will see that they behave both like expressions and like formulas.
quo(z) %>% class()
## [1] "quosure" "formula"
quo(z) %>% rlang::is_expression()
## [1] TRUE
Quosures are one-sided formulas, capturing their environment, but not indicating a modelling relationship. By the way, we have just seen the quo() function in action. It literally quotes its input, just like quote() and rlang::expr() do. The quosure equivalent of substitute() and enexpr() is enquo().
Just like names, calls can be converted to a quosure too.
quo(2 + 2) %>% class()
## [1] "quosure" "formula"
Note that quosures don’t make a lower level distinction between calls and names. Every expression becomes a quosure.
But when is this capturing of the environment actually useful? When the quosure is created in one environment and evaluated in another. This typically happens when they are created in a function and evaluated in the global environment or another function.
In base R NSE a function can evaluate a quoted argument, it can quote a bare statement, it can even return an expression. What it cannot do, however, is give the expression memory of the variables that were present at creation.
base_NSE_example <- function(some_arg) {
  some_var <- 10
  quote(some_var + some_arg)
}
base_NSE_example(4) %>% eval()
## Error in eval(.): object 'some_var' not found
The quosure is not memoryless, it will retrieve the values that were present at creation.
tidy_eval_example <- function(some_arg) {
  some_var <- 10
  quo(some_var + some_arg)
}
tidy_eval_example(4) %>% rlang::eval_tidy()
## [1] 14
Note that we do need to apply eval_tidy() instead of eval() to make use of the memory of the quosure.
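To make concrete what a quosure carries around, here is a base R sketch that pairs the expression with its environment by hand. The function base_with_memory() is hypothetical, and this only illustrates the idea; it is not how rlang implements quosures.

```r
# Bundle an expression with the environment it was created in,
# a rough stand-in for what a quosure stores
base_with_memory <- function(some_arg) {
  some_var <- 10
  list(expr = quote(some_var + some_arg), env = environment())
}

captured <- base_with_memory(4)

# Evaluating the expression in its own environment finds both variables
eval(captured$expr, captured$env)
## [1] 14
```

The difference is that with a quosure the expression and its environment travel together as one object, so the caller does not have to pass the environment around.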
How do base R NSE and tidy eval play together?
So tidy eval is built on top of base R NSE and the two can even work together. We have seen that in quasiquotation the parts to be unquoted don’t have to be quosures, we can also unquote base objects like calls and names.
using_base_r_in_tidy_eval <- function(x, col) {
  col_q <- substitute(col)
  x %>% select(!!col_q)
}
mtcars %>% using_base_r_in_tidy_eval(cyl) %>% head(1)
## cyl
## Mazda RX4 6
If you want to use the quasiquotation of tidy eval, but prefer base R quotation, you can combine the two. It does not work the other way around. Since quosures are the new kid on the block, eval() does not know how to handle them and will not give you the value you are after. Familiar expression objects created with tidy eval can be evaluated with eval(), since the objects do not differ from the ones created with base R functions.
all.equal(quote(some_name), rlang::expr(some_name))
## [1] TRUE
all.equal(quote(x + 5), rlang::expr(x + 5))
## [1] TRUE
The only difference between these functions is on capture, objects after capture are of base types.
Thank You
I took you along my NSE learning path, thank you for making it all the way through. If there is anything you think is incomplete or incorrect, let me know! This document is a living thing. You would do me and everybody who uses it as a reference a great favor by correcting it. The blog is maintained here, do a PR or send an email.