Lesser known dplyr tricks
In this blog post I share some lesser-known (at least I believe they are) tricks that use mainly functions from dplyr.
Removing unneeded columns
Did you know that you can use - in front of a column name to remove it from a data frame?
library(dplyr)

mtcars %>%
  select(-disp) %>%
  head()
## mpg cyl hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 105 2.76 3.460 20.22 1 0 3 1
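The same minus sign also works with several columns at once and with the select helpers. A minimal sketch, assuming a reasonably recent dplyr; the object names `cars`, `no_disp_hp` and `no_d` are only for illustration:

```r
library(dplyr)

cars <- datasets::mtcars  # untouched copy of the built-in data

# Drop two columns by name
no_disp_hp <- cars %>% select(-disp, -hp)

# Drop every column whose name starts with "d" (here: disp and drat)
no_d <- cars %>% select(-starts_with("d"))

names(no_disp_hp)
names(no_d)
```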
Re-ordering columns
Still using select(), it is easy to re-order columns in your data frame:
mtcars %>%
  select(cyl, disp, hp, everything()) %>%
  head()
## cyl disp hp mpg drat wt qsec vs am gear carb
## Mazda RX4 6 160 110 21.0 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 6 160 110 21.0 3.90 2.875 17.02 0 1 4 4
## Datsun 710 4 108 93 22.8 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 6 258 110 21.4 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 8 360 175 18.7 3.15 3.440 17.02 0 0 3 2
## Valiant 6 225 105 18.1 2.76 3.460 20.22 1 0 3 1
As its name implies, everything() simply means all the other columns, in their original order.
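Newer versions of dplyr also offer relocate() for the same job. The sketch below assumes dplyr 1.0.0 or later; `cars`, `front` and `back` are illustrative names:

```r
library(dplyr)

cars <- datasets::mtcars  # untouched copy of the built-in data

# Move cyl, disp and hp to the front; the other columns keep their order
front <- cars %>% relocate(cyl, disp, hp)

# Move mpg to the very end instead
back <- cars %>% relocate(mpg, .after = last_col())

names(front)
names(back)
```

Unlike select(), relocate() never drops a column, so there is no need for everything().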
Renaming columns with rename()
mtcars <- rename(mtcars, spam_mpg = mpg)
mtcars <- rename(mtcars, spam_disp = disp)
mtcars <- rename(mtcars, spam_hp = hp)
head(mtcars)
## spam_mpg cyl spam_disp spam_hp drat wt qsec vs am
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0
## gear carb
## Mazda RX4 4 4
## Mazda RX4 Wag 4 4
## Datsun 710 4 1
## Hornet 4 Drive 3 1
## Hornet Sportabout 3 2
## Valiant 3 1
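The three calls above can probably be collapsed into one, since rename() accepts several new = old pairs. A sketch on a fresh copy of the data (the names `cars`, `renamed` and `renamed2` are illustrative; rename_with() assumes dplyr 1.0.0 or later):

```r
library(dplyr)

cars <- datasets::mtcars  # untouched copy of the built-in data

# Several new_name = old_name pairs in a single call
renamed <- cars %>%
  rename(spam_mpg = mpg, spam_disp = disp, spam_hp = hp)

# Or build the new names with a function applied to the chosen columns
renamed2 <- cars %>%
  rename_with(~ paste0("spam_", .x), .cols = c(mpg, disp, hp))

identical(names(renamed), names(renamed2))
```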
Selecting columns with a regexp
It is easy to select the columns whose names contain “spam” with some helper functions:
mtcars %>%
  select(contains("spam")) %>%
  head()
## spam_mpg spam_disp spam_hp
## Mazda RX4 21.0 160 110
## Mazda RX4 Wag 21.0 160 110
## Datsun 710 22.8 108 93
## Hornet 4 Drive 21.4 258 110
## Hornet Sportabout 18.7 360 175
## Valiant 18.1 225 105
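Also take a look at starts_with(), ends_with(), contains(), matches(), num_range(), one_of() and everything(). Of these, matches() is the one that takes a regular expression; contains() looks for a literal string.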
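To make the difference between the helpers concrete, here is a small sketch. It rebuilds the renamed columns on a fresh copy of the data so that it runs on its own; `cars` is an illustrative name:

```r
library(dplyr)

cars <- datasets::mtcars %>%
  rename(spam_mpg = mpg, spam_disp = disp, spam_hp = hp)

# Literal prefix
prefix_cols <- cars %>% select(starts_with("spam")) %>% names()

# Literal suffix
suffix_cols <- cars %>% select(ends_with("p")) %>% names()

# A real regular expression: "spam_" followed by exactly mpg or hp
regex_cols <- cars %>% select(matches("^spam_(mpg|hp)$")) %>% names()

prefix_cols
suffix_cols
regex_cols
```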
Create new columns with mutate() and if_else()
mtcars %>%
  mutate(vs_new = if_else(
    vs == 1,
    "one",
    "zero",
    NA_character_)) %>%
  head()
## spam_mpg cyl spam_disp spam_hp drat wt qsec vs am gear carb vs_new
## 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 zero
## 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 zero
## 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 one
## 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 one
## 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 zero
## 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 one
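The fourth argument of if_else() is the value used when the condition itself is NA. Since vs has no missing values, it never kicks in above; a toy vector (hypothetical, not from mtcars) shows what it does:

```r
library(dplyr)

x <- c(1, 0, NA)  # toy input with one missing value

# true, false, and the value to use where the condition is NA
labels <- if_else(x == 1, "one", "zero", missing = "unknown")

labels
```

All three replacement values must be of the same type, which is why the example in the post uses NA_character_ rather than a plain NA.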
You might want to create a new variable conditionally on several values of another column:
mtcars %>%
  mutate(carb_new = case_when(.$carb == 1 ~ "one",
                              .$carb == 2 ~ "two",
                              .$carb == 4 ~ "four",
                              TRUE ~ "other")) %>%
  head(15)
## spam_mpg cyl spam_disp spam_hp drat wt qsec vs am gear carb
## 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## 7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## 8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## 9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## 10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## 11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## 12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## 13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## 14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## 15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## carb_new
## 1 four
## 2 four
## 3 one
## 4 one
## 5 two
## 6 one
## 7 four
## 8 two
## 9 two
## 10 four
## 11 four
## 12 other
## 13 other
## 14 other
## 15 four
Mind the .$ before the variable carb. It was required by the dplyr version available when this post was written. There is a GitHub issue about this, and the fix was already in the development version of dplyr at the time, so in later releases case_when() works like any other specialized dplyr function inside mutate() and the .$ prefix can be dropped.
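For reference, this is roughly what the same recoding looks like without the .$ prefix. It is a sketch that assumes dplyr 0.7.0 or later and works on a fresh copy of the data (`cars` is an illustrative name):

```r
library(dplyr)

cars <- datasets::mtcars %>%
  mutate(carb_new = case_when(
    carb == 1 ~ "one",
    carb == 2 ~ "two",
    carb == 4 ~ "four",
    TRUE      ~ "other"   # fallback for every remaining value
  ))

table(cars$carb_new)
```

The conditions are checked in order and the first one that is TRUE wins, so the catch-all TRUE line has to come last.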
Apply a function to certain columns only, by rows
mtcars %>%
  select(am, gear, carb) %>%
  purrr::by_row(sum, .collate = "cols", .to = "sum_am_gear_carb") -> mtcars2

head(mtcars2)
## # A tibble: 6 × 4
## am gear carb sum_am_gear_carb
## <dbl> <dbl> <dbl> <dbl>
## 1 1 4 4 9
## 2 1 4 4 9
## 3 1 4 1 6
## 4 0 3 1 4
## 5 0 3 2 5
## 6 0 3 1 4
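For this, I had to use purrr’s by_row() function (it has since been moved out of purrr into the purrrlyr package, so with a recent purrr you would call purrrlyr::by_row() instead). You can then add this column to your original data frame: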
mtcars <- cbind(mtcars, "sum_am_gear_carb" = mtcars2$sum_am_gear_carb)
head(mtcars)
## spam_mpg cyl spam_disp spam_hp drat wt qsec vs am
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0
## gear carb sum_am_gear_carb
## Mazda RX4 4 4 9
## Mazda RX4 Wag 4 4 9
## Datsun 710 4 1 6
## Hornet 4 Drive 3 1 4
## Hornet Sportabout 3 2 5
## Valiant 3 1 4
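The same row-wise sum can probably be computed without by_row() and without the cbind() step. The sketch below is an alternative, not the method used in the post; it assumes dplyr 1.0.0 or later for across(), and `cars` is an illustrative name:

```r
library(dplyr)

cars <- datasets::mtcars %>%
  # across() picks the three columns, rowSums() adds them up row by row
  mutate(sum_am_gear_carb = rowSums(across(c(am, gear, carb))))

head(cars$sum_am_gear_carb)
```

For functions that have no vectorised row version, rowwise() together with c_across() is the more general (and slower) route.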
Use do() to do any arbitrary operation
mtcars %>%
  group_by(cyl) %>%
  do(models = lm(spam_mpg ~ drat + wt, data = .)) %>%
  broom::tidy(models)
## Source: local data frame [9 x 6]
## Groups: cyl [3]
##
## cyl term estimate std.error statistic p.value
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 4 (Intercept) 33.2493403 17.0987286 1.9445504 0.087727622
## 2 4 drat 1.3244329 3.4519717 0.3836743 0.711215433
## 3 4 wt -5.2400608 2.2150213 -2.3656932 0.045551615
## 4 6 (Intercept) 30.6544931 7.5141648 4.0795609 0.015103868
## 5 6 drat -0.4435744 1.1740862 -0.3778039 0.724768945
## 6 6 wt -2.9902720 1.5685053 -1.9064468 0.129274249
## 7 8 (Intercept) 29.6519180 7.0878976 4.1834574 0.001527613
## 8 8 drat -1.4698722 1.6285054 -0.9025897 0.386081744
## 9 8 wt -2.4518017 0.7985112 -3.0704664 0.010651044
do() is useful when you want to use any R function (user-defined functions work too!) with dplyr functions. First I grouped the observations by cyl and then ran a linear model for each group. Then I converted the output to a tidy data frame using broom::tidy().
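In more recent dplyr versions, group_modify() covers much the same ground as do(). The sketch below is one possible translation, assuming dplyr 0.8.1 or later. The helper `fit_group()` is hypothetical and only extracts the coefficient estimates; inside it you could call broom::tidy(fit) instead to get the full table shown above.

```r
library(dplyr)

# Hypothetical helper: receives the rows of one group (df) and a one-row
# data frame with the group key (key), and must return a data frame
fit_group <- function(df, key) {
  fit <- lm(mpg ~ drat + wt, data = df)
  tibble(term = names(coef(fit)), estimate = unname(coef(fit)))
}

models <- datasets::mtcars %>%
  group_by(cyl) %>%
  group_modify(fit_group)

models
```

The result has one row per coefficient and per value of cyl, with the grouping column added back automatically.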
Using dplyr functions inside your own functions
extract_vars <- function(data, some_string){
  data %>%
    select_(lazyeval::interp(~contains(some_string))) -> data
  return(data)
}

extract_vars(mtcars, "spam")
## spam_mpg spam_disp spam_hp
## Mazda RX4 21.0 160.0 110
## Mazda RX4 Wag 21.0 160.0 110
## Datsun 710 22.8 108.0 93
## Hornet 4 Drive 21.4 258.0 110
## Hornet Sportabout 18.7 360.0 175
## Valiant 18.1 225.0 105
## Duster 360 14.3 360.0 245
## Merc 240D 24.4 146.7 62
## Merc 230 22.8 140.8 95
## Merc 280 19.2 167.6 123
## Merc 280C 17.8 167.6 123
## Merc 450SE 16.4 275.8 180
## Merc 450SL 17.3 275.8 180
## Merc 450SLC 15.2 275.8 180
## Cadillac Fleetwood 10.4 472.0 205
## Lincoln Continental 10.4 460.0 215
## Chrysler Imperial 14.7 440.0 230
## Fiat 128 32.4 78.7 66
## Honda Civic 30.4 75.7 52
## Toyota Corolla 33.9 71.1 65
## Toyota Corona 21.5 120.1 97
## Dodge Challenger 15.5 318.0 150
## AMC Javelin 15.2 304.0 150
## Camaro Z28 13.3 350.0 245
## Pontiac Firebird 19.2 400.0 175
## Fiat X1-9 27.3 79.0 66
## Porsche 914-2 26.0 120.3 91
## Lotus Europa 30.4 95.1 113
## Ford Pantera L 15.8 351.0 264
## Ferrari Dino 19.7 145.0 175
## Maserati Bora 15.0 301.0 335
## Volvo 142E 21.4 121.0 109
You can read more about this last point here.
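The underscore verbs such as select_() and the lazyeval approach have since been deprecated in dplyr. A sketch of how the same function might look today: the string version needs nothing special, and for bare column names the embrace operator is available (assuming dplyr 1.0.0 or later with rlang 0.4.0 or later). `mean_by()` and `cars` are hypothetical names used for illustration.

```r
library(dplyr)

# contains() takes an ordinary string, so no quoting tricks are needed
extract_vars <- function(data, some_string) {
  data %>% select(contains(some_string))
}

# For bare column names, wrap the arguments in {{ }}
mean_by <- function(data, group_col, value_col) {
  data %>%
    group_by({{ group_col }}) %>%
    summarise(mean_value = mean({{ value_col }}))
}

cars <- datasets::mtcars %>%
  rename(spam_mpg = mpg, spam_disp = disp, spam_hp = hp)

spam_cols <- extract_vars(cars, "spam")
by_cyl <- mean_by(cars, cyl, spam_mpg)

names(spam_cols)
by_cyl
```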
Hope you liked this small list of tricks!
Reposted from: http://www.brodrigues.co/blog/2017-02-17-lesser_known_tricks/
使用"点语法" Person *p =[Person new]; //点语法 //对象.属性名 //注意,此时 (p.age)并不是直接方法实例对象 //而是xcode可能到点语法 ...