Lesser known dplyr tricks
In this blog post I share some lesser-known (at least I believe they are) tricks that mainly use functions from dplyr.
Removing unneeded columns
Did you know that you can use - in front of a column name to remove it from a data frame?
mtcars %>%
  select(-disp) %>%
  head()
## mpg cyl hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 105 2.76 3.460 20.22 1 0 3 1
Re-ordering columns
Still using select(), it is easy to re-order columns in your data frame:
mtcars %>%
  select(cyl, disp, hp, everything()) %>%
  head()
## cyl disp hp mpg drat wt qsec vs am gear carb
## Mazda RX4 6 160 110 21.0 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 6 160 110 21.0 3.90 2.875 17.02 0 1 4 4
## Datsun 710 4 108 93 22.8 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 6 258 110 21.4 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 8 360 175 18.7 3.15 3.440 17.02 0 0 3 2
## Valiant 6 225 105 18.1 2.76 3.460 20.22 1 0 3 1
As its name implies, everything() simply means all the other columns.
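As a side note, newer releases of dplyr also offer relocate() for the same job. The sketch below assumes dplyr 1.0.0 or later and uses the built-in mtcars so that it runs on its own; the name reordered is only illustrative.

```r
library(dplyr)

# relocate() moves the named columns to the front by default,
# leaving all other columns in their original order
reordered <- datasets::mtcars %>%
  relocate(cyl, disp, hp)

names(reordered)[1:4]
```

The result should match the select(..., everything()) call above; which one reads better is mostly a matter of taste.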
Renaming columns with rename()
mtcars <- rename(mtcars, spam_mpg = mpg)
mtcars <- rename(mtcars, spam_disp = disp)
mtcars <- rename(mtcars, spam_hp = hp)
head(mtcars)
## spam_mpg cyl spam_disp spam_hp drat wt qsec vs am
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0
## gear carb
## Mazda RX4 4 4
## Mazda RX4 Wag 4 4
## Datsun 710 4 1
## Hornet 4 Drive 3 1
## Hornet Sportabout 3 2
## Valiant 3 1
Selecting columns with a regexp
It is easy to select the columns whose names contain “spam” (here they all start with it) with some helper functions:
mtcars %>%
  select(contains("spam")) %>%
  head()
## spam_mpg spam_disp spam_hp
## Mazda RX4 21.0 160 110
## Mazda RX4 Wag 21.0 160 110
## Datsun 710 22.8 108 93
## Hornet 4 Drive 21.4 258 110
## Hornet Sportabout 18.7 360 175
## Valiant 18.1 225 105
Also take a look at starts_with(), ends_with(), contains(), matches(), num_range(), one_of() and everything().
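Of these helpers, matches() is the one that actually takes a regular expression, while contains() and starts_with() match literal strings. Here is a small sketch of the difference. It assumes a fresh copy of the built-in mtcars with the same three columns renamed as above; cars_spam, by_prefix and by_regex are illustrative names.

```r
library(dplyr)

# Illustrative copy of the built-in data with the same "spam_" prefixes;
# rename() accepts several new = old pairs in a single call
cars_spam <- datasets::mtcars %>%
  rename(spam_mpg = mpg, spam_disp = disp, spam_hp = hp)

# starts_with() matches a literal prefix
by_prefix <- cars_spam %>% select(starts_with("spam"))

# matches() takes a regular expression:
# names that begin with "spam_" and end in "p"
by_regex <- cars_spam %>% select(matches("^spam_.*p$"))

names(by_prefix)
names(by_regex)
```

The regular expression keeps spam_disp and spam_hp but drops spam_mpg, which a literal prefix match cannot do.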
Create new columns with mutate() and if_else()
mtcars %>%
  mutate(vs_new = if_else(
    vs == 1,
    "one",
    "zero",
    NA_character_)) %>%
  head()
## spam_mpg cyl spam_disp spam_hp drat wt qsec vs am gear carb vs_new
## 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 zero
## 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 zero
## 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 one
## 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 one
## 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 zero
## 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 one
You might want to create a new variable conditionally on several values of another column:
mtcars %>%
  mutate(carb_new = case_when(.$carb == 1 ~ "one",
                              .$carb == 2 ~ "two",
                              .$carb == 4 ~ "four",
                              TRUE ~ "other")) %>%
  head(15)
## spam_mpg cyl spam_disp spam_hp drat wt qsec vs am gear carb
## 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## 7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## 8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## 9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## 10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## 11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## 12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## 13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## 14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## 15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## carb_new
## 1 four
## 2 four
## 3 one
## 4 one
## 5 two
## 6 one
## 7 four
## 8 two
## 9 two
## 10 four
## 11 four
## 12 other
## 13 other
## 14 other
## 15 four
Mind the .$ before the variable carb. There is a GitHub issue about this, and it is already fixed in the development version of dplyr, which means that in the next version of dplyr, case_when() will work like any other specialized dplyr function inside mutate().
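With a recent dplyr release (to the best of my knowledge, 0.7.0 or later), the same recoding can be written with bare column names. The sketch below uses the built-in mtcars so that it runs on its own; carb_labelled is an illustrative name.

```r
library(dplyr)

# Same recoding as above, without the .$ prefix
carb_labelled <- datasets::mtcars %>%
  mutate(carb_new = case_when(
    carb == 1 ~ "one",
    carb == 2 ~ "two",
    carb == 4 ~ "four",
    TRUE      ~ "other"   # fallback for every remaining value of carb
  ))

head(carb_labelled$carb_new)
```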
Apply a function to certain columns only, by rows
mtcars %>%
  select(am, gear, carb) %>%
  purrr::by_row(sum, .collate = "cols", .to = "sum_am_gear_carb") -> mtcars2
head(mtcars2)
## # A tibble: 6 × 4
## am gear carb sum_am_gear_carb
## <dbl> <dbl> <dbl> <dbl>
## 1 1 4 4 9
## 2 1 4 4 9
## 3 1 4 1 6
## 4 0 3 1 4
## 5 0 3 2 5
## 6 0 3 1 4
For this, I had to use purrr’s by_row() function (which has since been moved to the purrrlyr package). You can then add this column to your original data frame:
mtcars <- cbind(mtcars, "sum_am_gear_carb" = mtcars2$sum_am_gear_carb)
head(mtcars)
## spam_mpg cyl spam_disp spam_hp drat wt qsec vs am
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0
## gear carb sum_am_gear_carb
## Mazda RX4 4 4 9
## Mazda RX4 Wag 4 4 9
## Datsun 710 4 1 6
## Hornet 4 Drive 3 1 4
## Hornet Sportabout 3 2 5
## Valiant 3 1 4
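For a simple row-wise sum like this one, a possible alternative is to compute the column directly inside mutate(), which skips the intermediate data frame and the cbind() step. This is only a sketch, assuming dplyr 1.0.0 or later for across(). It uses the built-in mtcars so that it runs on its own, and cars_sum is an illustrative name.

```r
library(dplyr)

# across() collects the three columns into a data frame,
# and rowSums() adds them up row by row
cars_sum <- datasets::mtcars %>%
  mutate(sum_am_gear_carb = rowSums(across(c(am, gear, carb))))

head(cars_sum$sum_am_gear_carb)
```

rowSums() only covers sums. For an arbitrary function applied per row, by_row() or a rowwise() pipeline is still the more general tool.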
Use do() to do any arbitrary operation
mtcars %>%
  group_by(cyl) %>%
  do(models = lm(spam_mpg ~ drat + wt, data = .)) %>%
  broom::tidy(models)
## Source: local data frame [9 x 6]
## Groups: cyl [3]
##
## cyl term estimate std.error statistic p.value
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 4 (Intercept) 33.2493403 17.0987286 1.9445504 0.087727622
## 2 4 drat 1.3244329 3.4519717 0.3836743 0.711215433
## 3 4 wt -5.2400608 2.2150213 -2.3656932 0.045551615
## 4 6 (Intercept) 30.6544931 7.5141648 4.0795609 0.015103868
## 5 6 drat -0.4435744 1.1740862 -0.3778039 0.724768945
## 6 6 wt -2.9902720 1.5685053 -1.9064468 0.129274249
## 7 8 (Intercept) 29.6519180 7.0878976 4.1834574 0.001527613
## 8 8 drat -1.4698722 1.6285054 -0.9025897 0.386081744
## 9 8 wt -2.4518017 0.7985112 -3.0704664 0.010651044
do() is useful when you want to use any R function (user-defined functions work too!) with dplyr functions. First I grouped the observations by cyl and then ran a linear model for each group. Then I converted the output to a tidy data frame using broom::tidy().
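In newer dplyr releases do() is marked as superseded, and a similar per-group model can be sketched with group_modify(). The block below assumes a dplyr version that provides group_modify() and an installed broom. It uses the built-in mtcars with its original column name mpg, and models_by_cyl is an illustrative name.

```r
library(dplyr)

# Fit one linear model per value of cyl; group_modify() passes each
# group's rows as .x and row-binds the tidy coefficient tables
models_by_cyl <- datasets::mtcars %>%
  group_by(cyl) %>%
  group_modify(~ broom::tidy(lm(mpg ~ drat + wt, data = .x)))

models_by_cyl
```

The result has one row per group and model term, with the same columns as the output above.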
Using dplyr functions inside your own functions
extract_vars <- function(data, some_string){
  data %>%
    select_(lazyeval::interp(~contains(some_string))) -> data
  return(data)
}
extract_vars(mtcars, "spam")
## spam_mpg spam_disp spam_hp
## Mazda RX4 21.0 160.0 110
## Mazda RX4 Wag 21.0 160.0 110
## Datsun 710 22.8 108.0 93
## Hornet 4 Drive 21.4 258.0 110
## Hornet Sportabout 18.7 360.0 175
## Valiant 18.1 225.0 105
## Duster 360 14.3 360.0 245
## Merc 240D 24.4 146.7 62
## Merc 230 22.8 140.8 95
## Merc 280 19.2 167.6 123
## Merc 280C 17.8 167.6 123
## Merc 450SE 16.4 275.8 180
## Merc 450SL 17.3 275.8 180
## Merc 450SLC 15.2 275.8 180
## Cadillac Fleetwood 10.4 472.0 205
## Lincoln Continental 10.4 460.0 215
## Chrysler Imperial 14.7 440.0 230
## Fiat 128 32.4 78.7 66
## Honda Civic 30.4 75.7 52
## Toyota Corolla 33.9 71.1 65
## Toyota Corona 21.5 120.1 97
## Dodge Challenger 15.5 318.0 150
## AMC Javelin 15.2 304.0 150
## Camaro Z28 13.3 350.0 245
## Pontiac Firebird 19.2 400.0 175
## Fiat X1-9 27.3 79.0 66
## Porsche 914-2 26.0 120.3 91
## Lotus Europa 30.4 95.1 113
## Ford Pantera L 15.8 351.0 264
## Ferrari Dino 19.7 145.0 175
## Maserati Bora 15.0 301.0 335
## Volvo 142E 21.4 121.0 109
About this last point, you can read more about it here.
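The underscore verbs such as select_() have since been deprecated in dplyr, so here is a hedged sketch of how the same idea might look with current tools. It assumes a recent dplyr (with rlang 0.4.0 or later for the {{ }} operator). extract_vars2() and mean_by() are hypothetical helper names, and cars_spam is an illustrative copy of the built-in mtcars.

```r
library(dplyr)

# A string argument can be handed straight to a selection helper
extract_vars2 <- function(data, some_string) {
  select(data, contains(some_string))
}

# A bare column name passed by the caller is forwarded with {{ }}
mean_by <- function(data, group_col, value_col) {
  data %>%
    group_by({{ group_col }}) %>%
    summarise(mean_value = mean({{ value_col }}))
}

cars_spam <- datasets::mtcars %>%
  rename(spam_mpg = mpg, spam_disp = disp, spam_hp = hp)

names(extract_vars2(cars_spam, "spam"))
mean_by(cars_spam, cyl, spam_mpg)
```

The first helper needs no special quoting because contains() already expects a character string. The second shows the pattern for functions that take column names unquoted, the way dplyr's own verbs do.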
Hope you liked this small list of tricks!
Reposted from: http://www.brodrigues.co/blog/2017-02-17-lesser_known_tricks/