Easy machine learning pipelines with pipelearner: intro and call for contributors
@drsimonj here to introduce pipelearner – a package I’m developing to make it easy to create machine learning pipelines in R – and to spread the word in the hope that some readers may be interested in contributing or testing it.
This post demonstrates some of what pipelearner can currently do. For example, the figure below plots the results of a model fitted to 10% to 100% (in 10% increments) of the training data in 50 cross-validation pairs. Fitting all of these models takes about four lines of code in pipelearner.

Head to the pipelearner GitHub page to learn more, and contact me if you have a chance to test it yourself or are interested in contributing (my contact details are at the end of this post).
Examples
Some setup
library(pipelearner)
library(tidyverse)
library(nycflights13)

# Help functions
r_square <- function(model, data) {
  actual    <- eval(formula(model)[[2]], as.data.frame(data))
  residuals <- predict(model, data) - actual
  1 - (var(residuals, na.rm = TRUE) / var(actual, na.rm = TRUE))
}

add_rsquare <- function(result_tbl) {
  result_tbl %>%
    mutate(rsquare_train = map2_dbl(fit, train, r_square),
           rsquare_test  = map2_dbl(fit, test,  r_square))
}

# Data set
d <- weather %>%
  select(visib, humid, precip, wind_dir) %>%
  drop_na() %>%
  sample_n(2000)

# Set theme for plots
theme_set(theme_minimal())
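
As a quick sanity check on the r_square() helper from the setup code, the sketch below repeats its definition and compares it with the R squared that summary() reports for a plain lm fit. The built-in mtcars data and the formula mpg ~ wt + hp are assumptions chosen only for illustration; they are not part of the original examples, and the check needs base R only.

```r
# Standalone copy of the r_square() helper from the setup code
r_square <- function(model, data) {
  actual    <- eval(formula(model)[[2]], as.data.frame(data))
  residuals <- predict(model, data) - actual
  1 - (var(residuals, na.rm = TRUE) / var(actual, na.rm = TRUE))
}

# Illustrative model on a built-in data set (an assumption, not from the post)
fit <- lm(mpg ~ wt + hp, data = mtcars)

helper_r2  <- r_square(fit, mtcars)      # variance-ratio R squared on the training data
summary_r2 <- summary(fit)$r.squared     # R squared reported by lm's summary

# For a linear model with an intercept, evaluated on its own training data,
# the two values are expected to agree up to floating-point tolerance
all.equal(helper_r2, summary_r2)
```

On held-out data the helper can return values that differ from this, including negative ones, because the residuals no longer have to average to zero.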
k-fold cross validation
results <- d %>%
  pipelearner(lm, visib ~ .) %>%
  learn_cvpairs(k = 10) %>%
  learn()

results %>%
  add_rsquare() %>%
  select(cv_pairs.id, contains("rsquare")) %>%
  gather(source, rsquare, contains("rsquare")) %>%
  mutate(source = gsub("rsquare_", "", source)) %>%
  ggplot(aes(cv_pairs.id, rsquare, color = source)) +
    geom_point() +
    labs(x = "Fold",
         y = "R Squared")
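
For readers who want to see what a k-fold split involves, here is a minimal base-R sketch of the idea. It is not pipelearner's internal implementation; the mtcars data, the formula, and k = 5 are assumptions made to keep the example self-contained.

```r
set.seed(1)

k   <- 5        # number of folds (illustrative choice)
dat <- mtcars   # built-in data set, used here only for illustration

# Assign every row to exactly one of k folds, in random order
fold_id <- sample(rep(seq_len(k), length.out = nrow(dat)))

# For each fold: train on the other k - 1 folds, test on the held-out fold
test_r2 <- vapply(seq_len(k), function(i) {
  train <- dat[fold_id != i, ]
  test  <- dat[fold_id == i, ]
  fit   <- lm(mpg ~ wt + hp, data = train)
  res   <- predict(fit, test) - test$mpg
  1 - var(res) / var(test$mpg)   # same variance-ratio R squared as r_square()
}, numeric(1))

# One test R squared per fold
data.frame(fold = seq_len(k), rsquare_test = test_r2)
```

Each row is used for testing exactly once, which is what makes the per-fold scores comparable.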

Learning curves
results <- d %>%
  pipelearner(lm, visib ~ .) %>%
  learn_curves(seq(.1, 1, .1)) %>%
  learn()

results %>%
  add_rsquare() %>%
  select(train_p, contains("rsquare")) %>%
  gather(source, rsquare, contains("rsquare")) %>%
  mutate(source = gsub("rsquare_", "", source)) %>%
  ggplot(aes(train_p, rsquare, color = source)) +
    geom_line() +
    geom_point(size = 2) +
    labs(x = "Proportion of training data used",
         y = "R Squared")
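
The idea behind a learning curve can also be sketched in base R: hold out a test set once, then refit the model on growing proportions of the remaining training rows. This is only an illustration under stated assumptions (the built-in faithful data, a 20% holdout, and taking the first rows of the training set); it is not a description of how pipelearner selects rows internally.

```r
set.seed(1)

dat <- faithful   # built-in data set, used here only for illustration

# Hold out an assumed 20% of rows as a fixed test set
test_rows <- sample(nrow(dat), size = round(0.2 * nrow(dat)))
train <- dat[-test_rows, ]
test  <- dat[test_rows, ]

props <- seq(.1, 1, .1)   # same proportions as in the example above

# Refit on the first p * 100% of the training rows and score on the test set
rsquare_test <- vapply(props, function(p) {
  n_use <- ceiling(p * nrow(train))
  fit   <- lm(eruptions ~ waiting, data = train[seq_len(n_use), ])
  res   <- predict(fit, test) - test$eruptions
  1 - var(res) / var(test$eruptions)
}, numeric(1))

curve <- data.frame(train_p = props, rsquare_test = rsquare_test)
curve
```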

Grid Search
results <- d %>%
  pipelearner(rpart::rpart, visib ~ .,
              minsplit = c(2, 50, 100),
              cp = c(.005, .01, .1)) %>%
  learn()

results %>%
  mutate(minsplit = map_dbl(params, ~ .$minsplit),
         cp       = map_dbl(params, ~ .$cp)) %>%
  add_rsquare() %>%
  select(minsplit, cp, contains("rsquare")) %>%
  gather(source, rsquare, contains("rsquare")) %>%
  mutate(source   = gsub("rsquare_", "", source),
         minsplit = paste("minsplit", minsplit, sep = "\n"),
         cp       = paste("cp", cp, sep = "\n")) %>%
  ggplot(aes(source, rsquare, fill = source)) +
    geom_col() +
    facet_grid(minsplit ~ cp) +
    guides(fill = "none") +
    labs(x = NULL, y = "R Squared")
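
The grid in this example is every combination of the two hyperparameter vectors, which is why the plot has a 3 x 3 layout. A small base-R sketch makes the combinations explicit; expand.grid() is used here purely to illustrate the idea and is not claimed to be what pipelearner calls internally.

```r
# All combinations of the two hyperparameter vectors from the example above
grid <- expand.grid(minsplit = c(2, 50, 100),
                    cp       = c(.005, .01, .1))

nrow(grid)   # 3 values x 3 values = 9 combinations
grid
```

Adding a third value to either vector adds three more models, so the number of fits grows multiplicatively with each hyperparameter.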

Model comparisons
results <- d %>%
  pipelearner() %>%
  learn_models(
    c(lm, rpart::rpart, randomForest::randomForest),
    visib ~ .) %>%
  learn()

results %>%
  add_rsquare() %>%
  select(model, contains("rsquare")) %>%
  gather(source, rsquare, contains("rsquare")) %>%
  mutate(source = gsub("rsquare_", "", source)) %>%
  ggplot(aes(model, rsquare, fill = source)) +
    geom_col(position = "dodge", size = .5) +
    labs(x = NULL, y = "R Squared") +
    coord_flip()

Sign off
Thanks for reading and I hope this was useful for you.
For updates on recent blog posts, follow @drsimonj on Twitter, or email me at drsimonjackson@gmail.com to get in touch.
If you’d like the code that produced this blog, check out the blogR GitHub repository.
Reposted from: https://drsimonj.svbtle.com/easy-machine-learning-pipelines-with-pipelearner-intro-and-call-for-contributors