Easy machine learning pipelines with pipelearner: intro and call for contributors
@drsimonj here to introduce pipelearner – a package I’m developing to make it easy to create machine learning pipelines in R – and to spread the word in the hope that some readers may be interested in contributing or testing it.
This post will demonstrate some examples of what pipeleaner can currently do. For example, the Figure below plots the results of a model fitted to 10% to 100% (in 10% increments) of training data in 50 cross-validation pairs. Fitting all of these models takes about four lines of code in pipelearner.

Head to the pipelearner Github page to learn more and contact me if you have a chance to test it yourself or are interested in contributing (my contact details are at the end of this post).
Examples
Some setup
library(pipelearner)
library(tidyverse)
library(nycflights13)
# Help functions
r_square <- function(model, data) {
actual <- eval(formula(model)[[2]], as.data.frame(data))
residuals <- predict(model, data) - actual
1 - (var(residuals, na.rm = TRUE) / var(actual, na.rm = TRUE))
}
add_rsquare <- function(result_tbl) {
result_tbl %>%
mutate(rsquare_train = map2_dbl(fit, train, r_square),
rsquare_test = map2_dbl(fit, test, r_square))
}
# Data set
d <- weather %>%
select(visib, humid, precip, wind_dir) %>%
drop_na() %>%
sample_n(2000)
# Set theme for plots
theme_set(theme_minimal())
k-fold cross validation
results <- d %>%
pipelearner(lm, visib ~ .) %>%
learn_cvpairs(k = 10) %>%
learn()
results %>%
add_rsquare() %>%
select(cv_pairs.id, contains("rsquare")) %>%
gather(source, rsquare, contains("rsquare")) %>%
mutate(source = gsub("rsquare_", "", source)) %>%
ggplot(aes(cv_pairs.id, rsquare, color = source)) +
geom_point() +
labs(x = "Fold",
y = "R Squared")

Learning curves
results <- d %>%
pipelearner(lm, visib ~ .) %>%
learn_curves(seq(.1, 1, .1)) %>%
learn()
results %>%
add_rsquare() %>%
select(train_p, contains("rsquare")) %>%
gather(source, rsquare, contains("rsquare")) %>%
mutate(source = gsub("rsquare_", "", source)) %>%
ggplot(aes(train_p, rsquare, color = source)) +
geom_line() +
geom_point(size = 2) +
labs(x = "Proportion of training data used",
y = "R Squared")

Grid Search
results <- d %>%
pipelearner(rpart::rpart, visib ~ .,
minsplit = c(2, 50, 100),
cp = c(.005, .01, .1)) %>%
learn()
results %>%
mutate(minsplit = map_dbl(params, ~ .$minsplit),
cp = map_dbl(params, ~ .$cp)) %>%
add_rsquare() %>%
select(minsplit, cp, contains("rsquare")) %>%
gather(source, rsquare, contains("rsquare")) %>%
mutate(source = gsub("rsquare_", "", source),
minsplit = paste("minsplit", minsplit, sep = "\n"),
cp = paste("cp", cp, sep = "\n")) %>%
ggplot(aes(source, rsquare, fill = source)) +
geom_col() +
facet_grid(minsplit ~ cp) +
guides(fill = "none") +
labs(x = NULL, y = "R Squared")

Model comparisons
results <- d %>%
pipelearner() %>%
learn_models(
c(lm, rpart::rpart, randomForest::randomForest),
visib ~ .) %>%
learn()
results %>%
add_rsquare() %>%
select(model, contains("rsquare")) %>%
gather(source, rsquare, contains("rsquare")) %>%
mutate(source = gsub("rsquare_", "", source)) %>%
ggplot(aes(model, rsquare, fill = source)) +
geom_col(position = "dodge", size = .5) +
labs(x = NULL, y = "R Squared") +
coord_flip()

Sign off
Thanks for reading and I hope this was useful for you.
For updates of recent blog posts, follow @drsimonj on Twitter, or email me atdrsimonjackson@gmail.com to get in touch.
If you’d like the code that produced this blog, check out the blogR GitHub repository.
转自:https://drsimonj.svbtle.com/easy-machine-learning-pipelines-with-pipelearner-intro-and-call-for-contributors
Easy machine learning pipelines with pipelearner: intro and call for contributors的更多相关文章
- 【机器学习Machine Learning】资料大全
昨天总结了深度学习的资料,今天把机器学习的资料也总结一下(友情提示:有些网站需要"科学上网"^_^) 推荐几本好书: 1.Pattern Recognition and Machi ...
- 机器学习(Machine Learning)&深度学习(Deep Learning)资料【转】
转自:机器学习(Machine Learning)&深度学习(Deep Learning)资料 <Brief History of Machine Learning> 介绍:这是一 ...
- 机器学习(Machine Learning)与深度学习(Deep Learning)资料汇总
<Brief History of Machine Learning> 介绍:这是一篇介绍机器学习历史的文章,介绍很全面,从感知机.神经网络.决策树.SVM.Adaboost到随机森林.D ...
- How do I learn machine learning?
https://www.quora.com/How-do-I-learn-machine-learning-1?redirected_qid=6578644 How Can I Learn X? ...
- 机器学习(Machine Learning)&深度学习(Deep Learning)资料(Chapter 2)
##机器学习(Machine Learning)&深度学习(Deep Learning)资料(Chapter 2)---#####注:机器学习资料[篇目一](https://github.co ...
- 机器学习(Machine Learning)&深度学习(Deep Learning)资料(下)
转载:http://www.jianshu.com/p/b73b6953e849 该资源的github地址:Qix <Statistical foundations of machine lea ...
- Intro to Machine Learning
本节主要用于机器学习入门,介绍两个简单的分类模型: 决策树和随机森林 不涉及内部原理,仅仅介绍基础的调用方法 1. How Models Work 以简单的决策树为例 This step of cap ...
- Advice for applying Machine Learning
https://jmetzen.github.io/2015-01-29/ml_advice.html Advice for applying Machine Learning This post i ...
- 壁虎书2 End-to-End Machine Learning Project
the main steps: 1. look at the big picture 2. get the data 3. discover and visualize the data to gai ...
随机推荐
- phpcms 笔记
首先是要把首页分为三个部分 : 导航部分 .尾部和首页中间部分 用了三个不同的文件 header.html ; index.html; footer.html 在使用phpcms之前 首先 ...
- 从零开始用 Flask 搭建一个网站(三)
从零开始用 Flask 搭建一个网站(二) 介绍了有关于数据库的运用,接下来我们在完善一下数据在前端以及前端到后端之间的交互.本节涉及到前端,因此也会讲解一下 jinja2 模板.jQuery.aja ...
- android中全局异常捕捉
android中全局异常捕捉 只要写代码就会有bug,但是我们要想办法收集到客户的bug.有第三方bugly或者友盟等可以收集.但是,android原生就提供了有关收集异常的api,所以我们来学习一下 ...
- 日历上添加活动通知(Asp.net)
<div id="calendar_contain"> </div> <script language="javascript" ...
- 关于iOS开发首次进入需要获取地理位置
今天给大家简单介绍一下iOS开发过程中会遇到的获取地理位置的问题,(话不多说进入正题)这里给大家讲解一下两种在APPdelegate获取地理位置的方法: 一:首先是用系统的方法获取地理位置: 1. 首 ...
- Eclipse实现图形化界面插件-vs4e
vs4e插件下载地址:http://visualswing4eclipse.googlecode.com/files/vs4e_0.9.12.I20090527-2200.zip 下载完成后,解压,然 ...
- 业务订单号生成算法,每秒50W左右,不同机器保证不重复,包含日期可读性好
参考snowflace算法,基本思路: 序列12位(更格式化的输出后,性能损耗导致每毫秒生成不了这么多,所以可以考虑减少这里的位,不过留着也并无影响) 机器位10位 毫秒为左移 22位 上述几个做或运 ...
- std::thread使用
本文将从以下三个部分介绍C++11标准中的thread类,本文主要内容为: 启动新线程 等待线程与分离线程 线程唯一标识符 1.启动线程 线程再std::threada对象创建时启动.最简单的情况下, ...
- Ubuntu搭建mysql,Navicat Premium连接
保存编辑结果与退出vim编辑器 https://jingyan.baidu.com/article/495ba8410ff14d38b30ede01.html 首先,我们需要使用apt安装mysql, ...
- 浅析c++/java/c#三大热门编程语言的运行效率
从安全角度考虑,C#是这几中语言中最为安全的,它其中定义的相关安全机制很好的确保了系统的安全... 今天和同学们一起探讨下c++/java/c# 三大热门语言的运行效率情况,以及各自的用途. 估计有很 ...