@drsimonj here to introduce pipelearner – a package I’m developing to make it easy to create machine learning pipelines in R – and to spread the word in the hope that some readers may be interested in contributing or testing it.

This post will demonstrate some examples of what pipelearner can currently do. For example, the figure below plots the results of a model fitted to 10% to 100% (in 10% increments) of training data in 50 cross-validation pairs. Fitting all of these models takes about four lines of code in pipelearner.

Head to the pipelearner GitHub page to learn more and contact me if you have a chance to test it yourself or are interested in contributing (my contact details are at the end of this post).

Examples

Some setup

library(pipelearner)
library(tidyverse)
library(nycflights13)

# Help functions
r_square <- function(model, data) {
  actual    <- eval(formula(model)[[2]], as.data.frame(data))
  residuals <- predict(model, data) - actual
  1 - (var(residuals, na.rm = TRUE) / var(actual, na.rm = TRUE))
}

add_rsquare <- function(result_tbl) {
  result_tbl %>%
    mutate(rsquare_train = map2_dbl(fit, train, r_square),
           rsquare_test  = map2_dbl(fit, test,  r_square))
}

# Data set
d <- weather %>%
  select(visib, humid, precip, wind_dir) %>%
  drop_na() %>%
  sample_n(2000)

# Set theme for plots
theme_set(theme_minimal())
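To see what the r_square() helper computes, here is a minimal sketch that needs only base R. It repeats the helper from the setup above and applies it to the built-in mtcars data, which is an assumption made purely for illustration and is not part of the original examples. For a linear model with an intercept, evaluated on its own training data, the helper should agree with the R squared reported by summary().

```r
# Same helper as in the setup above, repeated so this sketch runs on its own
r_square <- function(model, data) {
  actual    <- eval(formula(model)[[2]], as.data.frame(data))
  residuals <- predict(model, data) - actual
  1 - (var(residuals, na.rm = TRUE) / var(actual, na.rm = TRUE))
}

# Illustrative model on a built-in data set (not the weather data used in this post)
fit <- lm(mpg ~ wt, data = mtcars)

# In-sample R squared from the helper and from summary()
helper_value  <- r_square(fit, mtcars)
summary_value <- summary(fit)$r.squared

# TRUE when the two values agree up to numerical tolerance
isTRUE(all.equal(helper_value, summary_value))
```

On held-out data the two need not match, which is why the helper is applied separately to the train and test sets in add_rsquare().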

k-fold cross validation

results <- d %>%
  pipelearner(lm, visib ~ .) %>%
  learn_cvpairs(k = 10) %>%
  learn()

results %>%
  add_rsquare() %>%
  select(cv_pairs.id, contains("rsquare")) %>%
  gather(source, rsquare, contains("rsquare")) %>%
  mutate(source = gsub("rsquare_", "", source)) %>%
  ggplot(aes(cv_pairs.id, rsquare, color = source)) +
    geom_point() +
    labs(x = "Fold",
         y = "R Squared")

Learning curves

results <- d %>%
  pipelearner(lm, visib ~ .) %>%
  learn_curves(seq(.1, 1, .1)) %>%
  learn()

results %>%
  add_rsquare() %>%
  select(train_p, contains("rsquare")) %>%
  gather(source, rsquare, contains("rsquare")) %>%
  mutate(source = gsub("rsquare_", "", source)) %>%
  ggplot(aes(train_p, rsquare, color = source)) +
    geom_line() +
    geom_point(size = 2) +
    labs(x = "Proportion of training data used",
         y = "R Squared")
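For readers who want to see the idea behind a learning curve without the package, the following is a rough base R sketch. It is not pipelearner's implementation: the mtcars data, the 80/20 split, the choice of taking the first rows of the training set for each proportion, and all variable names are assumptions made for illustration.

```r
set.seed(1)

# Hold out 20% of the rows as a test set (illustrative split)
n        <- nrow(mtcars)
test_idx <- sample(n, size = round(0.2 * n))
train    <- mtcars[-test_idx, ]
test     <- mtcars[test_idx, ]

# Proportions of the training data to fit on
props <- seq(0.5, 1, 0.1)

# For each proportion, fit on the first rows of the training set
# and score the model on the same held-out test set
rsq_test <- sapply(props, function(p) {
  sub  <- train[seq_len(ceiling(p * nrow(train))), ]
  fit  <- lm(mpg ~ wt, data = sub)
  pred <- predict(fit, test)
  1 - var(pred - test$mpg) / var(test$mpg)
})

learning_curve <- data.frame(train_p = props, rsquare_test = rsq_test)
learning_curve
```

Plotting rsquare_test against train_p gives a single-split version of the curve above. Repeating the split many times and averaging would approximate the multi-pair figure described at the start of this post.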

Grid Search

results <- d %>%
  pipelearner(rpart::rpart, visib ~ .,
              minsplit = c(2, 50, 100),
              cp = c(.005, .01, .1)) %>%
  learn()

results %>%
  mutate(minsplit = map_dbl(params, ~ .$minsplit),
         cp = map_dbl(params, ~ .$cp)) %>%
  add_rsquare() %>%
  select(minsplit, cp, contains("rsquare")) %>%
  gather(source, rsquare, contains("rsquare")) %>%
  mutate(source = gsub("rsquare_", "", source),
         minsplit = paste("minsplit", minsplit, sep = "\n"),
         cp = paste("cp", cp, sep = "\n")) %>%
  ggplot(aes(source, rsquare, fill = source)) +
    geom_col() +
    facet_grid(minsplit ~ cp) +
    guides(fill = "none") +
    labs(x = NULL, y = "R Squared")

Model comparisons

results <- d %>%
  pipelearner() %>%
  learn_models(
    c(lm, rpart::rpart, randomForest::randomForest),
    visib ~ .) %>%
  learn()

results %>%
  add_rsquare() %>%
  select(model, contains("rsquare")) %>%
  gather(source, rsquare, contains("rsquare")) %>%
  mutate(source = gsub("rsquare_", "", source)) %>%
  ggplot(aes(model, rsquare, fill = source)) +
    geom_col(position = "dodge", size = .5) +
    labs(x = NULL, y = "R Squared") +
    coord_flip()

Sign off

Thanks for reading and I hope this was useful for you.

For updates on recent blog posts, follow @drsimonj on Twitter, or email me at drsimonjackson@gmail.com to get in touch.

If you’d like the code that produced this blog, check out the blogR GitHub repository.

Reposted from: https://drsimonj.svbtle.com/easy-machine-learning-pipelines-with-pipelearner-intro-and-call-for-contributors
