@drsimonj here to introduce pipelearner – a package I’m developing to make it easy to create machine learning pipelines in R – and to spread the word in the hope that some readers may be interested in contributing or testing it.

This post will demonstrate some examples of what pipeleaner can currently do. For example, the Figure below plots the results of a model fitted to 10% to 100% (in 10% increments) of training data in 50 cross-validation pairs. Fitting all of these models takes about four lines of code in pipelearner.

Head to the pipelearner Github page to learn more and contact me if you have a chance to test it yourself or are interested in contributing (my contact details are at the end of this post).

Examples

Some setup

library(pipelearner)
library(tidyverse)
library(nycflights13) # Help functions
r_square <- function(model, data) {
actual <- eval(formula(model)[[2]], as.data.frame(data))
residuals <- predict(model, data) - actual
1 - (var(residuals, na.rm = TRUE) / var(actual, na.rm = TRUE))
}
add_rsquare <- function(result_tbl) {
result_tbl %>%
mutate(rsquare_train = map2_dbl(fit, train, r_square),
rsquare_test = map2_dbl(fit, test, r_square))
} # Data set
d <- weather %>%
select(visib, humid, precip, wind_dir) %>%
drop_na() %>%
sample_n(2000) # Set theme for plots
theme_set(theme_minimal())

k-fold cross validation

results <- d %>%
pipelearner(lm, visib ~ .) %>%
learn_cvpairs(k = 10) %>%
learn() results %>%
add_rsquare() %>%
select(cv_pairs.id, contains("rsquare")) %>%
gather(source, rsquare, contains("rsquare")) %>%
mutate(source = gsub("rsquare_", "", source)) %>%
ggplot(aes(cv_pairs.id, rsquare, color = source)) +
geom_point() +
labs(x = "Fold",
y = "R Squared")

Learning curves

results <- d %>%
pipelearner(lm, visib ~ .) %>%
learn_curves(seq(.1, 1, .1)) %>%
learn() results %>%
add_rsquare() %>%
select(train_p, contains("rsquare")) %>%
gather(source, rsquare, contains("rsquare")) %>%
mutate(source = gsub("rsquare_", "", source)) %>%
ggplot(aes(train_p, rsquare, color = source)) +
geom_line() +
geom_point(size = 2) +
labs(x = "Proportion of training data used",
y = "R Squared")

Grid Search

results <- d %>%
pipelearner(rpart::rpart, visib ~ .,
minsplit = c(2, 50, 100),
cp = c(.005, .01, .1)) %>%
learn() results %>%
mutate(minsplit = map_dbl(params, ~ .$minsplit),
cp = map_dbl(params, ~ .$cp)) %>%
add_rsquare() %>%
select(minsplit, cp, contains("rsquare")) %>%
gather(source, rsquare, contains("rsquare")) %>%
mutate(source = gsub("rsquare_", "", source),
minsplit = paste("minsplit", minsplit, sep = "\n"),
cp = paste("cp", cp, sep = "\n")) %>%
ggplot(aes(source, rsquare, fill = source)) +
geom_col() +
facet_grid(minsplit ~ cp) +
guides(fill = "none") +
labs(x = NULL, y = "R Squared")

Model comparisons

results <- d %>%
pipelearner() %>%
learn_models(
c(lm, rpart::rpart, randomForest::randomForest),
visib ~ .) %>%
learn() results %>%
add_rsquare() %>%
select(model, contains("rsquare")) %>%
gather(source, rsquare, contains("rsquare")) %>%
mutate(source = gsub("rsquare_", "", source)) %>%
ggplot(aes(model, rsquare, fill = source)) +
geom_col(position = "dodge", size = .5) +
labs(x = NULL, y = "R Squared") +
coord_flip()

Sign off

Thanks for reading and I hope this was useful for you.

For updates of recent blog posts, follow @drsimonj on Twitter, or email me atdrsimonjackson@gmail.com to get in touch.

If you’d like the code that produced this blog, check out the blogR GitHub repository.

转自:https://drsimonj.svbtle.com/easy-machine-learning-pipelines-with-pipelearner-intro-and-call-for-contributors

Easy machine learning pipelines with pipelearner: intro and call for contributors的更多相关文章

  1. 【机器学习Machine Learning】资料大全

    昨天总结了深度学习的资料,今天把机器学习的资料也总结一下(友情提示:有些网站需要"科学上网"^_^) 推荐几本好书: 1.Pattern Recognition and Machi ...

  2. 机器学习(Machine Learning)&深度学习(Deep Learning)资料【转】

    转自:机器学习(Machine Learning)&深度学习(Deep Learning)资料 <Brief History of Machine Learning> 介绍:这是一 ...

  3. 机器学习(Machine Learning)与深度学习(Deep Learning)资料汇总

    <Brief History of Machine Learning> 介绍:这是一篇介绍机器学习历史的文章,介绍很全面,从感知机.神经网络.决策树.SVM.Adaboost到随机森林.D ...

  4. How do I learn machine learning?

    https://www.quora.com/How-do-I-learn-machine-learning-1?redirected_qid=6578644   How Can I Learn X? ...

  5. 机器学习(Machine Learning)&深度学习(Deep Learning)资料(Chapter 2)

    ##机器学习(Machine Learning)&深度学习(Deep Learning)资料(Chapter 2)---#####注:机器学习资料[篇目一](https://github.co ...

  6. 机器学习(Machine Learning)&深度学习(Deep Learning)资料(下)

    转载:http://www.jianshu.com/p/b73b6953e849 该资源的github地址:Qix <Statistical foundations of machine lea ...

  7. Intro to Machine Learning

    本节主要用于机器学习入门,介绍两个简单的分类模型: 决策树和随机森林 不涉及内部原理,仅仅介绍基础的调用方法 1. How Models Work 以简单的决策树为例 This step of cap ...

  8. Advice for applying Machine Learning

    https://jmetzen.github.io/2015-01-29/ml_advice.html Advice for applying Machine Learning This post i ...

  9. 壁虎书2 End-to-End Machine Learning Project

    the main steps: 1. look at the big picture 2. get the data 3. discover and visualize the data to gai ...

随机推荐

  1. 1102: 零起点学算法09——继续练习简单的输入和计算(a-b)

    1102: 零起点学算法09--继续练习简单的输入和计算(a-b) Time Limit: 1 Sec  Memory Limit: 520 MB   64bit IO Format: %lldSub ...

  2. [讨论] Window XP 安装msxml6后,load xml时提示schema验证失败

    现象:在windows XP x64下,使用用户安装的msxml6库加载xml文件时失败. 进一步说明: 该xml文档使用了W3C的名称空间 xmlns:xsi= "http://www.w ...

  3. 20144306《网络对抗》CAL_MSF基础运用

    1  实验内容 一个主动攻击,如ms08_067 一个针对浏览器的攻击,如ms11_050 一个针对客户端的攻击,如Adobe 成功应用任何一个辅助模块 2  实验过程记录 2.1 主动攻击MS08- ...

  4. 4 安装MPush

    cnblogs-DOC 1.服务器环境 2.安装Redis3.安装Zookeeper4.安装MPush5.安装Alloc服务6.完整测试7.常见问题 一.Linux安装Mpush [root@loca ...

  5. CentOS下安装JDK的三种方法

    方法一:手动解压JDK的压缩包,然后设置环境变量 1.在/usr/目录下创建java目录 [root@localhost ~]# mkdir/usr/java[root@localhost ~]# c ...

  6. JS模式---发布、订阅模式

    发布订阅模式又叫观察者模式,它定义一种一对多的依赖关系, 当一个对象的状态发生改变时,所有依赖于它的对象都将得到通知. document.body.addEventListener('click', ...

  7. (函数封装)domReady

    一般的我们用window.onload()来判断文档是否加载完成,我们一般采用下面的做法: 当文档加载全部完后,我们在执行代码块(很显然,当需要加载的文档及节点庞大时,用户体验可能会变很差) wind ...

  8. CodeSmith生成实体的分页读取规则

    首先.我得向咱们博客园提个意见,能不能我写的东西就给预保存下呢?刚才我写半天,只因为这个不给力的IE浏览器死了,导致我白写了,如果这要是那个大神直接在这上面写的非常有技术含量的贴着会因此而丢失实在是有 ...

  9. CF798 C. Mike and gcd problem

    /* CF798 C. Mike and gcd problem http://codeforces.com/contest/798/problem/C 数论 贪心 题意:如果一个数列的gcd值大于1 ...

  10. 浅谈JavaScript时间与正则表达式

    时间函数:var box = new Date() 函数       Demo:         alert(Date.parse('4/12/2007'));    //返回的是一个毫秒数11763 ...