@drsimonj here to introduce pipelearner – a package I’m developing to make it easy to create machine learning pipelines in R – and to spread the word in the hope that some readers may be interested in contributing or testing it.

This post will demonstrate some examples of what pipelearner can currently do. For example, the Figure below plots the results of a model fitted to 10% to 100% (in 10% increments) of training data in 50 cross-validation pairs. Fitting all of these models takes about four lines of code in pipelearner.

Head to the pipelearner GitHub page to learn more, and contact me if you have a chance to test it yourself or are interested in contributing (my contact details are at the end of this post).

Examples

Some setup

library(pipelearner)
library(tidyverse)
library(nycflights13)

# Help functions
r_square <- function(model, data) {
  actual    <- eval(formula(model)[[2]], as.data.frame(data))
  residuals <- predict(model, data) - actual
  1 - (var(residuals, na.rm = TRUE) / var(actual, na.rm = TRUE))
}

add_rsquare <- function(result_tbl) {
  result_tbl %>%
    mutate(rsquare_train = map2_dbl(fit, train, r_square),
           rsquare_test  = map2_dbl(fit, test,  r_square))
}

# Data set
d <- weather %>%
  select(visib, humid, precip, wind_dir) %>%
  drop_na() %>%
  sample_n(2000)

# Set theme for plots
theme_set(theme_minimal())
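
As a quick sanity check on the R squared helper, the same calculation can be tried in base R on a built-in data set. This is only a sketch: the mtcars data and the formula mpg ~ wt + hp are assumptions chosen for illustration and are not part of the original examples.

```r
# Illustrative check of the 1 - var(residuals) / var(actual) calculation.
# Assumption: mtcars and mpg ~ wt + hp are stand-ins, not the post's data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Same steps as the r_square() helper, written out by hand
actual    <- mtcars$mpg
res       <- predict(fit, mtcars) - actual
r2_manual <- 1 - var(res) / var(actual)

# R squared as reported by lm's own summary
r2_lm <- summary(fit)$r.squared

# On the training data of a linear model with an intercept, the two agree
all.equal(r2_manual, r2_lm)
```

On held-out data the two notions need not coincide, so the test values produced by the helper are best read as a rough comparison with the training values.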

k-fold cross validation

results <- d %>%
  pipelearner(lm, visib ~ .) %>%
  learn_cvpairs(k = 10) %>%
  learn()

results %>%
  add_rsquare() %>%
  select(cv_pairs.id, contains("rsquare")) %>%
  gather(source, rsquare, contains("rsquare")) %>%
  mutate(source = gsub("rsquare_", "", source)) %>%
  ggplot(aes(cv_pairs.id, rsquare, color = source)) +
    geom_point() +
    labs(x = "Fold",
         y = "R Squared")
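
For readers new to the idea, here is a rough base R sketch of what 10-fold cross-validation does conceptually. It is not how pipelearner implements it. The mtcars data, the formula mpg ~ wt, the seed, and the use of RMSE instead of R squared (the folds here are tiny) are all assumptions made for illustration.

```r
# Manual k-fold cross-validation in base R (illustrative only).
set.seed(42)          # arbitrary seed so the fold assignment is reproducible
k <- 10
n <- nrow(mtcars)

# Assign every row to one of k folds, as evenly as possible, in random order
fold <- sample(rep(seq_len(k), length.out = n))

# For each fold: fit on the other k - 1 folds, evaluate on the held-out fold
rmse <- sapply(seq_len(k), function(i) {
  train <- mtcars[fold != i, ]
  test  <- mtcars[fold == i, ]
  fit   <- lm(mpg ~ wt, data = train)
  sqrt(mean((predict(fit, test) - test$mpg)^2))
})

rmse   # one test error per fold
```

Each row is used for testing exactly once and for training k - 1 times, which is why the plot above shows one train/test pair per fold.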

Learning curves

results <- d %>%
  pipelearner(lm, visib ~ .) %>%
  learn_curves(seq(.1, 1, .1)) %>%
  learn()

results %>%
  add_rsquare() %>%
  select(train_p, contains("rsquare")) %>%
  gather(source, rsquare, contains("rsquare")) %>%
  mutate(source = gsub("rsquare_", "", source)) %>%
  ggplot(aes(train_p, rsquare, color = source)) +
    geom_line() +
    geom_point(size = 2) +
    labs(x = "Proportion of training data used",
         y = "R Squared")

Grid Search

results <- d %>%
  pipelearner(rpart::rpart, visib ~ .,
              minsplit = c(2, 50, 100),
              cp = c(.005, .01, .1)) %>%
  learn()

results %>%
  mutate(minsplit = map_dbl(params, ~ .$minsplit),
         cp = map_dbl(params, ~ .$cp)) %>%
  add_rsquare() %>%
  select(minsplit, cp, contains("rsquare")) %>%
  gather(source, rsquare, contains("rsquare")) %>%
  mutate(source = gsub("rsquare_", "", source),
         minsplit = paste("minsplit", minsplit, sep = "\n"),
         cp = paste("cp", cp, sep = "\n")) %>%
  ggplot(aes(source, rsquare, fill = source)) +
    geom_col() +
    facet_grid(minsplit ~ cp) +
    guides(fill = "none") +
    labs(x = NULL, y = "R Squared")
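
To see how many models a grid like this implies, the combinations can be listed in base R. This sketch assumes every combination of the supplied values is fitted, which the 3 x 3 facet grid in the plot code suggests. It does not call pipelearner itself.

```r
# Every combination of the two rpart hyperparameters used above
grid <- expand.grid(minsplit = c(2, 50, 100),
                    cp       = c(.005, .01, .1))

nrow(grid)   # 9 combinations: 3 values of minsplit x 3 values of cp
grid
```

The number of fits grows multiplicatively with each extra hyperparameter, so it is worth keeping the candidate vectors short while experimenting.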

Model comparisons

results <- d %>%
  pipelearner() %>%
  learn_models(
    c(lm, rpart::rpart, randomForest::randomForest),
    visib ~ .) %>%
  learn()

results %>%
  add_rsquare() %>%
  select(model, contains("rsquare")) %>%
  gather(source, rsquare, contains("rsquare")) %>%
  mutate(source = gsub("rsquare_", "", source)) %>%
  ggplot(aes(model, rsquare, fill = source)) +
    geom_col(position = "dodge", size = .5) +
    labs(x = NULL, y = "R Squared") +
    coord_flip()

Sign off

Thanks for reading and I hope this was useful for you.

For updates on recent blog posts, follow @drsimonj on Twitter, or email me at drsimonjackson@gmail.com to get in touch.

If you’d like the code that produced this blog, check out the blogR GitHub repository.

Reposted from: https://drsimonj.svbtle.com/easy-machine-learning-pipelines-with-pipelearner-intro-and-call-for-contributors
