In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a series of steps in data preparation. Scikit-Learn provides the Pipeline class to help with such sequences of transformations.

The Pipeline constructor takes a list of name/estimator pairs defining a sequence of steps. All but the last estimator must be transformers (i.e., they must have a fit_transform() method). The names can be anything you like.

When you call the pipeline’s fit() method, it calls fit_transform() sequentially on all transformers, passing the output of each call as the parameter to the next call, until it reaches the final estimator, for which it just calls the fit() method.

The pipeline exposes the same methods as the final estimator. In this example, the last estimator is a StandardScaler, which is a transformer, so the pipeline has a transform() method that applies all the transforms to the data in sequence (it also has a fit_transform method that we could have used instead of calling fit() and then transform()).

 from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler num_pipeline = Pipeline([
('imputer', SimpleImputer(strategy="median")),
('attribs_adder', CombinedAttributesAdder()),
('std_scaler', StandardScaler()),
]) try:
from sklearn.compose import ColumnTransformer
except ImportError:
from future_encoders import ColumnTransformer # Scikit-Learn < 0.20 num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"] full_pipeline = ColumnTransformer([
("num", num_pipeline, num_attribs),
("cat", OneHotEncoder(), cat_attribs),
]) housing_prepared = full_pipeline.fit_transform(housing)

[Machine Learning with Python] Data Preparation through Transformation Pipeline的更多相关文章

  1. [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn

    In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...

  2. [Machine Learning with Python] Data Visualization by Matplotlib Library

    Before you can plot anything, you need to specify which backend Matplotlib should use. The simplest ...

  3. Python (1) - 7 Steps to Mastering Machine Learning With Python

    Step 1: Basic Python Skills install Anacondaincluding numpy, scikit-learn, and matplotlib Step 2: Fo ...

  4. Coursera, Big Data 4, Machine Learning With Big Data (week 1/2)

    Week 1 Machine Learning with Big Data KNime - GUI based Spark MLlib - inside Spark CRISP-DM Week 2, ...

  5. Getting started with machine learning in Python

    Getting started with machine learning in Python Machine learning is a field that uses algorithms to ...

  6. 《Learning scikit-learn Machine Learning in Python》chapter1

    前言 由于实验原因,准备入坑 python 机器学习,而 python 机器学习常用的包就是 scikit-learn ,准备先了解一下这个工具.在这里搜了有 scikit-learn 关键字的书,找 ...

  7. 【Machine Learning】Python开发工具:Anaconda+Sublime

    Python开发工具:Anaconda+Sublime 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现 ...

  8. In machine learning, is more data always better than better algorithms?

    In machine learning, is more data always better than better algorithms? No. There are times when mor ...

  9. Machine Learning的Python环境设置

    Machine Learning目前经常使用的语言有Python.R和MATLAB.如果采用Python,需要安装大量的数学相关和Machine Learning的包.一般安装Anaconda,可以把 ...

随机推荐

  1. 10,Scrapy简单入门及实例讲解

    Scrapy是一个为了爬取网站数据,提取结构性数据而编写的应用框架. 其可以应用在数据挖掘,信息处理或存储历史数据等一系列的程序中.其最初是为了页面抓取 (更确切来说, 网络抓取 )所设计的, 也可以 ...

  2. mysql 外连接的时候,条件在on后面和条件在where后面的区别

    最近使用mysql的时候碰到一个问题:当一个表外联另一个表的时候,将一些查询条件放在on后面和放在where后面不太一样: 学生分数表stuscore: 当查询语句如下(查询语句1): SELECT ...

  3. C#入门篇6-1:字符串操作 char常用的函数

    //char 字符的常见操作 public static void FChar() { //判定字符的类别 char ch1 = 'a';//使用小引号 bool bl = true; bl = ch ...

  4. 我给女朋友讲编程CSS系列(3) CSS如何设置字体的类型、大小、颜色,如何使用火狐浏览器的Firebug插件查看网页的字体

    一.CSS如何设置字体的类型.大小.颜色 设计网页时,一般设置body的字体,让其他标签继承body的字体,这样设置特别方便,但是标题标签h1到h6和表单标签(input类型)是没有继承body的字体 ...

  5. MFC深入浅出读书笔记第二部分2

    第七章  MFC骨干程序 所谓骨干程序就是指有AppWizard生成的MFC程序.如下图的层次关系是程序中常用的几个类,一定要熟记于心. 1 Document/View应用程序 CDocument存放 ...

  6. c++ primer plus 第6版 部分三 9章 - 章

    c++ primer plus 第6版                                               部分三 9章 - 章 第9章   内存模型和名称空间 1.单独编译 ...

  7. js日期处理

    Js获取当前日期时间及其它操作 var myDate = new Date(); myDate.getYear(); //获取当前年份(2位) myDate.getFullYear(); //获取完整 ...

  8. Block Nested-Loop 和 Batched Key Access

    官方文档:https://dev.mysql.com/doc/refman/5.7/en/bnl-bka-optimization.html BNL和BKA是MySQL 表关联的两种关联算法 比如t1 ...

  9. 项目记事【StreamAPI】:使用 StreamAPI 简化对 Collection 的操作

    最近项目里有这么一段代码,我在做 code-review 的时候,觉得可以使用 Java8 StreamAPI 简化一下. 这里先看一下代码(不是源码,一些敏感信息被我用其他类替代了): privat ...

  10. 将数据缓存到sessionStorage中

    //获取侧边栏 if (sessionStorage.getItem(`${env}${empId}leftMenu`)) { const leftMenu = JSON.parse(sessionS ...