In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a series of steps in data preparation. Scikit-Learn provides the Pipeline class to help with such sequences of transformations.

The Pipeline constructor takes a list of name/estimator pairs defining a sequence of steps. All but the last estimator must be transformers (i.e., they must have a fit_transform() method). The names can be anything you like.

When you call the pipeline’s fit() method, it calls fit_transform() sequentially on all transformers, passing the output of each call as the parameter to the next call, until it reaches the final estimator, for which it just calls the fit() method.

The pipeline exposes the same methods as the final estimator. In this example, the last estimator is a StandardScaler, which is a transformer, so the pipeline has a transform() method that applies all the transforms to the data in sequence (it also has a fit_transform method that we could have used instead of calling fit() and then transform()).

 from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler num_pipeline = Pipeline([
('imputer', SimpleImputer(strategy="median")),
('attribs_adder', CombinedAttributesAdder()),
('std_scaler', StandardScaler()),
]) try:
from sklearn.compose import ColumnTransformer
except ImportError:
from future_encoders import ColumnTransformer # Scikit-Learn < 0.20 num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"] full_pipeline = ColumnTransformer([
("num", num_pipeline, num_attribs),
("cat", OneHotEncoder(), cat_attribs),
]) housing_prepared = full_pipeline.fit_transform(housing)

[Machine Learning with Python] Data Preparation through Transformation Pipeline的更多相关文章

  1. [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn

    In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...

  2. [Machine Learning with Python] Data Visualization by Matplotlib Library

    Before you can plot anything, you need to specify which backend Matplotlib should use. The simplest ...

  3. Python (1) - 7 Steps to Mastering Machine Learning With Python

    Step 1: Basic Python Skills install Anacondaincluding numpy, scikit-learn, and matplotlib Step 2: Fo ...

  4. Coursera, Big Data 4, Machine Learning With Big Data (week 1/2)

    Week 1 Machine Learning with Big Data KNime - GUI based Spark MLlib - inside Spark CRISP-DM Week 2, ...

  5. Getting started with machine learning in Python

    Getting started with machine learning in Python Machine learning is a field that uses algorithms to ...

  6. 《Learning scikit-learn Machine Learning in Python》chapter1

    前言 由于实验原因,准备入坑 python 机器学习,而 python 机器学习常用的包就是 scikit-learn ,准备先了解一下这个工具.在这里搜了有 scikit-learn 关键字的书,找 ...

  7. 【Machine Learning】Python开发工具:Anaconda+Sublime

    Python开发工具:Anaconda+Sublime 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现 ...

  8. In machine learning, is more data always better than better algorithms?

    In machine learning, is more data always better than better algorithms? No. There are times when mor ...

  9. Machine Learning的Python环境设置

    Machine Learning目前经常使用的语言有Python.R和MATLAB.如果采用Python,需要安装大量的数学相关和Machine Learning的包.一般安装Anaconda,可以把 ...

随机推荐

  1. Keepalivaed +Nginx proxy 高可用架构方案与实施过程细节

    1.开源产品介绍 1)CMS介绍 官方网站http://www.dedecms.com/,是一个网站应用系统构建平台,也是一个强大的网站内容管理系统,既可以用来构建复杂的体系的企业信息门户或者电子商务 ...

  2. [项目2] 10mins

    1.准备工作 M层:生成虚假数据 from django.db import models from faker import Factory # Create your models here. c ...

  3. cogs:1619. [HEOI2012]采花/luogu P2056

    1619. [HEOI2012]采花 ★★☆   输入文件:1flower.in   输出文件:1flower.out   简单对比时间限制:5 s   内存限制:128 MB [题目描述] 萧薰儿是 ...

  4. python 学习分享-函数篇2

    递归 自己玩自己的函数: 1. 必须有一个明确的结束条件 2. 每次进入更深一层递归时,问题规模相比上次递归都应有所减少 3. 递归效率不高,递归层次过多会导致栈溢出 递归例子和二分查找都放在里面了 ...

  5. psql 工具详细使用介绍

    psql 介绍 psql 是 PostgreSQL 中的一个命令行交互式客户端工具, 它允许你交互地键入 SQL 命令,然后把它们发送给 PostgreSQL 服务器,再显示 SQL 或命令的结果. ...

  6. c#中dynamic ExpandoObject的用法

    using System; using System.Collections.Generic; using System.ComponentModel; using System.Data; usin ...

  7. JavaWeb笔记(一)JDBC

    基本步骤 导入MySQL驱动jar包 mysql-connector-java-8.0.15.zip 注册驱动 获取数据库连接对象Connection 定义sql 获取执行sql语句的对象Statem ...

  8. web自动化测试:watir+minitest(一)

    基本介绍: 本课程的测试环境和工具为:win7+ruby+watir+minitest Watir 全称是"Web Application Testing in Ruby".它是一 ...

  9. Codeforces 1038F Wrap Around (Codeforces Round #508 (Div. 2) F) 题解

    写在前面 此题数据量居然才出到\(n=40\)???(黑人问号)以下给出一个串长\(|S|=100,n=10^9\)的题解. 题目描述 给一个长度不超过\(m\)的01串S,问有多少个长度不超过\(n ...

  10. Windows地址空间

    虚拟地址空间 ​ 当处理器读取或写入存储器位置时,它使用虚拟地址.作为读或写操作的一部分,处理器将虚拟地址转换为物理地址.通过虚拟地址访问内存具有以下优势: 程序可以使用连续范围的虚拟地址来访问在物理 ...