[Machine Learning with Python] Data Preparation through Transformation Pipeline
In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a series of steps in data preparation. Scikit-Learn provides the Pipeline class to help with such sequences of transformations.
The Pipeline constructor takes a list of name/estimator pairs defining a sequence of steps. All but the last estimator must be transformers (i.e., they must have a fit_transform() method). The names can be anything you like.
When you call the pipeline’s fit() method, it calls fit_transform() sequentially on all transformers, passing the output of each call as the parameter to the next call, until it reaches the final estimator, for which it just calls the fit() method.
The pipeline exposes the same methods as the final estimator. In this example, the last estimator is a StandardScaler, which is a transformer, so the pipeline has a transform() method that applies all the transforms to the data in sequence (it also has a fit_transform method that we could have used instead of calling fit() and then transform()).
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler num_pipeline = Pipeline([
('imputer', SimpleImputer(strategy="median")),
('attribs_adder', CombinedAttributesAdder()),
('std_scaler', StandardScaler()),
]) try:
from sklearn.compose import ColumnTransformer
except ImportError:
from future_encoders import ColumnTransformer # Scikit-Learn < 0.20 num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"] full_pipeline = ColumnTransformer([
("num", num_pipeline, num_attribs),
("cat", OneHotEncoder(), cat_attribs),
]) housing_prepared = full_pipeline.fit_transform(housing)
[Machine Learning with Python] Data Preparation through Transformation Pipeline的更多相关文章
- [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn
In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...
- [Machine Learning with Python] Data Visualization by Matplotlib Library
Before you can plot anything, you need to specify which backend Matplotlib should use. The simplest ...
- Python (1) - 7 Steps to Mastering Machine Learning With Python
Step 1: Basic Python Skills install Anacondaincluding numpy, scikit-learn, and matplotlib Step 2: Fo ...
- Coursera, Big Data 4, Machine Learning With Big Data (week 1/2)
Week 1 Machine Learning with Big Data KNime - GUI based Spark MLlib - inside Spark CRISP-DM Week 2, ...
- Getting started with machine learning in Python
Getting started with machine learning in Python Machine learning is a field that uses algorithms to ...
- 《Learning scikit-learn Machine Learning in Python》chapter1
前言 由于实验原因,准备入坑 python 机器学习,而 python 机器学习常用的包就是 scikit-learn ,准备先了解一下这个工具.在这里搜了有 scikit-learn 关键字的书,找 ...
- 【Machine Learning】Python开发工具:Anaconda+Sublime
Python开发工具:Anaconda+Sublime 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现 ...
- In machine learning, is more data always better than better algorithms?
In machine learning, is more data always better than better algorithms? No. There are times when mor ...
- Machine Learning的Python环境设置
Machine Learning目前经常使用的语言有Python.R和MATLAB.如果采用Python,需要安装大量的数学相关和Machine Learning的包.一般安装Anaconda,可以把 ...
随机推荐
- spark测试几个hadoop的典型例子
1.求每年的最高温度数据格式如下: 0067011990999991950051507004888888889999999N9+00001+999999999999999999999900670119 ...
- CQRS之旅——旅程5(准备发布V1版本)
旅程5:准备发布V1版本 添加功能和重构,为V1版本发布做准备. "大多数人在完成一件事之后,就像留声机的唱片一样,一遍又一遍地使用它,直到它破碎,忘记了过去是用来创造更多未来的东西.&qu ...
- mysql 分组查询前n条数据
今天去面试,碰到一道面试题: 有一个学生成绩表,表中有 表id.学生名.学科.分数.学生id .查询每科学习最好的两名学生的信息: 建表sql: CREATE TABLE `stuscore` ( ` ...
- django 自定义过滤器中的坑.
今天在创建自定义过滤器的时候,设置已正常.但是在运行后报: 'filter' is not a valid tag library: Template library filter not found ...
- 做一个日收入100元的APP!
[导语]虽然讲了很多个人开发者的文章,但新手开发者如何赚自己的第一个100块钱,确是最难的事情.群里有人说都不知道干什么 app赚钱,完全没有想法, 并且经常问我有什么快速赚钱的方法.我只能遗憾地说, ...
- MFC深入浅出读书笔记第一部分
最近看侯捷的MFC深入浅出,简单总结一下. 第一章首先就是先了解一下windows程序设计的基础知识,包括win32程序开发基础,什么*.lib,*.h,*.cpp的,程序入口点WinMain函数,窗 ...
- RSA进阶之共模攻击
适用场景: 同一个n,对相同的m进行了加密,e取值不一样. e1和e2互质,gcd(e1,e2)=1 如果满足上述条件,那么就可以在不分解n的情况下求解m 原理 复杂的东西简单说: 如果gcd(e1, ...
- java基础-容器
已经写了一段时间JAVA代码了,但仔细想来,却发现对JAVA的很多方面还是一片迷茫. 利用周末补一下基础知识吧. 大致列一下这个周末需要学习的内容 1 容器 (本节内容) 2 线程 3 流 目录 1 ...
- Spring AOP Example 文件下载:
文件下载:http://files.cnblogs.com/wucg/spring_aop_excise.zip P:124 spring核心技术 P225: spring doc 可以把Advi ...
- 201621123034 《Java程序设计》第1周学习总结
1. 本周学习总结 知道了java的用途有安卓手机应用,企业服务器后端,java web.学到了新概念:类.HelloWorld.java 中 HelloWorld 是主文件名,区分 .java和 . ...