[Machine Learning with Python] Data Preparation through Transformation Pipeline
In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a series of steps in data preparation. Scikit-Learn provides the Pipeline class to help with such sequences of transformations.
The Pipeline constructor takes a list of name/estimator pairs defining a sequence of steps. All but the last estimator must be transformers (i.e., they must have a fit_transform() method). The names can be anything you like.
When you call the pipeline’s fit() method, it calls fit_transform() sequentially on all transformers, passing the output of each call as the parameter to the next call, until it reaches the final estimator, for which it just calls the fit() method.
The pipeline exposes the same methods as the final estimator. In this example, the last estimator is a StandardScaler, which is a transformer, so the pipeline has a transform() method that applies all the transforms to the data in sequence (it also has a fit_transform method that we could have used instead of calling fit() and then transform()).
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler num_pipeline = Pipeline([
('imputer', SimpleImputer(strategy="median")),
('attribs_adder', CombinedAttributesAdder()),
('std_scaler', StandardScaler()),
]) try:
from sklearn.compose import ColumnTransformer
except ImportError:
from future_encoders import ColumnTransformer # Scikit-Learn < 0.20 num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"] full_pipeline = ColumnTransformer([
("num", num_pipeline, num_attribs),
("cat", OneHotEncoder(), cat_attribs),
]) housing_prepared = full_pipeline.fit_transform(housing)
[Machine Learning with Python] Data Preparation through Transformation Pipeline的更多相关文章
- [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn
In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...
- [Machine Learning with Python] Data Visualization by Matplotlib Library
Before you can plot anything, you need to specify which backend Matplotlib should use. The simplest ...
- Python (1) - 7 Steps to Mastering Machine Learning With Python
Step 1: Basic Python Skills install Anacondaincluding numpy, scikit-learn, and matplotlib Step 2: Fo ...
- Coursera, Big Data 4, Machine Learning With Big Data (week 1/2)
Week 1 Machine Learning with Big Data KNime - GUI based Spark MLlib - inside Spark CRISP-DM Week 2, ...
- Getting started with machine learning in Python
Getting started with machine learning in Python Machine learning is a field that uses algorithms to ...
- 《Learning scikit-learn Machine Learning in Python》chapter1
前言 由于实验原因,准备入坑 python 机器学习,而 python 机器学习常用的包就是 scikit-learn ,准备先了解一下这个工具.在这里搜了有 scikit-learn 关键字的书,找 ...
- 【Machine Learning】Python开发工具:Anaconda+Sublime
Python开发工具:Anaconda+Sublime 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现 ...
- In machine learning, is more data always better than better algorithms?
In machine learning, is more data always better than better algorithms? No. There are times when mor ...
- Machine Learning的Python环境设置
Machine Learning目前经常使用的语言有Python.R和MATLAB.如果采用Python,需要安装大量的数学相关和Machine Learning的包.一般安装Anaconda,可以把 ...
随机推荐
- HDOJ 2120 Ice_cream's world I
Ice_cream's world I ice_cream's world is a rich country, it has many fertile lands. Today, the queen ...
- 笔记-python-standard library-12.1 pickle
笔记-python-standard library-12.1 pickle 1. pickle简介 source code: Lib/pickle.py pickle模块实质上是一个实现p ...
- python 闯关之路四(下)(并发编程与数据库编程) 并发编程重点
python 闯关之路四(下)(并发编程与数据库编程) 并发编程重点: 1 2 3 4 5 6 7 并发编程:线程.进程.队列.IO多路模型 操作系统工作原理介绍.线程.进程演化史.特点.区别 ...
- mysql中为int设置长度究竟是什么意思
根据个人的实验并结合资料:1.长度跟可以使用的值的范围无关,值的范围仅跟类型对应的存储字节数和是否unsigned有关:2.长度指的是显示宽度,比如,指定3位int,那么id为3和id为300的值,在 ...
- Java web学习总结
javaweb学习总结(十四)——JSP原理 孤傲苍狼 2014-07-24 09:38 阅读:46603 评论:37 JavaWeb学习总结(十三)——使用Session防止表单重复提交 孤 ...
- 1、IOS学习计划
2015年12月10日 -- 2015年12月27日(一共3个周末,12个个工作日) 1.斯坦福公开课(IOS7应用开发) 一共18节课程,通过视频和demo建立感觉 2.千峰的OC课程 一共25节课 ...
- MoveWindow() SetWindowPos()的区别与联系
敲代码时,突然发现有一个背景图片无法显示,百思不得其解,最终发现是MoveWindow() SetWindowPos()这两个函数的使用不当造成的. 这里把这两个函数的前世今生给分析一下. 先看Mov ...
- python安装pattern失败
做文本分类需要用到pattern.en进行词形还原,安装了一上午都没有成功,mysqlclient安装失败.最后解决办法,pip install --only-binary :all: mysqlcl ...
- Leetcode 498.对角线遍历
对角线遍历 给定一个含有 M x N 个元素的矩阵(M 行,N 列),请以对角线遍历的顺序返回这个矩阵中的所有元素,对角线遍历如下图所示. 示例: 输入: [ [ 1, 2, 3 ], [ 4, 5, ...
- 利用python爬取58同城简历数据
利用python爬取58同城简历数据 利用python爬取58同城简历数据 最近接到一个工作,需要获取58同城上面的简历信息(http://gz.58.com/qzyewu/).最开始想到是用pyth ...