[Machine Learning with Python] Data Preparation through Transformation Pipeline
In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a series of steps in data preparation. Scikit-Learn provides the Pipeline class to help with such sequences of transformations.
The Pipeline constructor takes a list of name/estimator pairs defining a sequence of steps. All but the last estimator must be transformers (i.e., they must have a fit_transform() method). The names can be anything you like.
When you call the pipeline’s fit() method, it calls fit_transform() sequentially on all transformers, passing the output of each call as the parameter to the next call, until it reaches the final estimator, for which it just calls the fit() method.
The pipeline exposes the same methods as the final estimator. In this example, the last estimator is a StandardScaler, which is a transformer, so the pipeline has a transform() method that applies all the transforms to the data in sequence (it also has a fit_transform method that we could have used instead of calling fit() and then transform()).
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler num_pipeline = Pipeline([
('imputer', SimpleImputer(strategy="median")),
('attribs_adder', CombinedAttributesAdder()),
('std_scaler', StandardScaler()),
]) try:
from sklearn.compose import ColumnTransformer
except ImportError:
from future_encoders import ColumnTransformer # Scikit-Learn < 0.20 num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"] full_pipeline = ColumnTransformer([
("num", num_pipeline, num_attribs),
("cat", OneHotEncoder(), cat_attribs),
]) housing_prepared = full_pipeline.fit_transform(housing)
[Machine Learning with Python] Data Preparation through Transformation Pipeline的更多相关文章
- [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn
In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...
- [Machine Learning with Python] Data Visualization by Matplotlib Library
Before you can plot anything, you need to specify which backend Matplotlib should use. The simplest ...
- Python (1) - 7 Steps to Mastering Machine Learning With Python
Step 1: Basic Python Skills install Anacondaincluding numpy, scikit-learn, and matplotlib Step 2: Fo ...
- Coursera, Big Data 4, Machine Learning With Big Data (week 1/2)
Week 1 Machine Learning with Big Data KNime - GUI based Spark MLlib - inside Spark CRISP-DM Week 2, ...
- Getting started with machine learning in Python
Getting started with machine learning in Python Machine learning is a field that uses algorithms to ...
- 《Learning scikit-learn Machine Learning in Python》chapter1
前言 由于实验原因,准备入坑 python 机器学习,而 python 机器学习常用的包就是 scikit-learn ,准备先了解一下这个工具.在这里搜了有 scikit-learn 关键字的书,找 ...
- 【Machine Learning】Python开发工具:Anaconda+Sublime
Python开发工具:Anaconda+Sublime 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现 ...
- In machine learning, is more data always better than better algorithms?
In machine learning, is more data always better than better algorithms? No. There are times when mor ...
- Machine Learning的Python环境设置
Machine Learning目前经常使用的语言有Python.R和MATLAB.如果采用Python,需要安装大量的数学相关和Machine Learning的包.一般安装Anaconda,可以把 ...
随机推荐
- [Poj3281]Dining(最大流)
Description 有n头牛,f种食物,d种饮料,每头牛有nf种喜欢的食物,nd种喜欢的饮料,每种食物如果给一头牛吃了,那么另一个牛就不能吃这种食物了,饮料也同理,问最多有多少头牛可以吃到它喜欢的 ...
- Scrapy的架构与原理的理解【转】
Scrapy 框架 Scrapy是用纯Python实现一个为了爬取网站数据.提取结构性数据而编写的应用框架,用途非常广泛. 框架的力量,用户只需要定制开发几个模块就可以轻松的实现一个爬虫,用来抓取网页 ...
- 使用Visual Studio建立报表--C#
原文:使用Visual Studio建立报表--C# 版权声明:本文为博主原创文章,未经博主允许不得转载. https://blog.csdn.net/qq_23893313/article/deta ...
- Maya
建立酒杯的方法(CV曲线) surface(曲面)-- creat cv curve tool-- control vertex(调整图形)[再次creat cv建立厚度,只需要建立酒杯的上口]--- ...
- Marketing learning-1
Today we start to learn something about marketing together.Sometimes i just propose a question,and i ...
- windows下使用grunt
grunt官网:http://www.gruntjs.org/ 一.安装grunt 先安装node,在http://www.nodejs.org/可以下载安装包直接安装.在命令行下运行: npm in ...
- linux命令之grep、cut
输入: ifconfig eth0 eth0表示主机的第一块网卡. 输出: eth0: flags=<UP,BROADCAST,RUNNING,MULTICAST> mtu inet 19 ...
- HDU5857 Median 模拟
Median HDU - 5857 There is a sorted sequence A of length n. Give you m queries, each one contains fo ...
- linux系统——软链接、硬链接
区别:硬链接原文件&链接文件公用一个inode号,说明他们是同一个文件,而软链接原文件&链接文件拥有不同的inode号,表明他们是两个不同的文件: 在文件属性上软链接明确写出了是链接文 ...
- jackson 的UnrecognizedPropertyException错误
阅读更多 前段时间,使用jackson封装了json字符串转换为javabean的方法,代码如下: public static <T> T renderJson2Object(String ...