[Machine Learning with Python] Data Preparation through Transformation Pipeline
In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a series of steps in data preparation. Scikit-Learn provides the Pipeline class to help with such sequences of transformations.
The Pipeline constructor takes a list of name/estimator pairs defining a sequence of steps. All but the last estimator must be transformers (i.e., they must have a fit_transform() method). The names can be anything you like.
When you call the pipeline’s fit() method, it calls fit_transform() sequentially on all transformers, passing the output of each call as the parameter to the next call, until it reaches the final estimator, for which it just calls the fit() method.
The pipeline exposes the same methods as the final estimator. In this example, the last estimator is a StandardScaler, which is a transformer, so the pipeline has a transform() method that applies all the transforms to the data in sequence (it also has a fit_transform method that we could have used instead of calling fit() and then transform()).
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler num_pipeline = Pipeline([
('imputer', SimpleImputer(strategy="median")),
('attribs_adder', CombinedAttributesAdder()),
('std_scaler', StandardScaler()),
]) try:
from sklearn.compose import ColumnTransformer
except ImportError:
from future_encoders import ColumnTransformer # Scikit-Learn < 0.20 num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"] full_pipeline = ColumnTransformer([
("num", num_pipeline, num_attribs),
("cat", OneHotEncoder(), cat_attribs),
]) housing_prepared = full_pipeline.fit_transform(housing)
[Machine Learning with Python] Data Preparation through Transformation Pipeline的更多相关文章
- [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn
In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...
- [Machine Learning with Python] Data Visualization by Matplotlib Library
Before you can plot anything, you need to specify which backend Matplotlib should use. The simplest ...
- Python (1) - 7 Steps to Mastering Machine Learning With Python
Step 1: Basic Python Skills install Anacondaincluding numpy, scikit-learn, and matplotlib Step 2: Fo ...
- Coursera, Big Data 4, Machine Learning With Big Data (week 1/2)
Week 1 Machine Learning with Big Data KNime - GUI based Spark MLlib - inside Spark CRISP-DM Week 2, ...
- Getting started with machine learning in Python
Getting started with machine learning in Python Machine learning is a field that uses algorithms to ...
- 《Learning scikit-learn Machine Learning in Python》chapter1
前言 由于实验原因,准备入坑 python 机器学习,而 python 机器学习常用的包就是 scikit-learn ,准备先了解一下这个工具.在这里搜了有 scikit-learn 关键字的书,找 ...
- 【Machine Learning】Python开发工具:Anaconda+Sublime
Python开发工具:Anaconda+Sublime 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现 ...
- In machine learning, is more data always better than better algorithms?
In machine learning, is more data always better than better algorithms? No. There are times when mor ...
- Machine Learning的Python环境设置
Machine Learning目前经常使用的语言有Python.R和MATLAB.如果采用Python,需要安装大量的数学相关和Machine Learning的包.一般安装Anaconda,可以把 ...
随机推荐
- 子窗体与父窗体调用对方js方法
有时候为了减少一个页面内的代码量,会将部分内容放到子窗体中,如后台管理中用iframe来进行管理 <div> <iframe id="dviframe" src= ...
- Diycode开源项目 BaseApplication分析+LeakCanary第三方+CrashHandler自定义异常处理
1.BaseApplication整个应用的开始 1.1.看一下代码 /* * Copyright 2017 GcsSloop * * Licensed under the Apache Licens ...
- java以正确的方式停止线程
java线程停止可以说是非常有讲究的,看起来非常简单,但是也要做好一些防范措施,一般停止一个线程可以使用Thread.stop();来实现,但是最好不要用,因为他是不安全的. 大多数停止线程使用Thr ...
- Java web学习总结
javaweb学习总结(十四)——JSP原理 孤傲苍狼 2014-07-24 09:38 阅读:46603 评论:37 JavaWeb学习总结(十三)——使用Session防止表单重复提交 孤 ...
- Asp.net页面生命周期详解任我行(3)-服务器处理请求详细过程
前言 百度了一下才知道,传智的邹老师桃李满天下呀,我也是邹老师的粉丝,最开始学习页面生命周期的时候也是看了邹老师的视频. 本人是参考了以下前辈的作品,本文中也参合了本人心得,绝非有意盗版,旨在传播,最 ...
- Asp.net自定义控件开发任我行(3)-Render
摘要 上一篇我们讲到了自定义标签TagPrefix用法,此篇我们来讲一下控件的呈现,主要是呈现下拉框 内容 呈现的方法有,Render,RenderControl,RenderChildren,这三个 ...
- Python+Selenium练习篇之8-利用css定位元素
前面介绍了,XPath, id , class , link text, partial link text, tag name, name 七大元素定位方法,本文介绍webdriver支持的最后一个 ...
- 更改 Mac 上的功能键行为
您可以将 Apple 键盘上的顶行按键用作标准功能键,或用来控制 Mac 的内建功能. 如果您的 Apple 键盘部分顶行按键上印有图标,则这些按键可用于执行每个图标所示的特殊功能.这些按键也可用 ...
- 单元测试如何保证了易用的API
一般而言TDD的好处是以输出为导向及早发现问题,以及方便重构(单元测试保证).我理解,还有一个比较重要的意义是: 客观上强制了程序员写出更加友好的接口 方便测试和联调. 问题 这里我以c++举例,需求 ...
- [oldboy-django][4python面试]有关yield那些事
1 yield 在使用send, next时候的区别(举例m = yield 5) 无论send,next首先理解m = yield 5 是将表达式"yield 5 "的结果返回给 ...