scikit-learn: 6. Strategies to scale computationally: bigger data

Reference: http://scikit-learn.org/stable/modules/scaling_strategies.html

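When the number of examples, features, or both is very large, traditional approaches run into two problems: memory and efficiency. The remedy is out-of-core (or "external memory") learning.

Out-of-core learning relies on three techniques: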

1. Streaming instances:

Put simply, instances arrive one at a time. The implementation details are outside the scope of the scikit-learn documentation.
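Since the documentation leaves the streaming side to the reader, here is one possible sketch using only the Python standard library. The helper name `iter_minibatches` and the in-memory `io.StringIO` stand-in for a large file are assumptions made for illustration; any iterable source (a file handle, a database cursor, a socket) could take its place.

```python
import io
from itertools import islice


def iter_minibatches(items, batch_size):
    """Yield lists of at most `batch_size` items from any iterable.

    Hypothetical helper: only one batch is held in memory at a time.
    """
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# A file-like object stands in for a large file on disk (assumption).
stream = io.StringIO("a\nb\nc\nd\ne\n")
lines = (line.strip() for line in stream)  # lazy: nothing is read yet

batches = list(iter_minibatches(lines, batch_size=2))
```

Collecting the batches into a list is only done here to inspect them; in real use each batch would be consumed and discarded inside a loop.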

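2. Extracting features:

Put simply, this means using the different feature extraction methods (translated article: http://blog.csdn.net/mmc2015/article/details/46992105) to pull the useful data out of a large dataset, which reduces memory use and improves efficiency. Not covered in detail here.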
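As an illustration, one feature extractor that fits the out-of-core setting is a stateless one, because it never needs to see the whole dataset. The sketch below assumes `HashingVectorizer` as that extractor (the notes above do not name a specific method), and the sample strings and `n_features` value are made up.

```python
from sklearn.feature_extraction.text import HashingVectorizer

# Stateless extractor: no fit() pass over the full dataset is needed,
# so each batch can be transformed independently as it arrives.
vectorizer = HashingVectorizer(n_features=2**10)

# Illustrative batches (assumption: text documents arriving in chunks).
batch_1 = ["out-of-core learning", "streaming instances"]
batch_2 = ["incremental learning with partial_fit"]

X1 = vectorizer.transform(batch_1)  # sparse matrix, one row per document
X2 = vectorizer.transform(batch_2)
```

Both batches map into the same fixed-width feature space, so they can be fed to the same model without any shared vocabulary.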

3. Incremental learning:

All estimators implementing the partial_fit API are candidates.

The ability to learn incrementally from a mini-batch of instances (sometimes called "online learning") is key to out-of-core learning, as it guarantees that at any given time there will be only a small number of instances in main memory.

All estimators that implement the partial_fit API can learn incrementally, including:

Classification
Regression
- sklearn.linear_model.SGDRegressor
- sklearn.linear_model.PassiveAggressiveRegressor
Clustering
- sklearn.cluster.MiniBatchKMeans
Decomposition / feature Extraction
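To make the mini-batch idea concrete, here is a minimal sketch that trains `SGDRegressor` (one of the estimators listed above) with repeated `partial_fit` calls. The synthetic data, the true coefficients, and the batch size are assumptions chosen purely for illustration.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.RandomState(0)
true_coef = np.array([3.0, -2.0])  # assumed ground truth for the toy data

model = SGDRegressor(random_state=0)

# Simulate 50 mini-batches arriving one after another; only the current
# batch of 100 rows exists in memory at any time.
for _ in range(50):
    X_batch = rng.randn(100, 2)
    y_batch = X_batch @ true_coef + 0.1 * rng.randn(100)
    model.partial_fit(X_batch, y_batch)  # updates the model in place
```

Each call refines the existing weights rather than starting over, which is what distinguishes `partial_fit` from `fit`.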

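Note: for classification, an incremental learner may not know in advance what all the classes are, so on the first call to partial_fit it is best to set the classes= parameter explicitly and list every class.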
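The note above can be sketched as follows, assuming `SGDClassifier` as the incremental learner and a made-up three-class problem in which the first batch happens to contain only two of the classes.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Every label that can ever appear, declared up front (toy assumption).
all_classes = np.array([0, 1, 2])

clf = SGDClassifier(random_state=0)

# First mini-batch: class 2 is absent, so classes= must be given.
X_first = np.array([[0.0, 0.1], [0.2, 0.0], [1.0, 1.1]])
y_first = np.array([0, 0, 1])
clf.partial_fit(X_first, y_first, classes=all_classes)

# Later mini-batches may omit classes= and may contain unseen-so-far labels.
X_next = np.array([[2.0, 2.1], [2.2, 1.9]])
y_next = np.array([2, 2])
clf.partial_fit(X_next, y_next)
```

Without `classes=` on the first call, the classifier has no way of allocating weights for class 2 before it shows up.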

4. Examples:

An example of out-of-core classification of text documents. Working through the example makes the material above easier to understand.
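The following is not the official scikit-learn example, only a much smaller sketch of the same idea: stream documents, hash them into features, and update a classifier batch by batch. The toy corpus, the labels, and the helper names `stream_documents` and `minibatches` are all hypothetical.

```python
from itertools import islice

import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier


def stream_documents():
    """Hypothetical data source yielding (text, label) pairs one at a time."""
    corpus = [
        ("the match ended with a late goal", 0),
        ("the team won the league title", 0),
        ("the new processor doubles the clock speed", 1),
        ("the laptop ships with more memory", 1),
    ]
    for item in corpus:
        yield item


def minibatches(stream, size):
    """Group a stream into lists of at most `size` items (hypothetical helper)."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


vectorizer = HashingVectorizer(n_features=2**12)  # stateless feature extraction
clf = SGDClassifier(random_state=0)               # supports partial_fit
all_classes = np.array([0, 1])                    # 0 = sports, 1 = tech (toy labels)

for batch in minibatches(stream_documents(), size=2):
    texts, labels = zip(*batch)
    X = vectorizer.transform(texts)
    clf.partial_fit(X, np.array(labels), classes=all_classes)

predictions = clf.predict(vectorizer.transform(["the team scored a goal"]))
```

With a corpus this small the prediction itself means little; the point is that no step ever needs the full dataset in memory.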
