SLIQ/SPRINT

*/-->

SLIQ/SPRINT

Before SLIQ, most classification alogrithms have the problem that they do not scale. Because these alogrithms have the limit that the traning data should fit in memory. That's why SLIQ was raised.

1 Generic Decision-Tree Classification

Most decision-tree classifiers perform classification in two phases: Tree Building and Tree Pruning.

1.1 Tree Building

MakeTree(Training Data T)
Partition(T); Partition(Data S)
if(all points in S are in the same class) then return;
Evaluate splits for each attribute A
Use the best split found to partition S into S1 and S2
Partition(S1);
Partition(S2);

1.2 Tree Pruning

As we have known, no matter how your preprocess works, there always exist "noise" data or other bad data. So, when we use the traning data to build the decision-tree classification, it also create branches for thos bad data. These branches can lead to errors when classifying test data. Tree pruning is aimed at removing these braches from decision tree by selecting the subtree with the least estimated error rate.

2 Scalability Issues

2.1 Tree Building

As I mentioned, ID3/C4.5/Gini1 is used to evaluate the "goodness" of the alternative splits for an attribute.

2.1.1 Splits for Numeric Attribute

The cost of evaluating splits for a numeric attribute is dominated by the cost of sorting the values. Therefore, an important scalability issue is the reduction of sorting costs for numeric attributes.

2.1.2 Splits for Categorical Attribute

2.2 Tree Pruning

3 SLIQ Classifier

To achieve this pre-sorting, we use the following data structures. We create a separate list for each attribute of the training data. Additionally, a separate list,called class list , is created for the class labels attached to the examples. An entry in an attribute list has two fields: one contains an attribute value, the other anindex into the class list. An entry of the class list also has two fields: one contains a class label, the other a reference to a leaf node of the decision tree. The i th entry of the class list corresponds to the i th example in the training data. Each leaf node of the decision tree represents a partition of the training data, the partition being defined by the conjunction of the predicates on the path from the node to the root. Thus, the class list can at any time identify the partition to which an example belongs. We assume that there is enough memory to keep the class list memory-resident. Attribute lists are written to disk if necessary.

Author: mlhy

Created: 2015-10-08 四 21:29

Emacs 24.5.1 (Org mode 8.2.10)

SLIQ/SPRINT的更多相关文章

  1. TFS 2015 敏捷开发实践 – 在Kanban上运行一个Sprint

    前言:在 上一篇 TFS2015敏捷开发实践 中,我们给大家介绍了TFS2015中看板的基本使用和功能,这一篇中我们来看一个具体的场景,如何使用看板来运行一个sprint.Sprint是Scrum对迭 ...

  2. Sprint计划

    团队: 郭志豪:http://www.cnblogs.com/gzh13692021053/ 杨子健:http://www.cnblogs.com/yzj666/ 刘森松:http://www.cnb ...

  3. 计应152第六组Sprint计划会议

    Sprint计划会议 会议时间:2016年12月8下午16:00 会议地点:宿舍 会议进程 • 首先我们讨论了排球计分规则程序完成需要做的一些工作:程序的初期设计,数据分析,典型用户,场景,代码的编写 ...

  4. HOW TO RUN A SPRINT PLANNING MEETING (THE WAY I LIKE IT)

    This is a sample agenda for a sprint planning meeting. Depending on your context you will have to ch ...

  5. Sprint

    Sprint冲刺 1.选题 <寿司点餐系统> 2.app名 <Sushi> 3.团名 ZEG 4.目标 制作一个成型的人性化的寿司点餐系统,介绍各种寿司的材料做法吃法以及价格, ...

  6. sprint 3 总结

    1.要求: 演示可参考毕业设计答辩,包含两部分内容: 项目陈述,可综述项目.团队.开发过程等. 运行演示,实现的功能.业务.用户反馈等. 希望各组认真准备,拿出最好的阵容最好的状态,展示一学期的学习与 ...

  7. [课程设计]Sprint Three 回顾与总结&发表评论&团队贡献分

    Sprint Three 回顾与总结&发表评论&团队贡献分 ● 一.回顾与总结 (1)回顾 燃尽图: Sprint计划-流程图: milestones完成情况如下: (2)总结 本次冲 ...

  8. TFS二次开发系列:八、TFS二次开发的数据统计以PBI、Bug、Sprint等为例(二)

    上一篇文章我们编写了此例的DTO层,本文将数据访问层封装为逻辑层,提供给界面使用. 1.获取TFS Dto实例,并且可以获取项目集合,以及单独获取某个项目实体 public static TFSSer ...

  9. TFS二次开发系列:七、TFS二次开发的数据统计以PBI、Bug、Sprint等为例(一)

    在TFS二次开发中,我们可能会根据某一些情况对各个项目的PBI.BUG等工作项进行统计.在本文中将大略讲解如果进行这些数据统计. 一:连接TFS服务器,并且得到之后需要使用到的类方法. /// < ...

随机推荐

  1. Javascript object.constructor属性与面向对象编程(oop)

    定义和用法 在 JavaScript 中, constructor 属性返回对象的构造函数. 返回值是函数的引用,不是函数名: JavaScript 数组 constructor 属性返回 funct ...

  2. CSS(3)之 less 和rem

    less 预编译脚本语言. LESS 语法 less语法2 LESS中文 rem rem的适配原理 rem 是相对于页面根源素html的字体大小的一个尺寸单位 页面内容可以使用rem为单位,那么htm ...

  3. JavaScript 之 原型及原型链

    对象[回顾] 通过字面量创建对象 //通过字面量创建对象 var obj1 = { name:'Jack', age: 18 } 通过系统自带的构造函数构造对象 // 通过系统自带的构造函数构造对象 ...

  4. Corporative Network (有n个节点,然后执行I u,v(把u的父节点设为v)和E u(询问u到根节点的距离))并查集

    A very big corporation is developing its corporative network. In the beginning each of the N enterpr ...

  5. shell中sparksql语句调试、执行方式

    1.命令方式执行sparksql查询 SQL="use mydatatable;;select count(1) from tab_videousr_onlne where p_regiio ...

  6. 小结spring给项目开发的好处

    1.spring 抽象了许多开发中遇到的共性问题:支持pojo和javaBean开发使应用面向接口开发.如各种Template 2.Ioc 容器使得对象间的耦合关系文本化.外部化,即通过xml的配置就 ...

  7. h5-拖拽接口

    1.原效果网页 拖拽后: 2.主要实现代码 <div class="div1" id="div1"> <!--在h5中,如果想拖拽元素,久必须 ...

  8. POJ-1703 Find them, Catch them(并查集&数组记录状态)

    题目: The police office in Tadu City decides to say ends to the chaos, as launch actions to root up th ...

  9. windows服务器搭建SVN[多项目设置方法]

    https://tortoisesvn.net/downloads.html 根据系统版本进行下载,下载后正常一路正常安装. 第一.设置版本号仓库目录,比如:cdengine 第二.在cdengine ...

  10. shift+回车,换行。断点。

    在Idea中,shift+回车可以在一行的任意一地方换行. 断点的小知识. debug启动程序后左下角会出现断点的功能选项. 一个竖列 一个横行,没有请求时是灰的. 这里主要讲竖列. 这个是沉默全部断 ...