SLIQ/SPRINT

Before SLIQ, most classification alogrithms have the problem that they do not scale. Because these alogrithms have the limit that the traning data should fit in memory. That's why SLIQ was raised.

1 Generic Decision-Tree Classification

Most decision-tree classifiers perform classification in two phases: Tree Building and Tree Pruning.

1.1 Tree Building

MakeTree(Training Data T)

   Partition(T);

Partition(Data S)

   if(all points in S are in the same class) then return;

   Evaluate splits for each attribute A

   Use the best split found to partition S into S1 and S2

   Partition(S1);

   Partition(S2);

1.2 Tree Pruning

As we have known, no matter how your preprocess works, there always exist "noise" data or other bad data. So, when we use the traning data to build the decision-tree classification, it also create branches for thos bad data. These branches can lead to errors when classifying test data. Tree pruning is aimed at removing these braches from decision tree by selecting the subtree with the least estimated error rate.

2 Scalability Issues

2.1 Tree Building

As I mentioned, ID3/C4.5/Gini¹ is used to evaluate the "goodness" of the alternative splits for an attribute.

2.1.1 Splits for Numeric Attribute

The cost of evaluating splits for a numeric attribute is dominated by the cost of sorting the values. Therefore, an important scalability issue is the reduction of sorting costs for numeric attributes.

2.1.2 Splits for Categorical Attribute

2.2 Tree Pruning

3 SLIQ Classifier

To achieve this pre-sorting, we use the following data structures. We create a separate list for each attribute of the training data. Additionally, a separate list,called class list , is created for the class labels attached to the examples. An entry in an attribute list has two fields: one contains an attribute value, the other anindex into the class list. An entry of the class list also has two fields: one contains a class label, the other a reference to a leaf node of the decision tree. The i th entry of the class list corresponds to the i th example in the training data. Each leaf node of the decision tree represents a partition of the training data, the partition being defined by the conjunction of the predicates on the path from the node to the root. Thus, the class list can at any time identify the partition to which an example belongs. We assume that there is enough memory to keep the class list memory-resident. Attribute lists are written to disk if necessary.

Footnotes:

:http://www.cnblogs.com/mlhy/p/4856062.html

Author: mlhy

Created: 2015-10-08 四 21:29

Emacs 24.5.1 (Org mode 8.2.10)

SLIQ/SPRINT的更多相关文章

TFS 2015 敏捷开发实践 – 在Kanban上运行一个Sprint
前言:在上一篇 TFS2015敏捷开发实践中,我们给大家介绍了TFS2015中看板的基本使用和功能,这一篇中我们来看一个具体的场景,如何使用看板来运行一个sprint.Sprint是Scrum对迭 ...
Sprint计划
团队: 郭志豪:http://www.cnblogs.com/gzh13692021053/ 杨子健:http://www.cnblogs.com/yzj666/ 刘森松:http://www.cnb ...
计应152第六组Sprint计划会议
Sprint计划会议会议时间:2016年12月8下午16:00 会议地点:宿舍会议进程 • 首先我们讨论了排球计分规则程序完成需要做的一些工作:程序的初期设计,数据分析,典型用户,场景,代码的编写 ...
HOW TO RUN A SPRINT PLANNING MEETING (THE WAY I LIKE IT)
This is a sample agenda for a sprint planning meeting. Depending on your context you will have to ch ...
Sprint
Sprint冲刺 1.选题 <寿司点餐系统> 2.app名 <Sushi> 3.团名 ZEG 4.目标制作一个成型的人性化的寿司点餐系统,介绍各种寿司的材料做法吃法以及价格, ...
sprint 3 总结
1.要求: 演示可参考毕业设计答辩,包含两部分内容: 项目陈述,可综述项目.团队.开发过程等. 运行演示,实现的功能.业务.用户反馈等. 希望各组认真准备,拿出最好的阵容最好的状态,展示一学期的学习与 ...
[课程设计]Sprint Three 回顾与总结&发表评论&团队贡献分
Sprint Three 回顾与总结&发表评论&团队贡献分 ● 一.回顾与总结 (1)回顾燃尽图: Sprint计划-流程图: milestones完成情况如下: (2)总结本次冲 ...
TFS二次开发系列：八、TFS二次开发的数据统计以PBI、Bug、Sprint等为例(二)
上一篇文章我们编写了此例的DTO层,本文将数据访问层封装为逻辑层,提供给界面使用. 1.获取TFS Dto实例,并且可以获取项目集合,以及单独获取某个项目实体 public static TFSSer ...
TFS二次开发系列：七、TFS二次开发的数据统计以PBI、Bug、Sprint等为例（一）
在TFS二次开发中,我们可能会根据某一些情况对各个项目的PBI.BUG等工作项进行统计.在本文中将大略讲解如果进行这些数据统计. 一:连接TFS服务器,并且得到之后需要使用到的类方法. /// < ...

随机推荐

Java 使用控制台操作实现数据库的增删改查
使用控制台进行数据库增删改查操作,首先创建一个Java Bean类,实现基础数据的构造,Get,Set方法的实现,减少代码重复性. 基本属性为学生学号 Id, 学生姓名 Name,学生性别 Sex, ...
NSDictionary和NSMaptable， NSArray，NSSet，NSOrderedSet和NSHashTable的区别
NSSet, NSDictionary, NSArray是Foundation框架关于集合操作的常用类, 和其他标准的集合操作库不同, 他们的实现方法对开发者进行隐藏, 只允许开发者写一些简单的代码, ...
UVA 11987 - Almost Union-Find 并查集的活用 id化查找
受教了,感谢玉斌大神的博客. 这道题最难的地方就是操作2,将一个集合中的一个点单独移到另一个集合,因为并查集的性质,如果该点本身作为root节点的话,怎么保证其他点不受影响. 玉斌大神的思路很厉害,受 ...
HDU_2256 矩阵快速幂需推算
最近开始由线段树转移新的内容,线段树学到扫描线这里有点迷迷糊糊的,有时候放一放可能会好一些. 最近突然对各种数学问题很感兴趣.好好钻研了一下矩阵快速幂.发现矩阵真是个计算神器,累乘类的运算原本要O(N ...
Java之创建线程的方式三：实现Callable接口
import java.util.concurrent.Callable;import java.util.concurrent.ExecutionException;import java.util ...
mybatis连接mysql（jdbc）常见问题
问题1:java.io.IOException: Could not find resource com/xxx/xxxMapper.xml IDEA是不会编译src的java目录的xml文件,所以在 ...
awk中传参方式
结合编辑数据文件的shell脚本学习awk传参方式,该脚本功能: a.取VIDEOUSR_11082017_0102_ONLINE_STASTIC.dat文件中第87个字段的低8位: b.将每行数据的 ...
吴裕雄--天生自然 JAVA开发学习：修饰符
public class InstanceCounter { private static int numInstances = 0; protected static int getCount() ...
SaltStack中状态间关系unless、onlyif、require、require_in、watch、watch_in
1.unless 检查的命令,仅当unless选项指向的命令返回值为false时才执行name定义的命令 cmd.run: {% "] %} - name: 'nohup sh /alida ...
SpringBoot2中，怎么生成静态文档
SpringBoot2中,怎么生成静态文档在实际开发过程中,我们通过swagger就可以生成我们的接口文档,这个文档就可以提供给前端人员开发使用的.但是,有时候,我们需要把我们的接口文档,提供给第三 ...

SLIQ/SPRINT

SLIQ/SPRINT

1 Generic Decision-Tree Classification

1.1 Tree Building

1.2 Tree Pruning

2 Scalability Issues

2.1 Tree Building

2.1.1 Splits for Numeric Attribute

2.1.2 Splits for Categorical Attribute

2.2 Tree Pruning

3 SLIQ Classifier

Footnotes:

SLIQ/SPRINT的更多相关文章

随机推荐

热门专题