SLIQ/SPRINT

*/-->

SLIQ/SPRINT

Before SLIQ, most classification alogrithms have the problem that they do not scale. Because these alogrithms have the limit that the traning data should fit in memory. That's why SLIQ was raised.

1 Generic Decision-Tree Classification

Most decision-tree classifiers perform classification in two phases: Tree Building and Tree Pruning.

1.1 Tree Building

MakeTree(Training Data T)
Partition(T); Partition(Data S)
if(all points in S are in the same class) then return;
Evaluate splits for each attribute A
Use the best split found to partition S into S1 and S2
Partition(S1);
Partition(S2);

1.2 Tree Pruning

As we have known, no matter how your preprocess works, there always exist "noise" data or other bad data. So, when we use the traning data to build the decision-tree classification, it also create branches for thos bad data. These branches can lead to errors when classifying test data. Tree pruning is aimed at removing these braches from decision tree by selecting the subtree with the least estimated error rate.

2 Scalability Issues

2.1 Tree Building

As I mentioned, ID3/C4.5/Gini1 is used to evaluate the "goodness" of the alternative splits for an attribute.

2.1.1 Splits for Numeric Attribute

The cost of evaluating splits for a numeric attribute is dominated by the cost of sorting the values. Therefore, an important scalability issue is the reduction of sorting costs for numeric attributes.

2.1.2 Splits for Categorical Attribute

2.2 Tree Pruning

3 SLIQ Classifier

To achieve this pre-sorting, we use the following data structures. We create a separate list for each attribute of the training data. Additionally, a separate list,called class list , is created for the class labels attached to the examples. An entry in an attribute list has two fields: one contains an attribute value, the other anindex into the class list. An entry of the class list also has two fields: one contains a class label, the other a reference to a leaf node of the decision tree. The i th entry of the class list corresponds to the i th example in the training data. Each leaf node of the decision tree represents a partition of the training data, the partition being defined by the conjunction of the predicates on the path from the node to the root. Thus, the class list can at any time identify the partition to which an example belongs. We assume that there is enough memory to keep the class list memory-resident. Attribute lists are written to disk if necessary.

Author: mlhy

Created: 2015-10-08 四 21:29

Emacs 24.5.1 (Org mode 8.2.10)

SLIQ/SPRINT的更多相关文章

  1. TFS 2015 敏捷开发实践 – 在Kanban上运行一个Sprint

    前言:在 上一篇 TFS2015敏捷开发实践 中,我们给大家介绍了TFS2015中看板的基本使用和功能,这一篇中我们来看一个具体的场景,如何使用看板来运行一个sprint.Sprint是Scrum对迭 ...

  2. Sprint计划

    团队: 郭志豪:http://www.cnblogs.com/gzh13692021053/ 杨子健:http://www.cnblogs.com/yzj666/ 刘森松:http://www.cnb ...

  3. 计应152第六组Sprint计划会议

    Sprint计划会议 会议时间:2016年12月8下午16:00 会议地点:宿舍 会议进程 • 首先我们讨论了排球计分规则程序完成需要做的一些工作:程序的初期设计,数据分析,典型用户,场景,代码的编写 ...

  4. HOW TO RUN A SPRINT PLANNING MEETING (THE WAY I LIKE IT)

    This is a sample agenda for a sprint planning meeting. Depending on your context you will have to ch ...

  5. Sprint

    Sprint冲刺 1.选题 <寿司点餐系统> 2.app名 <Sushi> 3.团名 ZEG 4.目标 制作一个成型的人性化的寿司点餐系统,介绍各种寿司的材料做法吃法以及价格, ...

  6. sprint 3 总结

    1.要求: 演示可参考毕业设计答辩,包含两部分内容: 项目陈述,可综述项目.团队.开发过程等. 运行演示,实现的功能.业务.用户反馈等. 希望各组认真准备,拿出最好的阵容最好的状态,展示一学期的学习与 ...

  7. [课程设计]Sprint Three 回顾与总结&发表评论&团队贡献分

    Sprint Three 回顾与总结&发表评论&团队贡献分 ● 一.回顾与总结 (1)回顾 燃尽图: Sprint计划-流程图: milestones完成情况如下: (2)总结 本次冲 ...

  8. TFS二次开发系列:八、TFS二次开发的数据统计以PBI、Bug、Sprint等为例(二)

    上一篇文章我们编写了此例的DTO层,本文将数据访问层封装为逻辑层,提供给界面使用. 1.获取TFS Dto实例,并且可以获取项目集合,以及单独获取某个项目实体 public static TFSSer ...

  9. TFS二次开发系列:七、TFS二次开发的数据统计以PBI、Bug、Sprint等为例(一)

    在TFS二次开发中,我们可能会根据某一些情况对各个项目的PBI.BUG等工作项进行统计.在本文中将大略讲解如果进行这些数据统计. 一:连接TFS服务器,并且得到之后需要使用到的类方法. /// < ...

随机推荐

  1. Asp.NET CORE安装部署

    先安装IIS再安装这两个,不然后面各种bug HTTP 错误 500.19 代码 0x8007000d 解决方案 for win7_64 asp.net core IIS Web Core 1.比如最 ...

  2. VMware-Workstation-Full-12.5.9

    https://download3.vmware.com/software/wkst/file/VMware-Workstation-Full-12.5.9-7535481.x86_64.bundle ...

  3. Redhat7 开机启动服务

    #!/bin/sh ### BEGIN INIT INFO # Provides: jboss # Required-Start: $local_fs $remote_fs $network $sys ...

  4. Linux--shell 计算时间差

    参考:https://www.cnblogs.com/leixingzhi7/p/6281675.html starttime=`date +'%Y-%m-%d %H:%M:%S'` #执行程序 en ...

  5. 84.常用的返回QuerySet对象的方法使用详解:select_related, prefetch_related

    1.select_related: 只能用在一对多或者是一对一的关联模型之间,不能用在多对多或者是多对一的关联模型间,比如可以提前获取文章的作者,但是不能通过作者获取作者的文章,或者是通过某篇文章获取 ...

  6. 爬虫防止浏览器防止debug处理

    方式一(基于你会前端,我比较喜欢这种方式) #复制html页面 #复制其中的js,css(css可有可无,如果加css和不加css情况不一样,网页可能做了css反爬处理) #全局搜索debug or ...

  7. 使用IDEA搭建SSM,小白教程

    本片博客主要是搭建一个简单的SSM框架,感兴趣的同学可以看一下 搭建ssm框架首先我们需要有一个数据库,本篇文章博主将使用一个MySQL的数据,如果没学过MySQL数据库的,学过其他数据库也是可以的 ...

  8. goweb-模板引擎

    模板引擎 Go 为我们提供了 text/template 库和 html/template 库这两个模板引擎,模板引 擎通过将数据和模板组合在一起生成最终的 HTML,而处理器负责调用模板引擎并将引 ...

  9. dp--背包--开心的金明

    题目描述 金明今天很开心,家里购置的新房就要领钥匙了,新房里有一间他自己专用的很宽敞的房间.更让他高兴的是,妈妈昨天对他说:“你的房间需要购买哪些物品,怎么布置,你说了算,只要不超过N元钱就行”.今天 ...

  10. java 用阻塞队列实现生产者消费者

    package com.lb; import java.util.concurrent.ArrayBlockingQueue; import java.util.concurrent.Blocking ...