Decision Tree builds classification or regression models in the form of a tree structure. It breaks the dataset down into smaller and smaller subsets while an associated decision tree is incrementally developed at the same time.

Decision Tree learning uses a top-down recursive method. The basic idea is to construct a tree along which information entropy declines fastest, so that the entropy of the instances at each leaf node is (ideally) zero. Each internal node of the tree corresponds to an attribute, and each leaf node corresponds to a class label.
Advantages:

  • A decision is easy to explain: the model results in a set of rules, which is the same approach humans generally follow when making decisions.
  • Even a complex Decision Tree can be simplified through visualization, so it can be understood by everyone.
  • It has almost no hyper-parameters.

Information Gain

  • The entropy is:

    $H(X) = -\sum_{i=1}^{n} p_i \log p_i$
  • From the information entropy, we can calculate the empirical entropy of a dataset $D$:

    $H(D) = -\sum_{k=1}^{K} \frac{|C_k|}{|D|} \log_2 \frac{|C_k|}{|D|}$

    where $C_k$ is the set of samples in $D$ that belong to class $k$, and $|\cdot|$ denotes the number of samples.
  • We can also calculate the empirical conditional entropy given a feature $A$ that splits $D$ into subsets $D_1, \dots, D_n$:

    $H(D|A) = \sum_{i=1}^{n} \frac{|D_i|}{|D|} H(D_i)$
  • From these, we can calculate the information gain of feature $A$:

    $g(D, A) = H(D) - H(D|A)$
  • Information gain ratio:

    $g_R(D, A) = \dfrac{g(D, A)}{H_A(D)}$, where $H_A(D) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \log_2 \frac{|D_i|}{|D|}$
  • Gini index:

    $\mathrm{Gini}(p) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2$

    For binary classification:

    $\mathrm{Gini}(p) = 2p(1 - p)$

    For binary classification and on the condition of feature $A$, which splits $D$ into $D_1$ and $D_2$:

    $\mathrm{Gini}(D, A) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)$
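
To make these definitions concrete, here is a minimal NumPy sketch of the quantities above for categorical features (the function and variable names are illustrative, not from any particular library):

    import numpy as np

    def entropy(labels):
        """Empirical entropy H(D) of a label array, in bits."""
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def conditional_entropy(feature, labels):
        """Empirical conditional entropy H(D|A) for one categorical feature A."""
        return sum((feature == v).mean() * entropy(labels[feature == v])
                   for v in np.unique(feature))

    def information_gain(feature, labels):
        """g(D, A) = H(D) - H(D|A)."""
        return entropy(labels) - conditional_entropy(feature, labels)

    def gain_ratio(feature, labels):
        """g_R(D, A) = g(D, A) / H_A(D); H_A(D) is the entropy of A's own values."""
        return information_gain(feature, labels) / entropy(feature)

    def gini(labels):
        """Gini(D) = 1 - sum_k (|C_k| / |D|)^2."""
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    # Toy check: a feature that perfectly separates the classes gains 1 bit.
    feature = np.array(["sunny", "sunny", "rain", "rain"])
    labels = np.array([0, 0, 1, 1])
    print(information_gain(feature, labels))  # 1.0
    print(gini(labels))                       # 0.5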

Three Building Algorithms

  • ID3: maximizing information gain
  • C4.5: maximizing the information gain ratio
  • CART
    • Regression Tree: minimizing the squared error (see the sketch below).
    • Classification Tree: minimizing the Gini index.
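
To make the CART regression criterion concrete, here is a small illustrative sketch (my own code, assuming a single numeric feature) that scans candidate thresholds and picks the split minimizing the total squared error of the two resulting leaves:

    import numpy as np

    def best_regression_split(x, y):
        """Return the threshold on one numeric feature that minimizes the
        summed squared error of the two leaves, each predicting its mean."""
        order = np.argsort(x)
        x, y = x[order], y[order]
        best_t, best_err = None, np.inf
        for i in range(1, len(x)):
            if x[i] == x[i - 1]:
                continue
            t = (x[i] + x[i - 1]) / 2  # midpoint between consecutive values
            left, right = y[:i], y[i:]
            err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if err < best_err:
                best_t, best_err = t, err
        return best_t, best_err

    x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
    y = np.array([1.1, 0.9, 1.0, 5.2, 4.8, 5.0])
    print(best_regression_split(x, y))  # threshold 6.5, error ≈ 0.1: splits between the two clusters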

Decision Tree Algorithm Pseudocode

  • Place the best attribute of the dataset at the root of the tree. The way to select the best attribute is shown in Three Building Algorithms above.
  • Split the training set into subsets by the best attribute.
  • Repeat Step 1 and Step 2 on each subset until leaf nodes are reached in all branches of the tree; a runnable sketch of this recursion follows the list.
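
Here is a minimal runnable sketch of this recursion for categorical features, in the spirit of ID3; it reuses the entropy and information_gain helpers from the sketch in the Information Gain section, and the data layout is illustrative:

    import numpy as np

    def build_tree(X, y, features):
        """X: dict of feature name -> array of categorical values; y: labels.
        Returns a nested dict, or a class label at a leaf."""
        # Leaf: all samples share one label, or no features are left to split on.
        if len(np.unique(y)) == 1 or not features:
            values, counts = np.unique(y, return_counts=True)
            return values[np.argmax(counts)]  # majority label
        # Step 1: place the best attribute at the root (here: max information gain).
        best = max(features, key=lambda f: information_gain(X[f], y))
        tree = {best: {}}
        # Step 2: split the training set by the values of the best attribute.
        for v in np.unique(X[best]):
            mask = X[best] == v
            subset = {f: X[f][mask] for f in X}
            rest = [f for f in features if f != best]
            # Step 3: repeat on each subset until every branch ends in a leaf.
            tree[best][v] = build_tree(subset, y[mask], rest)
        return tree

    X = {"outlook": np.array(["sunny", "sunny", "rain", "rain"]),
         "windy":   np.array(["yes", "no", "yes", "no"])}
    y = np.array([0, 0, 1, 1])
    print(build_tree(X, y, list(X)))  # {'outlook': {'rain': 1, 'sunny': 0}}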

Random Forest

Random Forest classifiers work around the overfitting tendency of a single decision tree by creating a whole bunch of decision trees (hence 'forest'), each trained on a random subset of the training samples (bagging, drawn with replacement) and a random subset of the features (drawn without replacement). The trees then work together, typically by majority vote, to produce the final result.
In short, it builds on CART with randomness.

  • Randomness 1: train each tree on a subset of the training set selected by bagging (sampling with replacement).

  • Randomness 2: train each tree on a subset of the features (sampling without replacement). For example, select 10 features from the 100 features in the dataset.
  • Randomness 3: add new features by low-dimensional projection. A sketch of the first two randomness sources follows this list.
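
Here is a minimal sketch of the first two randomness sources, using scikit-learn's DecisionTreeClassifier as the base learner (the fit_forest/predict_forest names are my own; note that scikit-learn's built-in RandomForestClassifier draws a fresh feature subset at every split rather than once per tree):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)

    def fit_forest(X, y, n_trees=50, n_features=None):
        n, d = X.shape
        n_features = n_features or max(1, int(np.sqrt(d)))
        forest = []
        for _ in range(n_trees):
            rows = rng.integers(0, n, size=n)                     # Randomness 1: bagging, with replacement
            cols = rng.choice(d, size=n_features, replace=False)  # Randomness 2: features, without replacement
            tree = DecisionTreeClassifier().fit(X[rows][:, cols], y[rows])
            forest.append((tree, cols))
        return forest

    def predict_forest(forest, X):
        # Majority vote over all trees (assumes integer class labels).
        votes = np.stack([tree.predict(X[:, cols]) for tree, cols in forest])
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

    X = rng.normal(size=(200, 10))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    forest = fit_forest(X, y)
    print((predict_forest(forest, X) == y).mean())  # training accuracy, typically near 1.0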

Postscript

I tried to show off by writing this blog post in English, hoping to practice my writing skills, and got mercilessly slapped in the face ( ̄ε(# ̄)

Ref:
https://clyyuanzi.gitbooks.io/julymlnotes/content/rf.html
http://www.saedsayad.com/decision_tree.htm
http://dataaspirant.com/2017/01/30/how-decision-tree-algorithm-works/
统计学习方法 (Statistical Learning Methods), 李航 (Hang Li)
