Decision Tree
A decision tree builds classification or regression models in the form of a tree structure. It breaks a dataset down into smaller and smaller subsets while the associated decision tree is incrementally developed at the same time.
Decision tree learning uses a top-down recursive method. The basic idea is to construct the tree along which information entropy declines fastest, until the entropy of the instances in each leaf node is zero. Each internal node of the tree corresponds to an attribute, and each leaf node corresponds to a class label.
Advantages:
- Decisions are easy to explain. The tree results in a set of rules, which is the same approach humans generally follow when making decisions.
- Even a complex decision tree can be visualized, which makes it easy for everyone to understand.
- It has few hyper-parameters to tune.
Information Gain
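- The entropy of a discrete random variable $X$ with $P(X = x_i) = p_i$ is:
$$H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i$$
- From the information entropy, we can calculate the empirical entropy of a dataset $D$:
$$H(D) = -\sum_{k=1}^{K} \frac{|C_k|}{|D|} \log_2 \frac{|C_k|}{|D|}$$
where $|D|$ is the number of samples in $D$ and $|C_k|$ is the number of samples that belong to class $k$.
- We can also calculate the empirical conditional entropy of $D$ given a feature $A$, where $D_i$ is the subset of $D$ on which $A$ takes its $i$-th value:
$$H(D \mid A) = \sum_{i=1}^{n} \frac{|D_i|}{|D|} H(D_i)$$
- From these two quantities, we can calculate the information gain:
$$g(D, A) = H(D) - H(D \mid A)$$
- Information gain ratio:
$$g_R(D, A) = \frac{g(D, A)}{H_A(D)}, \qquad H_A(D) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \log_2 \frac{|D_i|}{|D|}$$
- Gini index, where $p_k$ is the probability of class $k$:
$$\mathrm{Gini}(p) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2$$
For binary classification, where $p$ is the probability of one of the two classes:
$$\mathrm{Gini}(p) = 2p(1 - p)$$
For binary classification and on the condition of feature $A$, which splits $D$ into two parts $D_1$ and $D_2$:
$$\mathrm{Gini}(D, A) = \frac{|D_1|}{|D|} \mathrm{Gini}(D_1) + \frac{|D_2|}{|D|} \mathrm{Gini}(D_2)$$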
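As a rough illustration of the quantities named in this section, the sketch below computes empirical entropy, conditional entropy, information gain, gain ratio and the Gini index from plain Python lists. The function names and the toy `outlook`/`play` data are made up for this example and do not come from any library.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Empirical entropy of a list of class labels, in bits."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def conditional_entropy(feature, labels):
    """Entropy of the labels inside each feature value, weighted by subset size."""
    n = len(labels)
    subsets = {}
    for value, label in zip(feature, labels):
        subsets.setdefault(value, []).append(label)
    return sum(len(s) / n * entropy(s) for s in subsets.values())

def information_gain(feature, labels):
    """Drop in entropy obtained by splitting on the feature."""
    return entropy(labels) - conditional_entropy(feature, labels)

def gain_ratio(feature, labels):
    """Information gain divided by the entropy of the feature's own values."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

# Hypothetical toy data: the feature separates the two classes perfectly.
outlook = ["sunny", "sunny", "rain", "rain"]
play = ["no", "no", "yes", "yes"]
print(entropy(play))                    # 1.0
print(information_gain(outlook, play))  # 1.0
print(gini(play))                       # 0.5
```

With two equally frequent classes the entropy is 1 bit and the Gini index is 0.5; a feature that separates the classes perfectly recovers the full bit as information gain.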
Three Building Algorithms
- ID3: maximizing information gain
- C4.5: maximizing the information gain ratio
- CART:
  - Regression Tree: minimizing the squared error.
  - Classification Tree: minimizing the Gini index.
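To make the regression-tree criterion concrete, here is a minimal sketch of how one split on a single numeric feature could be chosen by minimizing the squared error. It assumes candidate thresholds are the midpoints between neighbouring sorted values, which is one common choice rather than the only one; the helper names and the data are hypothetical.

```python
def squared_error(ys):
    """Sum of squared deviations from the mean: the loss of one leaf."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_regression_split(xs, ys):
    """Try every midpoint between distinct sorted x values.

    Returns (threshold, total_error) for the split with the smallest
    summed squared error of the two sides.
    """
    pairs = sorted(zip(xs, ys))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal x values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        error = squared_error(left) + squared_error(right)
        if error < best[1]:
            best = (threshold, error)
    return best

# Made-up data with two clearly separated groups of targets.
xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
threshold, error = best_regression_split(xs, ys)
print(threshold)  # 6.5
```

A classification tree would follow the same loop, replacing the squared error of each side with its size-weighted Gini index.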
Decision Tree Algorithm Pseudocode
1. Place the best attribute of the dataset at the root of the tree. The criteria for selecting the best attribute are listed under the building algorithms above.
2. Split the training set into subsets by the values of the best attribute.
3. Repeat Step 1 and Step 2 on each subset until every branch of the tree ends in a leaf node.
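The steps above can be sketched in Python roughly as follows. This is a minimal ID3-style version that assumes categorical attributes stored in dictionaries and uses information gain as the selection criterion; `build_tree`, `predict` and the toy rows are names invented for this illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    n = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

def build_tree(rows, labels, attrs):
    # Leaf: all labels agree, or no attribute is left to split on.
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Step 1: pick the attribute with the largest information gain.
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    node = {"attr": best, "children": {}}
    remaining = [a for a in attrs if a != best]
    # Step 2: split the training set by the values of that attribute.
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        # Step 3: repeat on each subset.
        node["children"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return node

def predict(tree, row):
    # Walk down until a leaf (a plain label, not a dict) is reached.
    while isinstance(tree, dict):
        tree = tree["children"][row[tree["attr"]]]
    return tree

# Hypothetical toy data: only "windy" determines the label.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rain", "windy": False},
    {"outlook": "rain", "windy": True},
]
labels = ["yes", "no", "yes", "no"]
tree = build_tree(rows, labels, ["outlook", "windy"])
print(tree["attr"])                                       # windy
print(predict(tree, {"outlook": "rain", "windy": True}))  # no
```

The sketch does no pruning, and `predict` raises a `KeyError` for an attribute value that never appeared in training; a practical implementation would need to handle both.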
Random Forest
A single decision tree tends to overfit its training data. Random Forest classifiers work around that limitation by creating a whole bunch of decision trees (hence "forest"), each trained on a random subset of the training samples (bagging, drawn with replacement) and of the features (drawn without replacement). The decision trees then work together to produce the result.
In short, it builds on CART with added randomness.
- Randomness 1: train each tree on a subset of the training set selected by bagging (sampling with replacement).
- Randomness 2: train each tree on a subset of the features (sampling without replacement). For example, select 10 features from the 100 features in the dataset.
- Randomness 3: add new features by low-dimensional projection.
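The first two sources of randomness can be sketched with the standard library alone. The code below is an illustrative toy, not a reference implementation: it assumes numeric features, string labels, a tiny Gini-based tree as the base learner, and one feature subset per tree as described above (many implementations resample the features at every split instead). All function names and the data are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, features, max_depth=3):
    """Tiny CART-style classifier: binary splits on numeric features by Gini."""
    if max_depth == 0 or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]
    best = None  # (weighted_gini, feature, threshold)
    for f in features:
        # Every value except the largest is a valid "<=" threshold.
        for t in sorted(set(row[f] for row in X))[:-1]:
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # every candidate feature is constant on this subset
        return Counter(y).most_common(1)[0][0]
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], features, max_depth - 1),
            build_tree([X[i] for i in ri], [y[i] for i in ri], features, max_depth - 1))

def predict_tree(tree, row):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def build_forest(X, y, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Randomness 1: bootstrap sample of the rows (with replacement).
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        # Randomness 2: random subset of the features (without replacement).
        features = rng.sample(range(len(X[0])), n_features)
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], features))
    return forest

def predict_forest(forest, row):
    """Majority vote over the predictions of all trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]

# Made-up data: both features separate class "a" from class "b".
X = [[1, 10], [2, 11], [3, 12], [4, 13], [6, 20], [7, 21], [8, 22], [9, 23]]
y = ["a", "a", "a", "a", "b", "b", "b", "b"]
forest = build_forest(X, y)
print(predict_forest(forest, [0, 9]), predict_forest(forest, [10, 30]))
```

Individual trees may place their thresholds differently because each sees a different bootstrap sample and feature subset; the majority vote is what makes the combined prediction more stable than any single tree.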
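Postscript
I wanted to show off by writing this blog post in English and hoped to practice my writing along the way, but reality slapped me in the face ( ̄ε(# ̄)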
Ref:
- https://clyyuanzi.gitbooks.io/julymlnotes/content/rf.html
- http://www.saedsayad.com/decision_tree.htm
- http://dataaspirant.com/2017/01/30/how-decision-tree-algorithm-works/
- Li Hang, Statistical Learning Methods (统计学习方法)