Decision Tree
Decision Tree builds classification or regression models in the form of a tree structure. It breaks the dataset down into smaller and smaller subsets while an associated decision tree is incrementally developed at the same time.

Decision Tree learning uses a top-down recursive method. The basic idea is to construct the tree along which information entropy declines fastest, so that ideally the entropy of the instances in each leaf node is zero. Each internal node of the tree corresponds to an attribute, and each leaf node corresponds to a class label.
Advantages:
- A decision is easy to explain: the model results in a set of rules, which is the same approach humans generally follow when making decisions.
- Even a complex Decision Tree can be simplified by visualization, so it can be understood by everyone (see the sketch below).
- It has almost no hyper-parameters.
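As a quick illustration of the visualization advantage, here is a minimal sketch using scikit-learn's `plot_tree`; the dataset, depth limit, and figure size are arbitrary example choices:

```python
# A minimal sketch: fit a small tree on the Iris dataset and draw it.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3).fit(iris.data, iris.target)

plt.figure(figsize=(10, 6))
plot_tree(clf, feature_names=iris.feature_names,
          class_names=list(iris.target_names), filled=True)
plt.show()
```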
Information Gain
- The entropy is:

$$H(X) = -\sum_{i=1}^{n} p_i \log p_i$$

- From the information entropy, we can calculate the empirical entropy of a dataset $D$:

$$H(D) = -\sum_{k=1}^{K} \frac{|C_k|}{|D|} \log_2 \frac{|C_k|}{|D|}$$

where $|D|$ is the number of samples, $C_k$ is the set of samples belonging to class $k$, and $K$ is the number of classes.

- We can also calculate the empirical conditional entropy of $D$ given a feature $A$ that partitions $D$ into subsets $D_1, \dots, D_n$ (with $D_{ik}$ the samples of $D_i$ in class $k$):

$$H(D \mid A) = \sum_{i=1}^{n} \frac{|D_i|}{|D|} H(D_i) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \sum_{k=1}^{K} \frac{|D_{ik}|}{|D_i|} \log_2 \frac{|D_{ik}|}{|D_i|}$$

- From these two quantities, the information gain of feature $A$ is:

$$g(D, A) = H(D) - H(D \mid A)$$

- Information gain ratio:

$$g_R(D, A) = \frac{g(D, A)}{H_A(D)}, \qquad H_A(D) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \log_2 \frac{|D_i|}{|D|}$$

- Gini index:

$$\mathrm{Gini}(p) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2$$

For binary classification:

$$\mathrm{Gini}(p) = 2p(1 - p)$$

For binary classification and on the condition of feature $A$ splitting $D$ into $D_1$ and $D_2$:

$$\mathrm{Gini}(D, A) = \frac{|D_1|}{|D|} \mathrm{Gini}(D_1) + \frac{|D_2|}{|D|} \mathrm{Gini}(D_2)$$
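To make these quantities concrete, here is a minimal Python sketch that computes all of them on a tiny hand-made dataset; the feature values and labels are invented purely for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    """Empirical entropy H(D), using log base 2."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(feature, labels):
    """Empirical conditional entropy H(D|A): weighted entropy of each subset D_i."""
    n = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def information_gain(feature, labels):
    """g(D, A) = H(D) - H(D|A)."""
    return entropy(labels) - conditional_entropy(feature, labels)

def gain_ratio(feature, labels):
    """g_R(D, A) = g(D, A) / H_A(D)."""
    return information_gain(feature, labels) / entropy(feature)

def gini(labels):
    """Gini(D) = 1 - sum_k p_k^2."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

# Toy data: one categorical feature A against a binary class label.
A      = ["young", "young", "old", "old", "old", "young"]
labels = ["yes",   "no",    "yes", "yes", "no",  "no"]

print(entropy(labels))             # H(D) = 1.0 (3 yes vs. 3 no)
print(information_gain(A, labels))
print(gain_ratio(A, labels))
print(gini(labels))                # 2 * 0.5 * 0.5 = 0.5
```

Note that `gain_ratio` divides by the entropy of the feature's own value distribution, so it penalizes features with many distinct values; this is why C4.5 prefers it over raw information gain.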
Three Building Algorithms
- ID3: maximizing information gain.
- C4.5: maximizing the information gain ratio.
- CART:
  - Regression Tree: minimizing the squared error.
  - Classification Tree: minimizing the Gini index.
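scikit-learn implements CART and exposes both split criteria, so switching between a Gini-based and an entropy-based tree is a one-argument change. A minimal sketch; the dataset and `random_state` are arbitrary, and the criterion names assume a recent scikit-learn version:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X, y = load_iris(return_X_y=True)

# criterion="entropy" splits by information gain (in the spirit of ID3/C4.5);
# criterion="gini" is the classic CART classification criterion.
for criterion in ("entropy", "gini"):
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0)
    print(criterion, cross_val_score(clf, X, y, cv=5).mean())

# A CART regression tree minimizes the squared error at each split.
reg = DecisionTreeRegressor(criterion="squared_error")
```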
Decision Tree Algorithm Pseudocode
1. Place the best attribute of the dataset at the root of the tree. The way to select the best attribute is shown in Three Building Algorithms above.
2. Split the training set into subsets by the best attribute.
3. Repeat Step 1 and Step 2 on each subset until leaf nodes are reached in all the branches of the tree.
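The three steps map almost line for line onto a recursive implementation. Below is a minimal ID3-style sketch, reusing the `information_gain` helper defined earlier; it handles categorical features only, does no pruning, and all names are my own:

```python
from collections import Counter

def build_tree(rows, labels, features):
    """Recursive ID3 sketch: rows is a list of dicts, features a list of keys."""
    # Step 3 stopping cases: a pure node, or no features left -> majority label.
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]

    # Step 1: place the best attribute (highest information gain) at this node.
    best = max(features,
               key=lambda f: information_gain([r[f] for r in rows], labels))

    # Step 2: split the training set on each value of the best attribute,
    # then recurse on every subset.
    node = {best: {}}
    remaining = [f for f in features if f != best]
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        node[best][value] = build_tree([rows[i] for i in idx],
                                       [labels[i] for i in idx],
                                       remaining)
    return node

# Toy usage with the same invented data as before:
rows = [{"age": "young"}, {"age": "young"}, {"age": "old"},
        {"age": "old"}, {"age": "old"}, {"age": "young"}]
labels = ["yes", "no", "yes", "yes", "no", "no"]
print(build_tree(rows, labels, ["age"]))  # {'age': {'young': 'no', 'old': 'yes'}}
```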
Random Forest
Random Forest classifiers work around the tendency of a single decision tree to overfit by creating a whole bunch of decision trees (hence 'forest'), each trained on a random subset of the training samples (bagging, drawn with replacement) and a random subset of the features (drawn without replacement). The trees then work together, by voting, to produce the final result.
In one word, it builds on CART with randomness:
- Randomness 1: train each tree on a subset of the training set selected by bagging (sampling with replacement).
- Randomness 2: train each tree on a subset of the features (sampling without replacement). For example, select 10 features out of the 100 features in the dataset.
- Randomness 3: add new features by low-dimensional projection.
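In scikit-learn, the first two kinds of randomness correspond directly to the `bootstrap` and `max_features` parameters of `RandomForestClassifier` (the third, random projection of features, is not part of that implementation). A minimal sketch; the dataset and numbers are arbitrary:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# bootstrap=True      -> Randomness 1: each tree sees a bagged sample of the rows.
# max_features="sqrt" -> Randomness 2: each split considers a random feature subset.
forest = RandomForestClassifier(n_estimators=100,
                                bootstrap=True,
                                max_features="sqrt",
                                random_state=0).fit(X, y)
print(forest.predict(X[:3]))
```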

Postscript
I wanted to show off by writing this blog post in English, hoping to use it to train my writing skills; reality slapped me in the face without mercy ( ̄ε(# ̄)
Refs:
- https://clyyuanzi.gitbooks.io/julymlnotes/content/rf.html
- http://www.saedsayad.com/decision_tree.htm
- http://dataaspirant.com/2017/01/30/how-decision-tree-algorithm-works/
- Statistical Learning Methods (统计学习方法), Li Hang