This is the second post in Boosting algorithm. In the previous post, we go through the earliest Boosting algorithm - AdaBoost, which is actually an approximation of exponential loss via additive stage-forward modelling. What if we want to choose othe…
In the previous post we addressed some issue of decision tree, including instability, lack of smoothness, sensitivity to data, and etc. One solution is Boosting Method. In simple words Boosting combines multiple weak learners to get a powerful predic…
After talking about Information theory, now let's come to one of its application - Decision Tree! Nowadays, in terms of prediction power, there are many ensemble methods based on tree that can beat Decision Tree generally. However I found it necessar…
Boost是集成学习方法中的代表思想之一,核心的思想是不断的迭代.boost通常采用改变训练数据的概率分布,针对不同的训练数据分布调用弱学习算法学习一组弱分类器.在多次迭代的过程中,当前次迭代所用的训练数据的概率分布会依据上一次迭代的结果而调整.也就是说训练数据的各样本是有权重的,这个权重本身也会随着迭代而调整.Adaboost(后面补一篇介绍这个的文章吧)在迭代的过程中通过不断调整数据分布的权重来达到提高性能的目的,GBM(Gradient Boosting Machine)则是在迭代的过程中…
原文地址:Complete Guide to Parameter Tuning in Gradient Boosting (GBM) in Python by Aarshay Jain 原文翻译与校对:@酒酒Angie(drmr_anki@qq.com) && 寒小阳(hanxiaoyang.ml@gmail.com) 时间:2016年9月. 出处:http://blog.csdn.net/han_xiaoyang/article/details/52663170 1.前言 如果一直以来你…
一.GBM参数 总的来说GBM的参数可以被归为三类: 树参数:调节模型中每个决策树的性质 Boosting参数:调节模型中boosting的操作 其他模型参数:调节模型总体的各项运作 1.树参数 现在我们看一看定义一个决策树所需要的参数.注意我在这里用的都是python里scikit-learn里面的术语,和其他软件比如R里用到的可能不同,但原理都是相同的. min_ samples_split  定义了树中一个节点所需要用来分裂的最少样本数. 可以避免过度拟合(over-fitting).如果…
https://statweb.stanford.edu/~jhf/ftp/trebst.pdf page10 90% to 95% of the observations were often deleted without sacrificing accuracy of theestimates,using either influence measure. [解释regularization] page125 Regularization In prediction problems,fi…
A Gentle Introduction to the Gradient Boosting Algorithm for Machine Learning by Jason Brownlee on September 9, 2016 in XGBoost 0 0 0 0   Gradient boosting is one of the most powerful techniques for building predictive models. In this post you will d…
Gradient Boosting Decision Tree,即梯度提升树,简称GBDT,也叫GBRT(Gradient Boosting Regression Tree),也称为Multiple Additive Regression Tree(MART),阿里貌似叫treelink. 首先学习GBDT要有决策树的先验知识. Gradient Boosting Decision Tree,和随机森林(random forest)算法一样,也是通过组合弱学习器来形成一个强学习器.GBDT的发明…
How to Configure the Gradient Boosting Algorithm by Jason Brownlee on September 12, 2016 in XGBoost 0 0 0 0   Gradient boosting is one of the most powerful techniques for applied machine learning and as such is quickly becoming one of the most popula…