The general algorithm of Gradient Boosting

  1. Initialize: \(f_0(x) = \mathop{\arg\min}\limits_\gamma \sum\limits_{i=1}^N L(y_i, \gamma)\)

  2. for m=1 to M:

    (a) Compute the negative gradient (see the squared-loss example after this list): \(\tilde{y}_i = -\frac{\partial L(y_i,f_{m-1}(x_i))}{\partial f_{m-1}(x_i)}, \qquad i = 1, 2, \cdots, N\)

    (b) Fit a base learner \(h_m(x)\) to \(\tilde{y}_i\) by minimizing the squared error: \(w_m = \mathop{\arg\min}\limits_w \sum\limits_{i=1}^{N} \left[\tilde{y}_i - h_m(x_i\,;\,w) \right]^2\)

    (c) Use a line search to find the step size \(\rho_m\) that minimizes \(L\): \(\rho_m = \mathop{\arg\min}\limits_{\rho} \sum\limits_{i=1}^{N} L(y_i,\,f_{m-1}(x_i) + \rho\, h_m(x_i\,;\,w_m))\)

    (d) \(f_m(x) = f_{m-1}(x) + \rho_m h_m(x\,;\,w_m)\)

  3. Output \(f_M(x)\)
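  • Worked example (squared loss): with \(L(y, f) = \frac{1}{2}(y - f)^2\), step 1 gives the constant prediction that minimizes the total loss, \(f_0(x) = \bar{y}\) (the mean of the targets), and the negative gradient in step (a) is simply the residual \(\tilde{y}_i = y_i - f_{m-1}(x_i)\), so each round fits the base learner directly to the current residuals.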

  • The implementation below also covers early stopping, regression, classification, and staged prediction (stage_predict; see the full code).

  • Gradient Boosting generally starts from an initial value, the \(f_0(x)\) in step 1 above. When implementing it, this initial value must not be multiplied by the learning rate: doing so effectively changes the initial value and produces unexpected results (unfortunately I made exactly this mistake ~). A short sketch of the difference follows.
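A minimal sketch of that point (f0, preds, and lr are illustrative names, not taken from the full code):

import numpy as np

f0 = np.array([3.0, 3.0])          # initial prediction f_0(x), e.g. the target mean
preds = [np.array([0.5, -0.2]),    # outputs of the fitted base learners h_1(x), h_2(x)
         np.array([0.1, 0.3])]
lr = 0.1

F = f0 + lr * sum(preds)           # correct: the learning rate scales only the base learners
F_wrong = lr * (f0 + sum(preds))   # wrong: this silently replaces f_0(x) with lr * f_0(x)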

# First define the loss functions: squared loss and huber loss for regression;
# logistic loss and modified huber loss for classification.
import numpy as np
from scipy import stats
from sklearn import datasets
from sklearn.base import clone
from sklearn.metrics import mean_squared_error, zero_one_loss
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def SquaredLoss_NegGradient(y_pred, y):
    return y - y_pred


def Huberloss_NegGradient(y_pred, y, alpha):
    diff = y - y_pred
    delta = stats.scoreatpercentile(np.abs(diff), alpha * 100)
    g = np.where(np.abs(diff) > delta, delta * np.sign(diff), diff)
    return g


def logistic(p):
    return 1 / (1 + np.exp(-2 * p))


def LogisticLoss_NegGradient(y_pred, y):
    g = 2 * y / (1 + np.exp(2 * y * y_pred))  # logistic_loss = log(1+exp(-2*y*y_pred))
    return g


def modified_huber(p):
    return (np.clip(p, -1, 1) + 1) / 2


def Modified_Huber_NegGradient(y_pred, y):
    margin = y * y_pred
    g = np.where(margin >= 1, 0, np.where(margin >= -1, y * 2 * (1 - margin), 4 * y))
    # modified_huber_loss = np.where(margin >= -1, max(0, (1-margin)^2), -4 * margin)
    return g


class GradientBoosting(object):
    def __init__(self, M, base_learner, learning_rate=1.0, method="regression", tol=None, subsample=None,
                 loss="square", alpha=0.9):
        self.M = M
        self.base_learner = base_learner
        self.learning_rate = learning_rate
        self.method = method
        self.tol = tol
        self.subsample = subsample
        self.loss = loss
        self.alpha = alpha

    def fit(self, X, y):
        # tol is the early-stopping threshold; if early stopping is used,
        # split a validation set off the training set.
        if self.tol is not None:
            X, X_val, y, y_val = train_test_split(X, y, random_state=2)
            former_loss = float("inf")
            count = 0
            tol_init = self.tol

        init_learner = self.base_learner
        y_pred = init_learner.fit(X, y).predict(X)  # initial value f_0(x)
        self.base_learner_total = [init_learner]
        for m in range(self.M):
            if self.subsample is not None:  # subsample
                sample = np.random.choice(len(X), int(self.subsample * len(X)), replace=False)
                X_s, y_s, y_pred_s = X[sample], y[sample], y_pred[sample]
            else:
                X_s, y_s, y_pred_s = X, y, y_pred

            # compute the negative gradient
            if self.method == "regression":
                if self.loss == "square":
                    response = SquaredLoss_NegGradient(y_pred_s, y_s)
                elif self.loss == "huber":
                    response = Huberloss_NegGradient(y_pred_s, y_s, self.alpha)
            elif self.method == "classification":
                if self.loss == "logistic":
                    response = LogisticLoss_NegGradient(y_pred_s, y_s)
                elif self.loss == "modified_huber":
                    response = Modified_Huber_NegGradient(y_pred_s, y_s)

            base_learner = clone(self.base_learner)
            y_pred += base_learner.fit(X_s, response).predict(X) * self.learning_rate
            self.base_learner_total.append(base_learner)

            # early stopping: every 10 rounds after round 300, evaluate the staged
            # prediction on the validation set.
            if m % 10 == 0 and m > 300 and self.tol is not None:
                p = np.array([self.base_learner_total[k].predict(X_val) * self.learning_rate
                              for k in range(1, m + 1)])
                p = np.vstack((self.base_learner_total[0].predict(X_val), p))
                stage_pred = np.sum(p, axis=0)
                if self.method == "regression":
                    later_loss = np.sqrt(mean_squared_error(stage_pred, y_val))
                if self.method == "classification":
                    stage_pred = np.where(logistic(stage_pred) >= 0.5, 1, -1)
                    later_loss = zero_one_loss(stage_pred, y_val)

                if later_loss > (former_loss + self.tol):
                    count += 1
                    self.tol = self.tol / 2
                    print(self.tol)
                else:
                    count = 0
                    self.tol = tol_init

                if count == 2:
                    self.M = m - 20
                    print("early stopping in round {}, best round is {}, M = {}".format(m, m - 20, self.M))
                    break
                former_loss = later_loss
        return self

    def predict(self, X):
        pred = np.array([self.base_learner_total[m].predict(X) * self.learning_rate for m in range(1, self.M + 1)])
        pred = np.vstack((self.base_learner_total[0].predict(X), pred))  # initial value + base learners
        if self.method == "regression":
            pred_final = np.sum(pred, axis=0)
        elif self.method == "classification":
            if self.loss == "modified_huber":
                p = np.sum(pred, axis=0)
                pred_final = np.where(modified_huber(p) >= 0.5, 1, -1)
            elif self.loss == "logistic":
                p = np.sum(pred, axis=0)
                pred_final = np.where(logistic(p) >= 0.5, 1, -1)
        return pred_final


class GBRegression(GradientBoosting):
    def __init__(self, M, base_learner, learning_rate, method="regression", loss="square", tol=None,
                 subsample=None, alpha=0.9):
        super(GBRegression, self).__init__(M=M, base_learner=base_learner, learning_rate=learning_rate,
                                           method=method, loss=loss, tol=tol, subsample=subsample, alpha=alpha)


class GBClassification(GradientBoosting):
    def __init__(self, M, base_learner, learning_rate, method="classification", loss="logistic", tol=None,
                 subsample=None):
        super(GBClassification, self).__init__(M=M, base_learner=base_learner, learning_rate=learning_rate,
                                               method=method, loss=loss, tol=tol, subsample=subsample)


if __name__ == "__main__":
    # build datasets for testing
    X, y = datasets.make_regression(n_samples=20000, n_features=10, n_informative=4, noise=1.1, random_state=1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
    model = GBRegression(M=1000, base_learner=DecisionTreeRegressor(max_depth=2, random_state=1), learning_rate=0.1,
                         loss="huber")
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    print('RMSE: ', rmse)

    X, y = datasets.make_classification(n_samples=20000, n_features=10, n_informative=4, flip_y=0.1,
                                        n_clusters_per_class=1, n_classes=2, random_state=1)
    y[y == 0] = -1
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    model = GBClassification(M=1000, base_learner=DecisionTreeRegressor(max_depth=1, random_state=1),
                             learning_rate=1.0, method="classification", loss="logistic")
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    accuracy = np.mean(pred == y_test)  # fraction of correct predictions
    print('accuracy logistic score: ', accuracy)

    model = GBClassification(M=1000, base_learner=DecisionTreeRegressor(max_depth=1, random_state=1),
                             learning_rate=1.0, method="classification", loss="modified_huber")
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    accuracy = np.mean(pred == y_test)
    print('accuracy modified_huber score: ', accuracy)

Output:

RMSE:  8.454462867923157
accuracy logistic score: 0.9434
accuracy modified_huber score: 0.9402

Regression:

X, y = datasets.make_regression(n_samples=20000, n_features=20, n_informative=10, noise=100, random_state=1)  # dataset

The figure below compares squared loss against Huber loss on this regression problem, along with each loss's early stopping point:
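A minimal sketch of how that comparison can be run with the classes and imports above (the tol value here is illustrative, and the original figure was produced with the full code's stage_predict):

X, y = datasets.make_regression(n_samples=20000, n_features=20, n_informative=10, noise=100, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
for loss in ("square", "huber"):
    model = GBRegression(M=1000, base_learner=DecisionTreeRegressor(max_depth=2, random_state=1),
                         learning_rate=0.1, loss=loss, tol=1.0)  # setting tol enables early stopping
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(loss, "| stopped at M =", model.M, "| RMSE =", np.sqrt(mean_squared_error(y_test, pred)))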

Classification:

For classification, the AdaBoost from the previous post is compared with the GBDT from this post, still on the earlier dataset; the GBDT is run with logistic loss and with the modified huber loss mentioned at the end of that article:

Next, switch to a noisier dataset and use PCA to reduce it to two dimensions for visualization:

X, y = datasets.make_classification(n_samples=20000, n_features=10, n_informative=4, flip_y=0.3, n_clusters_per_class=1, n_classes=2, random_state=1)

This time modified huber loss outperforms logistic loss, but both fall short of Real AdaBoost.
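A minimal sketch of that experiment (assuming scikit-learn's PCA for the 2-D projection; the Real AdaBoost baseline from the previous post is not repeated here):

from sklearn.decomposition import PCA

X, y = datasets.make_classification(n_samples=20000, n_features=10, n_informative=4, flip_y=0.3,
                                    n_clusters_per_class=1, n_classes=2, random_state=1)
y[y == 0] = -1
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
X_2d = PCA(n_components=2).fit_transform(X)  # 2-D projection, used only for the visualization

for loss in ("logistic", "modified_huber"):
    model = GBClassification(M=1000, base_learner=DecisionTreeRegressor(max_depth=1, random_state=1),
                             learning_rate=1.0, method="classification", loss=loss)
    model.fit(X_train, y_train)
    print(loss, "accuracy:", np.mean(model.predict(X_test) == y_test))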
