General procedure of the Gradient Boosting algorithm

  1. Initialize: \(f_0(x) = \mathop{\arg\min}\limits_\gamma \sum\limits_{i=1}^N L(y_i, \gamma)\)

  2. for m=1 to M:

    (a) Compute the negative gradient: \(\tilde{y}_i = -\frac{\partial L(y_i,f_{m-1}(x_i))}{\partial f_{m-1}(x_i)}, \qquad i = 1,2 \cdots N\)

    (b) Fit the base learner \(h_m(x)\) to \(\tilde{y}_i\) by minimizing the squared error: \(w_m = \mathop{\arg\min}\limits_w \sum\limits_{i=1}^{N} \left[\tilde{y}_i - h_m(x_i\,;\,w) \right]^2\)

    (c) Use line search to find the step size \(\rho_m\) that minimizes L: \(\rho_m = \mathop{\arg\min}\limits_{\rho} \sum\limits_{i=1}^{N} L(y_i,f_{m-1}(x_i) + \rho h_m(x_i\,;\,w_m))\)

    (d) \(f_m(x) = f_{m-1}(x) + \rho_m h_m(x\,;\,w_m)\)

  3. Output \(f_M(x)\)
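The steps above can be sketched in plain Python under a few assumptions: a single numeric feature, a least-squares decision stump as the base learner \(h_m\), and golden-section search standing in for the line search in step (c) (it assumes the objective is unimodal in \(\rho\), which holds for convex losses). The helper names `golden_section`, `fit_stump` and `gradient_boost` are hypothetical and not taken from the author's code; note also that the full code in this post replaces the line search with a fixed learning rate.

```python
import math


def golden_section(phi, lo, hi, iters=100):
    """Minimise a unimodal 1-D function phi on [lo, hi] by golden-section search."""
    inv_gr = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c = b - inv_gr * (b - a)
        d = a + inv_gr * (b - a)
        if phi(c) < phi(d):
            b = d
        else:
            a = c
    return (a + b) / 2


def fit_stump(x, r):
    """Least-squares decision stump on one feature (hypothetical base learner); returns h(x)."""
    best = None
    xs = sorted(set(x))
    for left_edge, right_edge in zip(xs, xs[1:]):
        t = (left_edge + right_edge) / 2
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # only one distinct x value: fall back to a constant
        mean = sum(r) / len(r)
        return lambda xi: mean
    _, t, ml, mr = best
    return lambda xi: ml if xi <= t else mr


def gradient_boost(x, y, loss, neg_grad, M=20):
    # 1. f_0 = argmin_gamma sum_i L(y_i, gamma), searched on [min(y), max(y)]
    #    (assumes a regression loss whose minimiser lies in the label range)
    f0 = golden_section(lambda g: sum(loss(yi, g) for yi in y), min(y), max(y))
    f = [f0] * len(y)
    stages = []
    for _ in range(M):
        # (a) negative gradient (pseudo-residuals)
        r = [neg_grad(yi, fi) for yi, fi in zip(y, f)]
        # (b) least-squares fit of the base learner to the pseudo-residuals
        h = fit_stump(x, r)
        hx = [h(xi) for xi in x]

        # (c) line search for rho_m: grow the bracket [0, 2 * upper] while the loss
        #     keeps decreasing (assumes convexity in rho), then golden-section search
        def phi(rho):
            return sum(loss(yi, fi + rho * hv) for yi, fi, hv in zip(y, f, hx))

        upper = 1.0
        while phi(2 * upper) < phi(upper) and upper < 1e6:
            upper *= 2
        rho = golden_section(phi, 0.0, 2 * upper)
        # (d) f_m = f_{m-1} + rho_m * h_m
        f = [fi + rho * hv for fi, hv in zip(f, hx)]
        stages.append((rho, h))
    # 3. output f_M
    return lambda xi: f0 + sum(rho * h(xi) for rho, h in stages)
```

For example, with squared loss `lambda y, f: (y - f) ** 2` and its negative gradient `lambda y, f: 2 * (y - f)`, boosting on `x = list(range(10))`, `y = [1.0] * 5 + [5.0] * 5` yields a predictor that maps x ≤ 4 to about 1 and x ≥ 5 to about 5. Passing the loss and its negative gradient as functions keeps the loop identical to the generic algorithm; only those two functions change per loss.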

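  • In addition, the implementation covers early stopping, regression, classification and stage-wise prediction (stage_predict, see the full code).

  • Gradient Boosting usually starts from an initial value, namely \(f_0(x)\) in step 1 above. In the implementation this initial value must not be multiplied by the learning rate: doing so effectively changes the initial value and leads to unexpected results (unfortunately, I made exactly this mistake).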
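A toy illustration of both points, assuming squared loss and a hypothetical constant base learner that simply predicts the mean pseudo-residual. The hypothetical function `boost_constant_stages` returns the stage-wise predictions \(f_0, \dots, f_M\) (what a stage_predict-style method would expose), and `scale_init=True` reproduces the mistake of multiplying \(f_0\) by the learning rate.

```python
def boost_constant_stages(y, M, lr, scale_init=False):
    """Toy gradient boosting (squared loss, constant base learner) returning f_0..f_M.

    scale_init=True reproduces the mistake of multiplying f_0 by the learning rate.
    """
    f0 = sum(y) / len(y)                      # f_0 = mean(y) minimises the squared loss
    f = lr * f0 if scale_init else f0
    stages = [f]
    for _ in range(M):
        h = sum(yi - f for yi in y) / len(y)  # constant learner fit to the pseudo-residuals
        f += lr * h                           # only the stage outputs are scaled
        stages.append(f)
    return stages
```

With `y = [90.0, 100.0, 110.0]`, `lr=0.1` and `M=10`, the unscaled version stays at 100 for every stage, because the residuals already average to zero. The scaled version starts at 10, and the gap to 100 only shrinks by a factor of (1 - lr) per round, so after 10 rounds it is still \(100 \cdot 0.9^{11} \approx 31.4\).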

```python
import numpy as np
from scipy import stats
from sklearn import datasets
from sklearn.base import clone
from sklearn.metrics import mean_squared_error, zero_one_loss
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


# Loss functions (as negative gradients): squared loss and Huber loss for regression;
# logistic loss and modified Huber loss for classification (labels in {-1, 1})
def SquaredLoss_NegGradient(y_pred, y):
    return y - y_pred


def Huberloss_NegGradient(y_pred, y, alpha):
    diff = y - y_pred
    # delta is the alpha-quantile of the absolute residuals
    delta = stats.scoreatpercentile(np.abs(diff), alpha * 100)
    g = np.where(np.abs(diff) > delta, delta * np.sign(diff), diff)
    return g


def logistic(p):
    return 1 / (1 + np.exp(-2 * p))


def LogisticLoss_NegGradient(y_pred, y):
    # logistic_loss = log(1 + exp(-2 * y * y_pred))
    g = 2 * y / (1 + np.exp(2 * y * y_pred))
    return g


def modified_huber(p):
    return (np.clip(p, -1, 1) + 1) / 2


def Modified_Huber_NegGradient(y_pred, y):
    margin = y * y_pred
    g = np.where(margin >= 1, 0, np.where(margin >= -1, y * 2 * (1 - margin), 4 * y))
    # modified_huber_loss = np.where(margin >= -1, max(0, 1 - margin) ** 2, -4 * margin)
    return g


class GradientBoosting(object):
    def __init__(self, M, base_learner, learning_rate=1.0, method="regression", tol=None, subsample=None,
                 loss="square", alpha=0.9):
        self.M = M
        self.base_learner = base_learner
        self.learning_rate = learning_rate
        self.method = method
        self.tol = tol
        self.subsample = subsample
        self.loss = loss
        self.alpha = alpha

    def fit(self, X, y):
        # tol is the early-stopping threshold; if early stopping is used, hold out a validation set
        if self.tol is not None:
            X, X_val, y, y_val = train_test_split(X, y, random_state=2)
            former_loss = float("inf")
            count = 0
            tol = self.tol  # working threshold; self.tol keeps the user's value

        # initial value f_0: fitted once and NOT multiplied by the learning rate
        init_learner = clone(self.base_learner)
        y_pred = init_learner.fit(X, y).predict(X)
        self.base_learner_total = [init_learner]

        for m in range(self.M):
            if self.subsample is not None:  # row subsampling
                sample = np.random.choice(len(X), int(self.subsample * len(X)), replace=False)
                X_s, y_s, y_pred_s = X[sample], y[sample], y_pred[sample]
            else:
                X_s, y_s, y_pred_s = X, y, y_pred

            # negative gradient (pseudo-residuals)
            if self.method == "regression":
                if self.loss == "square":
                    response = SquaredLoss_NegGradient(y_pred_s, y_s)
                elif self.loss == "huber":
                    response = Huberloss_NegGradient(y_pred_s, y_s, self.alpha)
            elif self.method == "classification":
                if self.loss == "logistic":
                    response = LogisticLoss_NegGradient(y_pred_s, y_s)
                elif self.loss == "modified_huber":
                    response = Modified_Huber_NegGradient(y_pred_s, y_s)

            base_learner = clone(self.base_learner)
            y_pred += base_learner.fit(X_s, response).predict(X) * self.learning_rate
            self.base_learner_total.append(base_learner)

            # early stopping: every 10 rounds after round 300, compare the validation loss
            # of the current ensemble with the previous check
            if m % 10 == 0 and m > 300 and self.tol is not None:
                stage_pred = self.base_learner_total[0].predict(X_val) + self.learning_rate * np.sum(
                    [learner.predict(X_val) for learner in self.base_learner_total[1:]], axis=0)
                if self.method == "regression":
                    later_loss = np.sqrt(mean_squared_error(y_val, stage_pred))
                elif self.method == "classification":
                    stage_pred = np.where(logistic(stage_pred) >= 0.5, 1, -1)
                    later_loss = zero_one_loss(y_val, stage_pred)
                # halve the threshold after each worsening, reset it otherwise
                if later_loss > (former_loss + tol):
                    count += 1
                    tol = tol / 2
                else:
                    count = 0
                    tol = self.tol
                # two consecutive worsenings: stop and roll back 20 rounds
                if count == 2:
                    self.M = m - 20
                    print("early stopping in round {}, best round is {}, M = {}".format(m, m - 20, self.M))
                    break
                former_loss = later_loss
        return self

    def predict(self, X):
        # initial value (unscaled) + learning_rate * each base learner
        pred = np.array([self.base_learner_total[m].predict(X) * self.learning_rate for m in range(1, self.M + 1)])
        pred = np.vstack((self.base_learner_total[0].predict(X), pred))
        p = np.sum(pred, axis=0)
        if self.method == "regression":
            pred_final = p
        elif self.method == "classification":
            if self.loss == "modified_huber":
                pred_final = np.where(modified_huber(p) >= 0.5, 1, -1)
            elif self.loss == "logistic":
                pred_final = np.where(logistic(p) >= 0.5, 1, -1)
        return pred_final


class GBRegression(GradientBoosting):
    def __init__(self, M, base_learner, learning_rate, method="regression", loss="square", tol=None,
                 subsample=None, alpha=0.9):
        super(GBRegression, self).__init__(M=M, base_learner=base_learner, learning_rate=learning_rate,
                                           method=method, loss=loss, tol=tol, subsample=subsample, alpha=alpha)


class GBClassification(GradientBoosting):
    def __init__(self, M, base_learner, learning_rate, method="classification", loss="logistic", tol=None,
                 subsample=None):
        super(GBClassification, self).__init__(M=M, base_learner=base_learner, learning_rate=learning_rate,
                                               method=method, loss=loss, tol=tol, subsample=subsample)


if __name__ == "__main__":
    # regression test dataset
    X, y = datasets.make_regression(n_samples=20000, n_features=10, n_informative=4, noise=1.1, random_state=1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
    model = GBRegression(M=1000, base_learner=DecisionTreeRegressor(max_depth=2, random_state=1),
                         learning_rate=0.1, loss="huber")
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    print('RMSE: ', rmse)

    # binary classification test dataset, labels mapped to {-1, 1}
    X, y = datasets.make_classification(n_samples=20000, n_features=10, n_informative=4, flip_y=0.1,
                                        n_clusters_per_class=1, n_classes=2, random_state=1)
    y[y == 0] = -1
    X_train, X_test, y_train, y_test = train_test_split(X, y)

    model = GBClassification(M=1000, base_learner=DecisionTreeRegressor(max_depth=1, random_state=1),
                             learning_rate=1.0, method="classification", loss="logistic")
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    accuracy = np.mean(pred == y_test)
    print('accuracy logistic score: ', accuracy)

    model = GBClassification(M=1000, base_learner=DecisionTreeRegressor(max_depth=1, random_state=1),
                             learning_rate=1.0, method="classification", loss="modified_huber")
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    accuracy = np.mean(pred == y_test)
    print('accuracy modified_huber score: ', accuracy)
```

Output:

RMSE:  8.454462867923157
accuracy logistic score: 0.9434
accuracy modified_huber score: 0.9402
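The closed-form negative gradients used by the loss functions can be checked against central finite differences. This is a pure-Python sketch with scalar inputs; the function names are hypothetical, and the Huber check uses a fixed threshold `delta = 1.0` instead of the alpha-quantile computed in `Huberloss_NegGradient`. Test points are chosen so that y*f and y - f stay away from the kinks at ±1.

```python
import math


def huber_loss(y, f, delta):
    r = y - f
    return 0.5 * r * r if abs(r) <= delta else delta * (abs(r) - 0.5 * delta)


def huber_neg_grad(y, f, delta):
    r = y - f
    return r if abs(r) <= delta else delta * math.copysign(1.0, r)


def logistic_loss(y, f):
    return math.log(1 + math.exp(-2 * y * f))


def logistic_neg_grad(y, f):
    return 2 * y / (1 + math.exp(2 * y * f))


def modified_huber_loss(y, f):
    margin = y * f
    return max(0.0, 1 - margin) ** 2 if margin >= -1 else -4 * margin


def modified_huber_neg_grad(y, f):
    margin = y * f
    if margin >= 1:
        return 0.0
    if margin >= -1:
        return 2 * y * (1 - margin)
    return 4 * y


def numeric_neg_grad(loss, y, f, eps=1e-6):
    """Central finite difference of -dL/df."""
    return -(loss(y, f + eps) - loss(y, f - eps)) / (2 * eps)


def max_gradient_error():
    """Largest gap between analytic and numeric negative gradients on a few test points."""
    errors = []
    for f in (-2.5, -0.3, 0.4, 1.7):
        huber = lambda y, g: huber_loss(y, g, 1.0)
        errors.append(abs(huber_neg_grad(0.0, f, 1.0) - numeric_neg_grad(huber, 0.0, f)))
        for y in (-1.0, 1.0):
            errors.append(abs(logistic_neg_grad(y, f) - numeric_neg_grad(logistic_loss, y, f)))
            errors.append(abs(modified_huber_neg_grad(y, f) - numeric_neg_grad(modified_huber_loss, y, f)))
    return max(errors)
```

Calling `max_gradient_error()` should return a value well below 1e-5 when every analytic gradient matches its loss.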

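Regression:

```python
X, y = datasets.make_regression(n_samples=20000, n_features=20, n_informative=10, noise=100, random_state=1)  # dataset
```

The figure below compares squared loss and Huber loss on the regression problem, together with the early stopping point of each: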
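The early-stopping rule inside `fit` can be restated as a standalone function over a precomputed validation-loss curve, which makes the stopping point easier to reason about. This is a simplified sketch, assuming `val_loss[m]` is the validation loss after round m; the name `find_early_stop` and its parameters are hypothetical. As in `fit`, every 10 rounds after round 300 the loss is compared with the previous check, the threshold is halved after each worsening and reset otherwise, and two consecutive worsenings stop training, with the best round taken as 20 rounds earlier.

```python
def find_early_stop(val_loss, tol, check_every=10, start=300, patience=2):
    """Return (stop_round, best_round), or None if the rule never triggers."""
    former = float("inf")
    cur_tol = tol
    count = 0
    for m, later in enumerate(val_loss):
        if m % check_every != 0 or m <= start:
            continue
        if later > former + cur_tol:   # worse than the previous check
            count += 1
            cur_tol /= 2
        else:
            count = 0
            cur_tol = tol
        if count == patience:
            return m, m - patience * check_every
        former = later
    return None
```

On a curve whose minimum is at round 350, such as `[abs(m - 350) + 1.0 for m in range(500)]` with `tol=0.5`, the checks at rounds 360 and 370 both worsen, so the function returns `(370, 350)`. On a monotonically decreasing curve it returns `None`.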

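Classification:

For classification, the AdaBoost from the previous post is compared with the GBDT from this post, still on the earlier dataset; GBDT is run with both the logistic loss and the modified Huber loss mentioned at the end of this article:

Next, switch to a noisier dataset and reduce it to two dimensions with PCA for visualization:

```python
X, y = datasets.make_classification(n_samples=20000, n_features=10, n_informative=4, flip_y=0.3, n_clusters_per_class=1, n_classes=2, random_state=1)
```

This time the modified Huber loss performs better than the logistic loss, but both are outperformed by Real AdaBoost.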
