Visualizing training/test convergence and feature importance of a scikit-learn model

Here is the code:

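import logging

import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, mean_squared_log_error


# Plot training deviance
def plot_training_deviance(clf, n_estimators, X_test, y_test):
    # Compute the test set deviance after each boosting stage.
    # Note: clf.loss_ exists in the scikit-learn 0.19 line this post targets;
    # newer releases deprecated and then removed it.
    test_score = np.zeros((n_estimators,), dtype=np.float64)
    for i, y_pred in enumerate(clf.staged_predict(X_test)):
        test_score[i] = clf.loss_(y_test, y_pred)

    plt.figure(figsize=(12, 6))
    plt.subplot(1, 2, 1)
    plt.title('Deviance')

    # train_score_[i] is the training loss at boosting stage i
    train_score = clf.train_score_
    logging.info("len(train_score): %s", len(train_score))
    logging.info(train_score)
    logging.info("len(test_score): %s", len(test_score))
    logging.info(test_score)

    # x axis: boosting iteration, counted from 1
    plt.plot(np.arange(n_estimators) + 1, train_score, 'b-',
             label='Training Set Deviance')
    plt.plot(np.arange(n_estimators) + 1, test_score, 'r*',
             label='Test Set Deviance')
    plt.legend(loc='upper right')
    plt.xlabel('Boosting Iterations')
    plt.ylabel('Deviance')
    plt.show()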

# Plot feature importance
def plot_feature_importance(clf, feature_names):
    feature_importance = clf.feature_importances_
    # Make importances relative to the max importance (largest bar = 100)
    feature_importance = 100.0 * (feature_importance / feature_importance.max())
    # Ascending order, so the most important feature ends up at the top of the chart
    sorted_idx = np.argsort(feature_importance)
    pos = np.arange(sorted_idx.shape[0]) + .5

    plt.subplot(1, 2, 2)
    plt.barh(pos, feature_importance[sorted_idx], align='center')
    # feature_names[sorted_idx] only works when feature_names is a NumPy array;
    # the list comprehension below also handles a plain Python list.
    # plt.yticks(pos, feature_names[sorted_idx])
    plt.yticks(pos, [feature_names[idx] for idx in sorted_idx])
    plt.xlabel('Relative Importance')
    plt.title('Variable Importance')
    plt.show()

class Train(object):
    def __init__(self, data_file):
        self.data_file = data_file
        # Feature names, in the same order as the columns of x_features
        self.x_fields = ["xxx", "xxx", "xxx"]
        self.x_features, self.y_labels = self.load_data()

    def load_data(self):
        x_features, y_labels = [], []
        # ......
        return x_features, y_labels

    def train_model(self):
        model = GradientBoostingRegressor(random_state=42)
        model.fit(self.x_features, self.y_labels)

        # Errors below are measured on the training data itself
        y_pred = model.predict(self.x_features)
        logging.info("mean_squared_error: %.6f", mean_squared_error(self.y_labels, y_pred))
        logging.info("mean_squared_log_error: %.6f", mean_squared_log_error(self.y_labels, y_pred))

        # No held-out split here: the training data is passed as the "test" set,
        # so both deviance curves describe the training data.
        plot_training_deviance(clf=model, n_estimators=model.get_params()["n_estimators"],
                               X_test=self.x_features, y_test=self.y_labels)

        # Log the feature importances
        logging.info("feature_importances_: %s", model.feature_importances_)
        plot_feature_importance(clf=model, feature_names=self.x_fields)

This is based on the "Gradient Boosting regression" example in the scikit-learn 0.19.2 documentation.

The resulting plots are shown below:

[Figure: the 'Deviance' and 'Variable Importance' plots]
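The train_model method above passes the training data as X_test, so its "test" curve is not a held-out measurement. The following is a minimal sketch of the same per-stage computation on a real train/test split. It rests on assumptions that are not in the original post: synthetic data from make_regression stands in for the elided load_data(), the sample sizes, n_estimators=50 and the 25% split are arbitrary, and mean_squared_error is swapped in for clf.loss_ as the per-stage test metric (reasonable only for the default squared-error loss). Plotting is left out so the numbers can be inspected directly.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Assumption: synthetic regression data replaces the post's own data file.
X, y = make_regression(n_samples=300, n_features=5, n_informative=3,
                       noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Assumption: 50 stages, chosen only to keep the sketch fast.
model = GradientBoostingRegressor(n_estimators=50, random_state=42)
model.fit(X_train, y_train)

# One held-out MSE per boosting stage; staged_predict yields the
# prediction after stage 1, stage 2, ..., stage n_estimators.
test_mse = np.array([mean_squared_error(y_test, y_pred)
                     for y_pred in model.staged_predict(X_test)])

# Same normalisation as plot_feature_importance: largest importance = 100.
relative_importance = (100.0 * model.feature_importances_
                       / model.feature_importances_.max())
order = np.argsort(relative_importance)[::-1]  # most important first

print("test MSE, first and last stage:", test_mse[0], test_mse[-1])
for idx in order:
    print("feature %d: %.1f" % (idx, relative_importance[idx]))
```

The arrays test_mse and model.train_score_ have one entry per boosting stage, so they can be drawn against np.arange(50) + 1 with plt.plot in the same way as in plot_training_deviance.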
