assuming that you're using xgboost to fit boosted trees for binary classification. The importance matrix is actually a data.table object with the first column listing the names of all the features actually used in the boosted trees.

The meaning of the importance data table is as follows:

  1. The Gain implies the relative contribution of the corresponding feature to the model calculated by taking each feature's contribution for each tree in the model. A higher value of this metric when compared to another feature implies it is more important for generating a prediction.
  2. The Cover metric means the relative number of observations related to this feature. For example, if you have 100 observations, 4 features and 3 trees, and suppose feature1 is used to decide the leaf node for 10, 5, and 2 observations in tree1, tree2 and tree3 respectively; then the metric will count cover for this feature as 10+5+2 = 17 observations. This will be calculated for all the 4 features and the cover will be 17 expressed as a percentage for all features' cover metrics.
  3. The Frequence (frequency) is the percentage representing the relative number of times a particular feature occurs in the trees of the model. In the above example, if feature1 occurred in 2 splits, 1 split and 3 splits in each of tree1, tree2 and tree3; then the weightage for feature1 will be 2+1+3 = 6. The frequency for feature1 is calculated as its percentage weight over weights of all features.

The Gain is the most relevant attribute to interpret the relative importance of each feature.

Gain is the improvement in accuracy brought by a feature to the branches it is on. The idea is that before adding a new split on a feature X to the branch there was some wrongly classified elements, after adding the split on this feature, there are two new branches, and each of these branch is more accurate (one branch saying if your observation is on this branch then it should be classified as 1, and the other branch saying the exact opposite).

Cover measures the relative quantity of observations concerned by a feature.

Frequency is a simpler way to measure the Gain. It just counts the number of times a feature is used in all generated trees. You should not use it (unless you know why you want to use it).

xgboost 里边的gain freq, cover的更多相关文章

  1. 【原创】xgboost 特征评分的计算原理

    xgboost是基于GBDT原理进行改进的算法,效率高,并且可以进行并行化运算: 而且可以在训练的过程中给出各个特征的评分,从而表明每个特征对模型训练的重要性, 调用的源码就不准备详述,本文主要侧重的 ...

  2. 小巧玲珑:机器学习届快刀XGBoost的介绍和使用

    欢迎大家前往腾讯云技术社区,获取更多腾讯海量技术实践干货哦~ 作者:张萌 序言 XGBoost效率很高,在Kaggle等诸多比赛中使用广泛,并且取得了不少好成绩.为了让公司的算法工程师,可以更加方便的 ...

  3. R语言︱XGBoost极端梯度上升以及forecastxgb(预测)+xgboost(回归)双案例解读

    XGBoost不仅仅可以用来做分类还可以做时间序列方面的预测,而且已经有人做的很好,可以见最后的案例. 应用一:XGBoost用来做预测 ------------------------------- ...

  4. XGBoost类库使用小结

    在XGBoost算法原理小结中,我们讨论了XGBoost的算法原理,这一片我们讨论如何使用XGBoost的Python类库,以及一些重要参数的意义和调参思路. 本文主要参考了XGBoost的Pytho ...

  5. 大白话5分钟带你走进人工智能-第32节集成学习之最通俗理解XGBoost原理和过程

    目录 1.回顾: 1.1 有监督学习中的相关概念 1.2  回归树概念 1.3 树的优点 2.怎么训练模型: 2.1 案例引入 2.2 XGBoost目标函数求解 3.XGBoost中正则项的显式表达 ...

  6. XGBboost 特征评分的计算原理

    xgboost是基于GBDT原理进行改进的算法,效率高,并且可以进行并行化运算,而且可以在训练的过程中给出各个特征的评分,从而表明每个特征对模型训练的重要性, 调用的源码就不准备详述,本文主要侧重的是 ...

  7. XGB算法梳理

    学习内容: 1.CART树 2.算法原理 3.损失函数 4.分裂结点算法 5.正则化 6.对缺失值处理 7.优缺点 8.应用场景 9.sklearn参数 1.CART树 CART算法是一种二分递归分割 ...

  8. XGBoost、LightGBM的详细对比介绍

    sklearn集成方法 集成方法的目的是结合一些基于某些算法训练得到的基学习器来改进其泛化能力和鲁棒性(相对单个的基学习器而言)主流的两种做法分别是: bagging 基本思想 独立的训练一些基学习器 ...

  9. xgboost的sklearn接口和原生接口参数详细说明及调参指点

    from xgboost import XGBClassifier XGBClassifier(max_depth=3,learning_rate=0.1,n_estimators=100,silen ...

随机推荐

  1. AT 指令和常见错误码

    一. 一般命令 1. AT+CGMI 给出模块厂商的标识. 2. AT+CGMM 获得模块标识.这个命令用来得到支持的频带(GSM 900,DCS 1800 或PCS 1900).当模块有多频带时,回 ...

  2. HDOJ5875(线段树)

    Function Time Limit: 7000/3500 MS (Java/Others)    Memory Limit: 262144/262144 K (Java/Others)Total ...

  3. postman断言的方法

    1.在test添加断言 2.检查response的body中是否包含字符串: tests["Body matches string"] = responseBody.has(&qu ...

  4. Java上传下载excel、解析Excel、生成Excel

    在软件开发过程中难免需要批量上传与下载,生成报表保存也是常有之事,最近集团门户开发用到了Excel模版下载,Excel生成,圆满完成,对这一知识点进行整理,资源共享,有不足之处还望批评指正,文章结尾提 ...

  5. 【UVA】536 Tree Recovery(树型结构基础)

    题目 题目     分析 莫名A了     代码 #include <bits/stdc++.h> using namespace std; string s1,s2; void buil ...

  6. 安卓权限处理 PermissionDog

    PermissionDog 简介 权限狗 权限申请 最近在一家公司实习,项目中需要用到适配安卓6.0以上的系统,我本来是想用其他人已经写好的权限申请框架来实现的,但是发现跟我的需求有点小区别,所以就自 ...

  7. OD 实验(二) - 绕过序列号验证

    需要破解的程序 输入用户名和序列号,点击 Check,程序会进行校验 用 OD 打开程序 按快捷键 Ctrl+F9 跟随表达式 GetDlgItemTextA 点击 ok 在这里调用了 GetDlgI ...

  8. Memcache线上常见问题(缓存雪崩、缓存无底洞、永久数据被踢)

    缓存雪崩现象 一般是由于某个节点失效,导致其它节点的缓存命中率下降,缓存中缺失的数据直接去数据库查询,短时间内造成数据库服务器崩溃. 或者是由于缓存周期性失效,比如设置每隔6个小时失效一次,那么每6个 ...

  9. AJAX跨域调用ASP.NET MVC的问题及解决方案

    AJAX跨域调用ASP.NET MVC的问题及解决方案 问题描述: 解决方法: 只需要在web.config中添加如下标为红色的内容即可: <system.webServer> <h ...

  10. DirectShow的RTP发包(H264)Filter <转>

    转帖地址:http://blog.csdn.net/fan2273/article/details/77653700 DirectShow的RTP发包(H264)Filter 基于DirectShow ...