XGBoost Tutorial (Using XGBoost with sklearn), Part 2

1. Import the required packages
# Run the example program shipped with the xgboost package
from xgboost import XGBClassifier

# Loader for data in LibSVM format
from sklearn.datasets import load_svmlight_file
from sklearn.metrics import accuracy_score

from matplotlib import pyplot
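2. Reading the data
scikit-learn supports data in several formats, including the LibSVM format.
XGBoost can load text data in libsvm format. The libsvm file format (sparse features) looks like this:
1 101:1.2 102:0.03
0 1:2.1 10001:300 10002:400
...
Each line represents one sample. The "1" at the start of the first line is that sample's label; "101" and "102" are feature indices, and "1.2" and "0.03" are the corresponding feature values.
In binary classification, "1" marks a positive sample and "0" a negative one. A label in [0,1] is also supported; it is interpreted as the probability that the sample is positive.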
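To make the format concrete, the following minimal sketch parses the two sample lines above with load_svmlight_file. Reading from an in-memory BytesIO buffer instead of a file on disk is an assumption made only to keep the example self-contained.

```python
from io import BytesIO

from sklearn.datasets import load_svmlight_file

# The two sample lines from the format description, as raw bytes.
sample = b"1 101:1.2 102:0.03\n0 1:2.1 10001:300 10002:400\n"

# load_svmlight_file accepts a binary file-like object as well as a path.
# It returns a SciPy sparse matrix of features and a NumPy array of labels.
X, y = load_svmlight_file(BytesIO(sample))

print(X.shape)  # (number of samples, number of feature columns)
print(y)        # labels taken from the first token of each line
print(X.nnz)    # only the listed index:value pairs are stored

# With the default zero_based="auto", indices that never use 0 are treated
# as one-based, so feature "101" ends up in column 100.
print(X[0, 100])
```

Because the matrix is sparse, the large feature indices on the second line cost nothing beyond the three values actually listed.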
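The sample data below asks us to decide whether a mushroom species is poisonous from a number of its attributes.
UCI data description: http://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/
Each sample describes 22 attributes of a mushroom, such as shape and odor (after conversion to libsvm format these become 126 features), together with whether the mushroom is edible. 6513 samples are used for training and 1611 for testing.

Data download: http://download.csdn.net/download/u011630575/10266113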

# Read in the data. It ships in the demo directory of the xgboost installation;
# here it has been copied to the data directory next to the code.
my_workpath = './data/'
X_train, y_train = load_svmlight_file(my_workpath + 'agaricus.txt.train')
X_test, y_test = load_svmlight_file(my_workpath + 'agaricus.txt.test')

print(X_train.shape)
print(X_test.shape)
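3. Setting the training parameters

max_depth: maximum depth of a tree. Default: 6. Range: [1,∞] (the XGBoost parameter documentation also accepts 0, meaning no depth limit).
eta: shrinkage step size used in each update to prevent overfitting. After each boosting step the algorithm directly obtains the weights of the new features, and eta shrinks those weights to make the boosting process more conservative. Default: 0.3. Range: [0,1]. In the sklearn wrapper this parameter is called learning_rate.
silent: 0 prints runtime messages; 1 runs silently and prints nothing. Default: 0. Newer XGBoost releases replace this parameter with verbosity.
objective: defines the learning task and the corresponding objective. "binary:logistic" is logistic regression for binary classification, and its output is a probability.

All other parameters keep their default values.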
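The effect of eta and of the binary:logistic objective can be illustrated with a few lines of plain Python. This is a simplified sketch, not XGBoost's implementation: the two tree outputs are hypothetical numbers, they are held fixed across both settings (in real boosting each tree is fitted after the previous ones), and the base score is ignored.

```python
import math


def sigmoid(margin):
    """Map a raw additive score (margin) to a probability."""
    return 1.0 / (1.0 + math.exp(-margin))


# Hypothetical unshrunk outputs of two trees for a single sample.
raw_tree_outputs = [0.8, 0.5]


def positive_probability(eta, tree_outputs):
    # Each tree's contribution is scaled by eta before being added to the score.
    margin = sum(eta * out for out in tree_outputs)
    return sigmoid(margin)


p_full = positive_probability(1.0, raw_tree_outputs)    # no shrinkage
p_shrunk = positive_probability(0.3, raw_tree_outputs)  # eta = 0.3

print("eta=1.0 -> %.3f" % p_full)
print("eta=0.3 -> %.3f" % p_shrunk)
```

With the smaller eta the same trees move the prediction less far from 0.5, which is the sense in which shrinkage makes each boosting step more conservative.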
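4. Training the model

# Number of boosting iterations
num_round = 2

# sklearn API; newer XGBoost releases replace silent with verbosity
bst = XGBClassifier(max_depth=2, learning_rate=1, n_estimators=num_round,
                    silent=True, objective='binary:logistic')

bst.fit(X_train, y_train)
With the binary:logistic objective, the model's raw output is the probability that a sample belongs to the positive class (label 1). Mushroom classification is a binary problem, so that probability has to be converted to 0 or 1.
The sklearn wrapper's predict already applies this conversion and returns class labels, so the round call below leaves its values unchanged; probabilities are available from predict_proba.

train_preds = bst.predict(X_train)
train_predictions = [round(value) for value in train_preds]

train_accuracy = accuracy_score(y_train, train_predictions)
print("Train Accuracy: %.2f%%" % (train_accuracy * 100.0))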
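5. Testing

Once the model is trained, it can be used to make predictions on the test data.
The sklearn wrapper's predict returns class labels (0 or 1) rather than probabilities, so the round call below leaves the values unchanged; probabilities are available from predict_proba.

# Make predictions
preds = bst.predict(X_test)
predictions = [round(value) for value in preds]

test_accuracy = accuracy_score(y_test, predictions)
print("Test Accuracy: %.2f%%" % (test_accuracy * 100.0))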
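When the predictions really are probabilities (for example from predict_proba), the conversion to 0 or 1 and the accuracy computation can be sketched as follows. The probabilities and true labels here are made-up values for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical positive-class probabilities and true labels for five samples.
probs = np.array([0.9, 0.2, 0.6, 0.4, 0.5])
y_true = np.array([1, 0, 0, 0, 1])

# Explicit threshold: strictly greater than 0.5 counts as positive.
labels = (probs > 0.5).astype(int)

# Python's built-in round() rounds halves to the nearest even integer,
# so a probability of exactly 0.5 becomes 0.
rounded = [round(float(p)) for p in probs]

acc = accuracy_score(y_true, labels)
print(labels.tolist())
print(rounded)
print("Accuracy: %.2f%%" % (acc * 100.0))
```

An explicit comparison states the threshold and the tie behaviour directly, which is easier to read and to change than relying on round.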
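6. Complete code

# coding:utf-8
# Run the example program shipped with the xgboost package
from xgboost import XGBClassifier

# Loader for data in LibSVM format
from sklearn.datasets import load_svmlight_file
from sklearn.metrics import accuracy_score

from matplotlib import pyplot

# Read in the data (copied from the xgboost demo directory to ./data/)
my_workpath = './data/'
X_train, y_train = load_svmlight_file(my_workpath + 'agaricus.txt.train')
X_test, y_test = load_svmlight_file(my_workpath + 'agaricus.txt.test')

print(X_train.shape)
print(X_test.shape)

# Number of boosting iterations
num_round = 2

#bst = XGBClassifier(**params)
#bst = XGBClassifier()
# Newer XGBoost releases replace silent with verbosity
bst = XGBClassifier(max_depth=2, learning_rate=1, n_estimators=num_round,
                    silent=True, objective='binary:logistic')

bst.fit(X_train, y_train)

# predict returns class labels, so round leaves the values unchanged
train_preds = bst.predict(X_train)
train_predictions = [round(value) for value in train_preds]

train_accuracy = accuracy_score(y_train, train_predictions)
print("Train Accuracy: %.2f%%" % (train_accuracy * 100.0))

# Make predictions on the test set
preds = bst.predict(X_test)
predictions = [round(value) for value in preds]

test_accuracy = accuracy_score(y_test, predictions)
print("Test Accuracy: %.2f%%" % (test_accuracy * 100.0))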
---------------------
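Author: 鹤鹤有明
Source: CSDN
Original: https://blog.csdn.net/u011630575/article/details/79421053
Copyright notice: this is an original article by the blogger; please include a link to the post when reposting.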
