Tuning XGBoost hyperparameters with a genetic algorithm

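Choosing the fitness for the genetic algorithm:

For machine learning, the fitness can be any performance metric: accuracy, precision, recall, F1 score, and so on. Based on the fitness values, we select the best-performing parents ("survival of the fittest") as the surviving population.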
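
As a minimal sketch of this selection step (not the module's own implementation), the snippet below ranks a toy population by fitness and keeps the top entries. The function name `select_fittest`, the dict layout and the scores are all made up for illustration.

```python
def select_fittest(population, fitness, num_keep):
    """Return the num_keep individuals with the highest fitness."""
    ranked = sorted(zip(fitness, population), key=lambda pair: pair[0], reverse=True)
    return [individual for _, individual in ranked[:num_keep]]


# Toy population: each individual is a dict of hyperparameters (made-up values).
population = [
    {"learning_rate": 0.10, "max_depth": 3},
    {"learning_rate": 0.30, "max_depth": 6},
    {"learning_rate": 0.05, "max_depth": 9},
    {"learning_rate": 0.80, "max_depth": 2},
]
fitness = [0.91, 0.95, 0.89, 0.93]  # made-up F1 scores, one per individual

survivors = select_fittest(population, fitness, num_keep=2)
print(survivors)
```

Any metric works as long as a larger value means a better individual; for a loss, the sign would have to be flipped first.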

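Mating:

The parents in the surviving population mate to produce offspring through a combination of two steps: crossover (recombination) and mutation.

Crossover: the genes (parameters) of the mating parents are recombined to produce offspring, and each child inherits some genes (parameters) from both parents.

Mutation: the values of some genes (parameters) are changed to maintain genetic diversity, which usually lets the genetic algorithm reach better solutions.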
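
The two steps can be sketched on plain Python lists as follows. This is an illustrative sketch under assumptions, not the module's code: the helper names `uniform_crossover` and `mutate`, the three-gene layout, the bounds and the step sizes are hypothetical. The module further down uses a slightly different scheme: it draws one set of gene indices and reuses it for every child, and it applies the same mutation to all children.

```python
import random


def uniform_crossover(parent_a, parent_b, rng):
    """Pick every gene from one of the two parents with equal probability."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


def mutate(individual, bounds, max_step, rng):
    """Shift one randomly chosen gene by a random step and clip it to its bounds."""
    mutated = list(individual)
    gene = rng.randrange(len(mutated))
    low, high = bounds[gene]
    step = rng.uniform(-max_step[gene], max_step[gene])
    mutated[gene] = min(high, max(low, mutated[gene] + step))
    return mutated


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical genes: [learning_rate, subsample, gamma]
parent_a = [0.10, 0.80, 1.0]
parent_b = [0.30, 0.60, 4.0]
bounds = [(0.01, 1.0), (0.01, 1.0), (0.01, 10.0)]
max_step = [0.5, 0.5, 2.0]

child = uniform_crossover(parent_a, parent_b, rng)
mutated_child = mutate(child, bounds, max_step, rng)
print(child, mutated_child)
```

Clipping after the mutation keeps every gene inside a range the model accepts, for example a subsample ratio that never exceeds 1.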

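Note: we keep the surviving parents so that the best-fitness parameters are preserved in case the offspring turn out to have worse fitness than their parents.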
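
A tiny example shows why keeping the parents matters. The fitness function and all numbers are invented for illustration: when every child is worse than its parents, a population that still contains the parents keeps its best score, while a population made of children only loses it.

```python
def toy_fitness(x):
    """Made-up fitness with its maximum at x = 3."""
    return -(x - 3.0) ** 2


parents = [2.5, 3.5]     # survivors of the previous generation
children = [0.0, 10.0]   # offspring that happen to be worse

with_parents = parents + children   # parents are carried over
children_only = children            # parents are discarded

best_before = max(toy_fitness(x) for x in parents)
best_with = max(toy_fitness(x) for x in with_parents)
best_without = max(toy_fitness(x) for x in children_only)
print(best_before, best_with, best_without)
```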

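Genetic algorithm module for XGBoost hyperparameter search:

The module provides functions for the following four steps: population initialization, selection, crossover, and mutation.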
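
Before the XGBoost-specific module, here is how the four steps fit together in one loop. This is a sketch on a toy problem that uses only the standard library; `toy_fitness`, `run_ga` and every constant in it are assumptions for illustration, and in the real module the fitness would come from training and scoring a model.

```python
import random


def toy_fitness(individual):
    """Made-up fitness: higher is better, maximum 0 at x = 3, y = -1."""
    x, y = individual
    return -((x - 3.0) ** 2 + (y + 1.0) ** 2)


def run_ga(pop_size=8, num_keep=4, generations=20, seed=0):
    rng = random.Random(seed)
    low, high = -10.0, 10.0
    # 1. initialize the population with random genes
    population = [[rng.uniform(low, high), rng.uniform(low, high)]
                  for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        # 2. selection: keep the num_keep fittest individuals
        population.sort(key=toy_fitness, reverse=True)
        best_per_generation.append(toy_fitness(population[0]))
        parents = population[:num_keep]
        children = []
        for i in range(pop_size - num_keep):
            mother = parents[i % num_keep]
            father = parents[(i + 1) % num_keep]
            # 3. crossover: every gene comes from one of the two parents
            child = [m if rng.random() < 0.5 else f for m, f in zip(mother, father)]
            # 4. mutation: shift one gene and clip it to the allowed range
            gene = rng.randrange(len(child))
            child[gene] = min(high, max(low, child[gene] + rng.uniform(-1.0, 1.0)))
            children.append(child)
        # the parents survive next to their children
        population = parents + children
    best = max(population, key=toy_fitness)
    return best, best_per_generation


best, history = run_ga()
print(best, history[-1])
```

Because the parents are carried over unchanged, the best fitness recorded per generation never decreases.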

import numpy as np

import random

from sklearn.metrics import f1_score

import xgboost 

class GeneticXgboost:
    def __init__(self,num_parents=None):
        """
        param num_parents: number of individuals in the population
        """
        self.num_parents = num_parents

    def initilialize_poplulation(self):
        """
        Initialize the population, i.e. generate the genes for the requested number of individuals.
        Genes, in column order: learning_rate, n_estimators, max_depth, min_child_weight,
        gamma, subsample, colsample_bytree
        return: array, shape=[self.num_parents, num_gene]
        """
        learningRate = np.empty([self.num_parents, 1])
        # n_estimators can reach 1485, which does not fit into np.uint8
        nEstimators  = np.empty([self.num_parents, 1],dtype = np.int32)
        maxDepth = np.empty([self.num_parents, 1],dtype = np.uint8)
        minChildWeight = np.empty([self.num_parents,1])
        gammaValue = np.empty([self.num_parents,1])
        subSample = np.empty([self.num_parents,1])
        colSampleByTree = np.empty([self.num_parents,1])
        for i in range(self.num_parents):
            # generate one individual
            learningRate[i]    = round(np.random.uniform(0.01, 1), 2)
            nEstimators[i]     = random.randrange(10, 1500, 25)
            maxDepth[i]        = random.randrange(1, 10)
            minChildWeight[i]  = round(random.uniform(0.01, 10.0),2)
            gammaValue[i]      = round(random.uniform(0.01, 10.0),2)
            subSample[i]       = round(random.uniform(0.01, 1.0), 2)
            colSampleByTree[i] = round(random.uniform(0.01, 1.0), 2)
        # stack the gene columns into one array with one row per individual
        population = np.concatenate((learningRate,nEstimators,maxDepth,minChildWeight,
                                     gammaValue,subSample,colSampleByTree),axis=1)
        return population

    def fitness_function(self,y_true,y_pred):
        """
        Fitness function: weighted F1 score, rounded to 4 decimals.
        """
        fitness = round((f1_score(y_true,y_pred,average='weighted')),4)
        return fitness

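    def fitness_compute(self,population,dMatrixTrain,dMatrixtest,y_test):
        """
        Compute the fitness of every individual.
        param population:   the population, shape=[num_individuals, num_gene]
        param dMatrixTrain: training data as an xgboost.DMatrix built from (X, y)
        param dMatrixtest:  test data as an xgboost.DMatrix built from (X, y)
        param y_test:       test labels
        return: list with one fitness value per individual
        """
        f1_Score = []
        for i in range(population.shape[0]):  # loop over every individual
            param = {'objective':       'binary:logistic',
                     'learning_rate':    population[i][0],
                     'max_depth':        int(population[i][2]),
                     'min_child_weight': population[i][3],
                     'gamma':            population[i][4],
                     'subsample':        population[i][5],
                     'colsample_bytree': population[i][6],
                     'seed': 24}
            # xgboost.train takes the number of trees as its num_boost_round
            # argument, not as an 'n_estimators' entry in the parameter dict
            num_round = int(population[i][1])
            model = xgboost.train(param,dMatrixTrain,num_round)
            preds = model.predict(dMatrixtest)
            # turn predicted probabilities into 0/1 class labels
            preds = (preds > 0.5).astype(int)
            f1 = self.fitness_function(y_test,preds)
            f1_Score.append(f1)
        return f1_Score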

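    def parents_selection(self,population,fitness,num_store):
        """
        Select the individuals to keep, based on their fitness.
        param population: the population, shape=[self.num_parents, num_gene]
        param num_store:  number of individuals to keep
        param fitness:    fitness values, one per individual
        return: the best individuals of the population, shape=[num_store, num_gene]
        """
        # work on a copy so the caller's fitness values stay untouched
        fitness = np.array(fitness, dtype=float)
        # storage for the individuals that are kept
        selectedParents = np.empty((num_store,population.shape[1]))
        for parentId in range(num_store):
            # index of the highest remaining fitness
            bestFitnessId = np.argmax(fitness)
            # store the genes of that individual
            selectedParents[parentId,:] = population[bestFitnessId, :]
            # set the fitness just used to -1 so it is not picked again
            fitness[bestFitnessId] = -1
        return selectedParents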

    def crossover_uniform(self,parents,childrenSize):
        """
        Crossover.
        Uniform crossover: every parameter of a child is taken from one of its two
        parents, according to a randomly drawn split of the gene indices.
        param parents:      surviving parents, shape=[num_store, num_gene]
        param childrenSize: (number of children, num_gene)
        return: the children, shape=childrenSize
        """
        numGenes = int(childrenSize[1])
        crossoverPointIndex = np.arange(numGenes)
        # genes taken from the first parent (drawn at random, duplicates are possible)
        crossoverPointIndex1 = np.random.randint(0, numGenes, numGenes // 2)
        # all remaining genes are taken from the second parent
        crossoverPointIndex2 = np.setdiff1d(crossoverPointIndex, crossoverPointIndex1)
        children = np.empty(childrenSize)
        # cross two parents for every child
        for i in range(childrenSize[0]):
            # index of parent 1
            parent1_index = i%parents.shape[0]
            # index of parent 2
            parent2_index = (i+1)%parents.shape[0]
            # copy the randomly selected genes from parent 1
            children[i,crossoverPointIndex1] = parents[parent1_index,crossoverPointIndex1]
            # copy the remaining genes from parent 2
            children[i,crossoverPointIndex2] = parents[parent2_index,crossoverPointIndex2]
        return children

    def mutation(self, crossover, num_param):
        '''
        Mutation.
        Randomly pick one parameter and shift it by a random amount to add
        diversity to the offspring.
        param crossover: the population (children) to mutate
        param num_param: number of parameters
        return: the mutated population
        '''
        # allowed minimum and maximum of every parameter
        minMaxValue = np.zeros((num_param,2))
        minMaxValue[0,:] = [0.01, 1.0]  #min/max learning_rate
        minMaxValue[1,:] = [10, 2000]   #min/max n_estimators
        minMaxValue[2,:] = [1, 15]      #min/max max_depth
        minMaxValue[3,:] = [0, 10.0]    #min/max min_child_weight
        minMaxValue[4,:] = [0.01, 10.0] #min/max gamma
        minMaxValue[5,:] = [0.01, 1.0]  #min/max subsample
        minMaxValue[6,:] = [0.01, 1.0]  #min/max colsample_bytree
        # the mutation changes one randomly chosen gene, the same one in every child
        mutationValue = 0
        parameterSelect = np.random.randint(0, num_param)
        if parameterSelect == 0:
            #learning_rate
            mutationValue = round(np.random.uniform(-0.5, 0.5), 2)
        elif parameterSelect == 1:
            #n_estimators
            mutationValue = np.random.randint(-200, 200)
        elif parameterSelect == 2:
            #max_depth
            mutationValue = np.random.randint(-5, 5)
        elif parameterSelect == 3:
            #min_child_weight
            mutationValue = round(np.random.uniform(-5, 5), 2)
        elif parameterSelect == 4:
            #gamma
            mutationValue = round(np.random.uniform(-2, 2), 2)
        elif parameterSelect == 5:
            #subsample
            mutationValue = round(np.random.uniform(-0.5, 0.5), 2)
        elif parameterSelect == 6:
            #colsample_bytree
            mutationValue = round(np.random.uniform(-0.5, 0.5), 2)
        # apply the mutation and clip values that leave the allowed range to min or max
        crossover[:, parameterSelect] = np.clip(crossover[:, parameterSelect] + mutationValue,
                                                minMaxValue[parameterSelect, 0],
                                                minMaxValue[parameterSelect, 1])
        return crossover

######################Hyperparameter search test##############################################

from sklearn.datasets import load_breast_cancer

from sklearn.model_selection import train_test_split

from sklearn.preprocessing import StandardScaler

X,y = load_breast_cancer(return_X_y=True)

X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3,random_state=1)

ss = StandardScaler()

X_train = ss.fit_transform(X_train)

X_test  = ss.transform(X_test)

xgDMatrixTrain = xgboost.DMatrix(X_train,y_train)

xgbDMatrixTest = xgboost.DMatrix(X_test, y_test)

number_of_parents = 8     # size of the initial population
number_of_generations = 4 # number of generations, i.e. iterations
number_of_parameters = 7  # number of parameters to optimize
number_of_parents_mating = 4  # number of individuals kept in each generation

gx = GeneticXgboost(num_parents=number_of_parents)

# shape of the population
populationSize = (number_of_parents,number_of_parameters)

# initial population
population = gx.initilialize_poplulation()

# array that stores the fitness history
FitnessHistory = np.empty([number_of_generations+1, number_of_parents])

# array that stores the parameter values of every individual in every generation
populationHistory = np.empty([(number_of_generations+1)*number_of_parents,
                               number_of_parameters])

# store the initial parameter values in the history
populationHistory[0:number_of_parents,:] = population

# training
for generation in range(number_of_generations):
    print("This is generation number %s" %(generation))

    # train on the dataset and obtain the fitness of every individual
    FitnessValue = gx.fitness_compute(population=population,
                                      dMatrixTrain=xgDMatrixTrain,
                                      dMatrixtest=xgbDMatrixTest,
                                      y_test=y_test)
    FitnessHistory[generation,:] = FitnessValue
    print('Best F1 score in the iteration = {}'.format(np.max(FitnessHistory[generation,:])))

    # parents that are kept
    parents = gx.parents_selection(population=population,
                                   fitness=FitnessValue,
                                   num_store=number_of_parents_mating)

    # children that are generated
    children = gx.crossover_uniform(parents=parents,
                     childrenSize=(populationSize[0]-parents.shape[0],number_of_parameters))

    # add mutation to create genetic diversity
    children_mutated = gx.mutation(children, number_of_parameters)

    # build the new population from the parents selected by fitness and the generated children
    population[0:parents.shape[0], :] = parents
    population[parents.shape[0]:,  :] = children_mutated

    populationHistory[(generation+1)*number_of_parents:(generation+1)*number_of_parents+number_of_parents,:]=population

# best solution of the final generation
fitness = gx.fitness_compute(population=population,
                             dMatrixTrain=xgDMatrixTrain,
                             dMatrixtest=xgbDMatrixTest,
                             y_test=y_test)

# store the fitness of the final population in the last row of the history
FitnessHistory[number_of_generations,:] = fitness

bestFitnessIndex = int(np.argmax(fitness))

print("Best fitness is =", fitness[bestFitnessIndex])
print("Best parameters are:")
print('learning_rate=',        population[bestFitnessIndex][0])
print('n_estimators=',         population[bestFitnessIndex][1])
print('max_depth=',            int(population[bestFitnessIndex][2]))
print('min_child_weight=',     population[bestFitnessIndex][3])
print('gamma=',                population[bestFitnessIndex][4])
print('subsample=',            population[bestFitnessIndex][5])
print('colsample_bytree=',     population[bestFitnessIndex][6])

Reposted from: https://www.toutiao.com/i6602143792273293837/
