hyperopt: a hyperparameter tuning tool for Python

1. Installation

pip install hyperopt

2. Overview

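Hyperopt provides an optimization interface that takes an evaluation function and a parameter space, and computes the value of the loss function at points in that space. The user also specifies how each parameter is distributed within the space.
Hyperopt has four key ingredients: the function to minimize, the search space, a trials database that records the sampled points (optional), and the search algorithm (optional).
First, define an objective function that takes a single argument (the point sampled from the search space) and returns a loss value, for example to minimize q(x, y) = x**2 + y**2.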

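Next, choose the search algorithm, which is the value passed as the algo parameter of hyperopt's fmin function. The currently supported algorithms are random search (hyperopt.rand.suggest), simulated annealing (hyperopt.anneal.suggest), and TPE (hyperopt.tpe.suggest).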

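To set up the parameter space, for example when optimizing the function q, call fmin(q, space=hp.uniform('a', 0, 1)). The first argument of hp.uniform is a label; every hyperparameter in the space must have a unique label. hp.uniform specifies the parameter's distribution. Other available distributions include: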

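hp.choice(label, options) returns one of the options, which should be a list or tuple. The options can themselves be nested expressions, which is how conditional parameters are built.
hp.pchoice(label, p_options) returns one of the p_options with the given probability, where p_options is a list of (probability, option) pairs. Use it when the options should not be equally likely during the search.
hp.uniform(label, low, high): the parameter is uniformly distributed between low and high.
hp.quniform(label, low, high, q): the value is round(uniform(low, high) / q) * q, which suits parameters that take discrete values.
hp.loguniform(label, low, high): draws exp(uniform(low, high)), so the value lies in [exp(low), exp(high)] and its logarithm is uniformly distributed.
hp.randint(label, upper): returns a random integer in the half-open interval [0, upper).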
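The formulas above can be checked with a small standard-library sketch. This re-implements the stated formulas with Python's random module; it is not hyperopt's own code, and the function names simply mirror the hp.* names for readability.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def quniform(low, high, q):
    # round(uniform(low, high) / q) * q
    return round(rng.uniform(low, high) / q) * q

def loguniform(low, high):
    # exp(uniform(low, high)); the log of the result is uniform
    return math.exp(rng.uniform(low, high))

def randint(upper):
    # integer in the half-open interval [0, upper)
    return rng.randrange(upper)

samples_q = [quniform(0, 10, 2) for _ in range(1000)]
samples_log = [loguniform(0, 1) for _ in range(1000)]
samples_int = [randint(5) for _ in range(1000)]

print(sorted(set(samples_q)))              # only multiples of q appear
print(min(samples_log), max(samples_log))  # stays within [exp(0), exp(1)]
print(sorted(set(samples_int)))            # never reaches upper
```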

A search space can be built from lists, tuples, and dictionaries.

from hyperopt import hp

list_space = [
    hp.uniform('a', 0, 1),
    hp.loguniform('b', 0, 1)]

tuple_space = (
    hp.uniform('a', 0, 1),
    hp.loguniform('b', 0, 1))

dict_space = {
    'a': hp.uniform('a', 0, 1),
    'b': hp.loguniform('b', 0, 1)}

3. A simple example

from hyperopt import hp, fmin, rand, tpe, space_eval

def q(args):
    x, y = args
    return x**2 - 2*x + 1 + y**2

space = [hp.randint('x', 5), hp.randint('y', 5)]
best = fmin(q, space, algo=rand.suggest, max_evals=10)
print(best)

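Output from one run (random search with only 10 evaluations, so the result varies and can miss the true minimum at x=1, y=0):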

{'x': 2, 'y': 0}

4. An xgboost example

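xgboost has many parameters. Wrap the xgboost code in a function and pass it to fmin to optimize them, using the cross-validated AUC as the objective. A larger AUC is better, but fmin minimizes, so we minimize -AUC. The dataset has 202 columns: the first is the sample id, the last is the label, and the 200 columns in between are the features.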

# coding: utf-8
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
import xgboost as xgb
from random import shuffle
from functools import partial
from xgboost.sklearn import XGBClassifier
from sklearn.model_selection import cross_val_score  # replaces the removed sklearn.cross_validation
import pickle
import time
from hyperopt import fmin, tpe, hp, space_eval, rand, Trials, STATUS_OK

def loadFile(fileName="E://zalei//browsetop200Pca.csv"):
    # Read the CSV (no header row) and return it as a NumPy array
    data = pd.read_csv(fileName, header=None)
    data = data.values
    return data

data = loadFile()
label = data[:, -1]    # last column: label
attrs = data[:, :-1]   # every column except the label (this slice still includes column 0, the sample id)
labels = label.reshape((1, -1))
label = labels.tolist()[0]

# Scale every column to [0, 1]
minmaxscaler = MinMaxScaler()
attrs = minmaxscaler.fit_transform(attrs)

# Random 70/30 train/test split
index = list(range(0, len(label)))  # shuffle needs a mutable list
shuffle(index)
trainIndex = index[:int(len(label) * 0.7)]
print(len(trainIndex))
testIndex = index[int(len(label) * 0.7):]
print(len(testIndex))
attr_train = attrs[trainIndex, :]
print(attr_train.shape)
attr_test = attrs[testIndex, :]
print(attr_test.shape)
label_train = labels[:, trainIndex].tolist()[0]
print(len(label_train))
label_test = labels[:, testIndex].tolist()[0]
print(len(label_test))
print(np.array(label_train).reshape((-1, 1)).shape)

def GBM(argsDict):
    # Map the integer indices drawn by hp.randint to actual parameter values
    max_depth = int(argsDict["max_depth"]) + 5
    n_estimators = int(argsDict["n_estimators"]) * 5 + 50
    learning_rate = argsDict["learning_rate"] * 0.02 + 0.05
    subsample = argsDict["subsample"] * 0.1 + 0.7
    min_child_weight = int(argsDict["min_child_weight"]) + 1
    print("max_depth:" + str(max_depth))
    print("n_estimator:" + str(n_estimators))
    print("learning_rate:" + str(learning_rate))
    print("subsample:" + str(subsample))
    print("min_child_weight:" + str(min_child_weight))
    gbm = xgb.XGBClassifier(nthread=4,                          # number of threads
                            max_depth=max_depth,                # maximum tree depth
                            n_estimators=n_estimators,          # number of trees
                            learning_rate=learning_rate,        # learning rate
                            subsample=subsample,                # fraction of rows sampled per tree
                            min_child_weight=min_child_weight,  # minimum sum of instance weights in a child
                            max_delta_step=10,                  # cap on each leaf's weight update (not early stopping)
                            objective="binary:logistic")
    metric = cross_val_score(gbm, attr_train, label_train, cv=5, scoring="roc_auc").mean()
    print(metric)
    return -metric  # fmin minimizes, so return the negated AUC

space = {"max_depth": hp.randint("max_depth", 15),               # 0..14 -> 5..19
         "n_estimators": hp.randint("n_estimators", 10),         # 0..9 -> 50, 55, ..., 95
         "learning_rate": hp.randint("learning_rate", 6),        # 0..5 -> 0.05, 0.07, ..., 0.15
         "subsample": hp.randint("subsample", 4),                # 0..3 -> 0.7, 0.8, 0.9, 1.0
         "min_child_weight": hp.randint("min_child_weight", 5),  # 0..4 -> 1..5
        }
algo = partial(tpe.suggest, n_startup_jobs=1)
best = fmin(GBM, space, algo=algo, max_evals=4)  # max_evals: maximum number of models to train; a larger value makes a good solution more likely
print(best)
print(GBM(best))
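Note that best holds the raw integers drawn by hp.randint, not the xgboost parameter values. A small helper can apply the same arithmetic as the GBM function to make the result readable. The name decode and the sample dictionary are hypothetical and used only for illustration.

```python
def decode(best):
    # Same index-to-value arithmetic as inside the GBM function.
    return {
        "max_depth": best["max_depth"] + 5,
        "n_estimators": best["n_estimators"] * 5 + 50,
        "learning_rate": best["learning_rate"] * 0.02 + 0.05,
        "subsample": best["subsample"] * 0.1 + 0.7,
        "min_child_weight": best["min_child_weight"] + 1,
    }

# Hypothetical fmin result, used only to show the mapping.
example_best = {"max_depth": 3, "n_estimators": 9, "learning_rate": 0,
                "subsample": 2, "min_child_weight": 4}
params = decode(example_best)
print(params)
```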

For details, see: http://blog.csdn.net/qq_34139222/article/details/60322995
