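Hyperparameter Tuning with GridSearchCV
When tuning with GridSearchCV, I found that the values accepted by the scoring argument are not spelled out clearly in the sklearn documentation, so I looked them up myself. They are listed below: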
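from sklearn.cluster import DBSCAN
from sklearn.model_selection import GridSearchCV
parameters = {'eps':[0.3,0.4,0.5,0.6], 'min_samples':[20,30,40]}
# Illustrates the scoring argument only: this scorer needs true labels in grid.fit and calls predict(), which DBSCAN lacks
db = DBSCAN(metric='cosine', algorithm='brute')
grid = GridSearchCV(db, parameters, cv=5, scoring='adjusted_rand_score')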
| Scoring | Function | Comment |
|---|---|---|
| Classification | | |
| 'accuracy' | metrics.accuracy_score | |
| 'average_precision' | metrics.average_precision_score | |
| 'f1' | metrics.f1_score | for binary targets |
| 'f1_micro' | metrics.f1_score | micro-averaged |
| 'f1_macro' | metrics.f1_score | macro-averaged |
| 'f1_weighted' | metrics.f1_score | weighted average |
| 'f1_samples' | metrics.f1_score | by multilabel sample |
| 'neg_log_loss' | metrics.log_loss | requires predict_proba support |
| 'precision' etc. | metrics.precision_score | suffixes apply as with 'f1' |
| 'recall' etc. | metrics.recall_score | suffixes apply as with 'f1' |
| 'roc_auc' | metrics.roc_auc_score | |
| Clustering | | |
| 'adjusted_rand_score' | metrics.adjusted_rand_score | |
| Regression | | |
| 'neg_mean_absolute_error' | metrics.mean_absolute_error | |
| 'neg_mean_squared_error' | metrics.mean_squared_error | |
| 'neg_median_absolute_error' | metrics.median_absolute_error | |
| 'r2' | metrics.r2_score | |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
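Later, in another course, the instructor said that grid search is not recommended for models with many features: it is slow, and the parameters it picks are only the ones that perform well on the training data, so they will not necessarily perform well on an out-of-time validation set.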
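
To make the cost concrete, here is a rough sketch that counts how many model fits a grid search needs. It reuses the `parameters` grid and `cv=5` from the DBSCAN snippet above; the names `combos` and `n_fits` are introduced here for illustration.

```python
from itertools import product

param_grid = {'eps': [0.3, 0.4, 0.5, 0.6], 'min_samples': [20, 30, 40]}
cv = 5

# Every combination of values is trained once per cross-validation fold
combos = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]
n_fits = len(combos) * cv
print(len(combos), n_fits)  # 12 60
```

Each additional parameter multiplies the count by the number of values tried for it, which is why the search becomes expensive quickly.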
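The suggestion was to design the tuning objective by hand instead: maximize the KS on the out-of-time validation set while keeping the KS gap between the out-of-time validation set and the training set as small as possible.
Tuning method
- Maximize offks + 0.8(offks - devks), where offks is the KS on the out-of-time validation set and devks is the KS on the training (development) set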
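
As a small, dependency-free sketch of this objective: the helper names `ks_statistic` and `tuning_objective` and the toy numbers are assumptions made for illustration. KS is computed here as the largest gap between the true-positive rate and the false-positive rate over all score thresholds, which is the same quantity the full script below derives from `roc_curve`.

```python
def ks_statistic(y_true, y_score):
    """Largest |TPR - FPR| over all score thresholds."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best = 0.0
    for t in sorted(set(y_score)):
        # Count positives and negatives scored at or above the threshold
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 0)
        best = max(best, abs(tp / n_pos - fp / n_neg))
    return best


def tuning_objective(off_ks, dev_ks, penalty=0.8):
    """offks + 0.8 * (offks - devks); a larger value is better."""
    return off_ks + penalty * (off_ks - dev_ks)


# Perfectly separated toy scores give the maximum KS
toy_ks = ks_statistic([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])

# A model that overfits (dev KS above OOT KS) is penalized
overfit = tuning_objective(off_ks=0.40, dev_ks=0.45)
stable = tuning_objective(off_ks=0.40, dev_ks=0.40)
```

With these toy numbers, the stable model scores higher than the overfit one even though both have the same out-of-time KS.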
import pandas as pd
import lightgbm as lgb
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Hold out the most recent month as the out-of-time validation set
data = pd.read_csv('Acard.txt')
train = data[data.obs_mth != '2018-11-30'].reset_index().copy()
val = data[data.obs_mth == '2018-11-30'].reset_index().copy()

feature_lst = ['person_info','finance_info','credit_info','act_info']
x = train[feature_lst]
y = train['bad_ind']
val_x = val[feature_lst]
val_y = val['bad_ind']

train_x,test_x,train_y,test_y = train_test_split(x,y,random_state=0,test_size=0.2)

def lgb_test(train_x,train_y,test_x,test_y,value):
    # Train one model with the candidate value plugged into n_estimators
    clf = lgb.LGBMClassifier(boosting_type = 'gbdt',
                             objective = 'binary',
                             metric = 'auc',
                             learning_rate = 0.1,
                             n_estimators = value,
                             max_depth = 5,
                             num_leaves = 20,
                             max_bin = 45,
                             min_data_in_leaf = 6,
                             bagging_fraction = 0.6,
                             bagging_freq = 0,
                             feature_fraction = 0.8,
                             silent=True  # newer LightGBM releases dropped this argument; verbose=-1 is the usual replacement
                             )
    clf.fit(train_x,train_y,eval_set = [(train_x,train_y),(test_x,test_y)],eval_metric = 'auc')
    return clf,clf.best_score_['valid_1']['auc']

# The parameter being tuned is passed in as value; set the search range here
min_value = 40
max_value = 60

# Track the best result across the whole search, so initialize once before the loop
best_omd = -1
best_value = -1
best_ks = []

for value in range(min_value,max_value+1):
    lgb_model , lgb_auc = lgb_test(train_x,train_y,test_x,test_y,value)

    # KS on the training data
    y_pred = lgb_model.predict_proba(x)[:,1]
    fpr_lgb_train,tpr_lgb_train,_ = roc_curve(y,y_pred)
    train_ks = abs(fpr_lgb_train - tpr_lgb_train).max()

    # KS on the out-of-time validation set
    y_pred = lgb_model.predict_proba(val_x)[:,1]
    fpr_lgb,tpr_lgb,_ = roc_curve(val_y,y_pred)
    val_ks = abs(fpr_lgb - tpr_lgb).max()

    # Objective: offks + 0.8 * (offks - devks)
    Omd = val_ks + 0.8*(val_ks - train_ks)
    if Omd>best_omd:
        best_omd = Omd
        best_value = value
        best_ks = [train_ks,val_ks]

print('best_value:',best_value)
print('best_ks:',best_ks)