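Splitting training and validation sets with KFold and choosing a suitable threshold by accuracy and recall: 1. KFold (cross-validation) 2. np.logical_and (True only where both bool arrays are True) 3. np.logical_not (inverts a bool array)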


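1. k_fold = KFold(n_splits, shuffle) builds a KFold index splitter.

k_fold.split(indices) splits the indices and yields one (train, test) pair of index arrays per fold.

Parameters: n_splits is the number of folds. With 10 folds, each iteration uses 9 folds as the training set and 1 fold as the test set. shuffle controls whether the samples are shuffled before splitting. indices is the array of index values to split.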

import numpy as np
from sklearn.model_selection import KFold

indices = np.arange(20)
k_fold = KFold(n_splits=10, shuffle=False)
train_test_set = k_fold.split(indices)

for (train_set, test_set) in train_test_set:
    print(train_set)
    print(test_set)
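The example above uses shuffle=False, so the folds are contiguous blocks of indices. As a minimal sketch of the shuffle option described earlier, the variant below shuffles before splitting and checks the fold sizes. The random_state value is an arbitrary choice made here so that the split is repeatable.

```python
import numpy as np
from sklearn.model_selection import KFold

indices = np.arange(20)

# shuffle=True permutes the samples before splitting;
# random_state (arbitrary value) makes the permutation repeatable.
k_fold = KFold(n_splits=10, shuffle=True, random_state=0)
folds = list(k_fold.split(indices))

for train_set, test_set in folds:
    # 20 samples in 10 folds: 18 training indices, 2 test indices per fold
    print(len(train_set), len(test_set))

# Across all folds, every index is held out exactly once.
all_test = np.sort(np.concatenate([test_set for _, test_set in folds]))
print(np.array_equal(all_test, indices))
```

Shuffling matters when the samples are stored in a meaningful order (for example, all positive pairs first), because contiguous folds would then contain only one class.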

2. np.logical_and(pred_issame, test_issame) # element-wise AND: the result is True where both pred_issame and test_issame are True, otherwise False

Parameters: pred_issame is the first input bool array, test_issame is the second input bool array.

import numpy as np

pred_issame = np.array([True, True, False, False])
actual_issame = np.array([False, True, False, False])

print(np.logical_and(pred_issame, actual_issame))
# [False  True False False]

3. np.logical_not(pred_issame) # element-wise NOT: True becomes False and False becomes True

Parameters: pred_issame is the input bool array.

import numpy as np

pred_issame = np.array([True, True, False, False])

print(np.logical_not(pred_issame))
# [False False  True  True]
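To show how the two functions work together, here is a small sketch that reuses the arrays from the examples above. It checks that, for bool arrays, np.logical_and and np.logical_not agree with the & and ~ operators, and then counts each kind of prediction outcome with np.sum.

```python
import numpy as np

pred_issame = np.array([True, True, False, False])
actual_issame = np.array([False, True, False, False])

# For bool arrays the functions match the & and ~ operators.
same_and = np.array_equal(np.logical_and(pred_issame, actual_issame),
                          pred_issame & actual_issame)
same_not = np.array_equal(np.logical_not(pred_issame), ~pred_issame)
print(same_and, same_not)  # True True

# np.sum over a bool mask counts the True entries.
tp = np.sum(np.logical_and(pred_issame, actual_issame))                  # predicted True, actually True
fp = np.sum(np.logical_and(pred_issame, np.logical_not(actual_issame)))  # predicted True, actually False
tn = np.sum(np.logical_and(np.logical_not(pred_issame),
                           np.logical_not(actual_issame)))               # predicted False, actually False
fn = np.sum(np.logical_and(np.logical_not(pred_issame), actual_issame))  # predicted False, actually True
print(tp, fp, tn, fn)  # 1 1 2 0
```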

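The threshold search then proceeds as follows.

Step 1: Build the array of sample indices and use KFold to split the indices into train_set and test_set for each fold.

Step 2: Build the list of candidate thresholds, for example with np.arange(0, 4, 0.4), and loop over it.

Step 3: For each threshold:

Step 3.1: Use pred_issame = np.less(dist, threshold) to obtain the predictions (a pair is predicted "same" when its distance is below the threshold).

Step 3.2: Count the four outcomes with np.sum and compute the metrics:

tp = np.sum(np.logical_and(pred_issame, actual_issame)) # positive samples predicted as positive
fp = np.sum(np.logical_and(pred_issame, np.logical_not(actual_issame))) # negative samples predicted as positive
tn = np.sum(np.logical_and(np.logical_not(pred_issame), np.logical_not(actual_issame))) # negative samples predicted as negative
fn = np.sum(np.logical_and(np.logical_not(pred_issame), actual_issame)) # positive samples predicted as negative
tpr = 0 if tp + fn == 0 else float(tp) / float(tp + fn) # recall (true positive rate)
fpr = 0 if fp + tn == 0 else float(fp) / float(fp + tn) # false positive rate
accur = (tp + tn) / (tp + fp + fn + tn) # accuracy

Step 4: Collect the accuracy for every threshold into an array accur and use threshold_max = np.argmax(accur) to get the index of the highest accuracy, which is the index of the best threshold in thresholds.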

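def calculate_roc(thresh, dist, actual_issame):
    # A pair is predicted "same" when its distance is below the threshold
    pre_issame = np.less(dist, thresh)
    tp = np.sum(np.logical_and(pre_issame, actual_issame)) # positive samples predicted as positive
    fp = np.sum(np.logical_and(pre_issame, np.logical_not(actual_issame))) # negative samples predicted as positive
    tn = np.sum(np.logical_and(np.logical_not(pre_issame), np.logical_not(actual_issame))) # negative samples predicted as negative
    fn = np.sum(np.logical_and(np.logical_not(pre_issame), actual_issame)) # positive samples predicted as negative
    # Guard each ratio against its own denominator being zero
    tpr = 0 if tp + fn == 0 else float(tp) / float(tp + fn) # recall (true positive rate)
    fpr = 0 if fp + tn == 0 else float(fp) / float(fp + tn) # false positive rate
    accur = (tp + tn) / dist.size # accuracy
    return tpr, fpr, accur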
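As a quick check of how the three metrics react to the threshold, the sketch below evaluates a few hand-picked thresholds on a small set of distances. The helper roc_at_threshold is a stand-alone restatement of the function above under a hypothetical name, written so the snippet runs on its own. The three threshold values are chosen here purely for illustration.

```python
import numpy as np

def roc_at_threshold(thresh, dist, actual_issame):
    """Return (tpr, fpr, accuracy) when pairs with dist < thresh are predicted 'same'."""
    pred = np.less(dist, thresh)
    tp = np.sum(np.logical_and(pred, actual_issame))
    fp = np.sum(np.logical_and(pred, np.logical_not(actual_issame)))
    tn = np.sum(np.logical_and(np.logical_not(pred), np.logical_not(actual_issame)))
    fn = np.sum(np.logical_and(np.logical_not(pred), actual_issame))
    tpr = 0.0 if tp + fn == 0 else float(tp) / float(tp + fn)
    fpr = 0.0 if fp + tn == 0 else float(fp) / float(fp + tn)
    accuracy = float(tp + tn) / dist.size
    return tpr, fpr, accuracy

distance = np.array([0.1, 0.2, 0.3, 0.25, 0.33, 0.20, 0.18, 0.24])
actual_issame = np.array([True, True, False, False, False, True, True, False])

too_low = roc_at_threshold(0.19, distance, actual_issame)     # misses two true pairs
separating = roc_at_threshold(0.22, distance, actual_issame)  # splits the classes cleanly
too_high = roc_at_threshold(0.26, distance, actual_issame)    # accepts two false pairs

print(too_low)     # (0.5, 0.0, 0.75)
print(separating)  # (1.0, 0.0, 1.0)
print(too_high)    # (1.0, 0.5, 0.75)
```

A threshold that is too low hurts recall, while one that is too high raises the false positive rate. Accuracy penalizes both, which is why it is used as the selection criterion here.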

# Full example: pick the best threshold on the training part of each fold (uses calculate_roc defined above)
import numpy as np
from sklearn.model_selection import KFold

distance = np.array([0.1, 0.2, 0.3, 0.25, 0.33, 0.20, 0.18, 0.24])
actual_issame = np.array([True, True, False, False, False, True, True, False])

k_fold = KFold(n_splits=4, shuffle=False)
indices = np.arange(len(distance))
thresholds = np.arange(0, 1, 0.04) # candidate thresholds

for k_num, (train_set, test_set) in enumerate(k_fold.split(indices)):
    accuracy = np.zeros(len(thresholds))
    # Accuracy of every candidate threshold on this fold's training pairs
    for threshold_index, threshold in enumerate(thresholds):
        _, _, accuracy[threshold_index] = calculate_roc(threshold, distance[train_set], actual_issame[train_set])
    # Index of the threshold with the highest training accuracy
    max_threshold = np.argmax(accuracy)
    print(thresholds[max_threshold])
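The loop above only selects a threshold on the training part of each fold and never uses test_set. A natural next step, sketched below as an assumption rather than as part of the original post, is to score the selected threshold on the held-out fold and average the result. The helper accuracy_at_threshold and the variable names are hypothetical, introduced so the snippet is self-contained.

```python
import numpy as np
from sklearn.model_selection import KFold

def accuracy_at_threshold(thresh, dist, actual_issame):
    """Fraction of pairs classified correctly when dist < thresh means 'same'."""
    pred = np.less(dist, thresh)
    return float(np.sum(pred == actual_issame)) / dist.size

distance = np.array([0.1, 0.2, 0.3, 0.25, 0.33, 0.20, 0.18, 0.24])
actual_issame = np.array([True, True, False, False, False, True, True, False])

thresholds = np.arange(0, 1, 0.04)
k_fold = KFold(n_splits=4, shuffle=False)
indices = np.arange(len(distance))

best_thresholds = []
test_accuracy = []
for train_set, test_set in k_fold.split(indices):
    # Select the threshold using the training pairs only.
    train_acc = np.array([
        accuracy_at_threshold(t, distance[train_set], actual_issame[train_set])
        for t in thresholds
    ])
    best = thresholds[np.argmax(train_acc)]
    best_thresholds.append(best)
    # Score that threshold on the pairs that were held out.
    test_accuracy.append(
        accuracy_at_threshold(best, distance[test_set], actual_issame[test_set])
    )

print(best_thresholds)
print(np.mean(test_accuracy))
```

Because np.argmax returns the first maximum, ties between thresholds are resolved in favor of the smallest one. With a dataset this small the per-fold numbers are only illustrative.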
