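Kappa: an evaluation metric for agreement testing
I have recently been working on unsupervised classification of fundus images, using the Diabetic Retinopathy (DR) dataset from Kaggle.
The final evaluation metric is quadratic weighted kappa. I had not come across it before and could not find a detailed introduction online, so here is a brief account of my understanding. Corrections are welcome.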
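Introduction
The kappa coefficient measures how consistently two models judge the same images. It ranges from -1 to 1: 1 means the two sets of ratings are identical, 0 means the agreement is no better than what chance alone would produce, and negative values mean the two agree less often than chance would predict.
It is computed by comparing the number of identical ratings actually observed with the number expected by chance. When the two numbers are about the same, kappa is close to 0; the further the observed agreement exceeds the chance expectation, the closer kappa gets to 1.
For example, the kappa between model A and the baseline: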

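kappa = (p0 - pe) / (n - pe)
where p0 = the sum of the observed counts in the diagonal cells; pe = the sum of the expected counts in the diagonal cells; n = the total number of rated samples.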
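As a rough illustration of the formula above, the sketch below computes simple kappa from a table of counts. The function name `simple_kappa_from_counts` and the 2x2 tables are made up for this example (they are not from the Kaggle data), and the expected diagonal count for each class is assumed to be row total times column total divided by n.

```python
def simple_kappa_from_counts(conf_mat):
    """Simple kappa = (p0 - pe) / (n - pe) for a square table of counts.

    conf_mat[i][j] is the number of samples that model A put in class i
    and the baseline put in class j (hypothetical layout for this sketch).
    """
    k = len(conf_mat)
    n = sum(sum(row) for row in conf_mat)
    row_totals = [sum(row) for row in conf_mat]
    col_totals = [sum(conf_mat[i][j] for i in range(k)) for j in range(k)]
    # p0: observed counts on the diagonal (both raters chose the same class)
    p0 = sum(conf_mat[i][i] for i in range(k))
    # pe: diagonal counts expected if the two raters were independent
    pe = sum(row_totals[i] * col_totals[i] / n for i in range(k))
    return (p0 - pe) / (n - pe)


print(simple_kappa_from_counts([[20, 5], [10, 15]]))   # 0.4  (partial agreement)
print(simple_kappa_from_counts([[25, 0], [0, 25]]))    # 1.0  (identical ratings)
print(simple_kappa_from_counts([[15, 10], [15, 10]]))  # 0.0  (chance-level agreement)
print(simple_kappa_from_counts([[0, 25], [25, 0]]))    # -1.0 (systematic disagreement)
```

In the first table n = 50, p0 = 35 and pe = 15 + 10 = 25, so kappa = (35 - 25) / (50 - 25) = 0.4.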
Depending on how it is computed, kappa comes in two kinds: simple kappa and weighted kappa. Weighted kappa is further divided into linear weighted kappa and quadratic weighted kappa.
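Weighted kappa
Whether to choose linear or quadratic weighted kappa depends on what the differences between classes mean in your dataset. For fundus image data, for example, class=0 is healthy and class=4 is very severe late-stage disease, so predicting class=0 as class=4 should be penalized far more heavily than predicting class=0 as class=1. With quadratic weights, the 0->4 penalty is 16 times the 0->1 penalty, whereas with linear weights it is only 4 times. The figure below compares the two weighting schemes on a four-class problem.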
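To make the difference between the two weighting schemes concrete, here is a small sketch that builds both penalty matrices for five classes (0 to 4, as in the DR example). The helper name `disagreement_weights` is hypothetical; the formulas are the usual |i - j| / (k - 1) and (i - j)^2 / (k - 1)^2, which are also the ones used in the implementation further down.

```python
def disagreement_weights(num_classes, scheme):
    """Penalty for rating a sample as class i when the other rater says j,
    scaled so that the largest possible disagreement costs 1."""
    span = num_classes - 1
    if scheme == "linear":
        return [[abs(i - j) / span for j in range(num_classes)]
                for i in range(num_classes)]
    if scheme == "quadratic":
        return [[(i - j) ** 2 / span ** 2 for j in range(num_classes)]
                for i in range(num_classes)]
    raise ValueError("scheme must be 'linear' or 'quadratic'")


linear = disagreement_weights(5, "linear")
quadratic = disagreement_weights(5, "quadratic")

# Ratio between the penalty for 0->4 and the penalty for 0->1
print(linear[0][4] / linear[0][1])        # 4.0
print(quadratic[0][4] / quadratic[0][1])  # 16.0
```

Both schemes give zero penalty on the diagonal; they differ only in how fast the penalty grows with the distance between classes.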

Python implementation
Reference: https://github.com/benhamner/Metrics/blob/master/Python/ml_metrics/quadratic_weighted_kappa.py
#!/usr/bin/env python3
import numpy as np

def confusion_matrix(rater_a, rater_b, min_rating=None, max_rating=None):
    """
    Returns the confusion matrix between rater's ratings
    """
    assert(len(rater_a) == len(rater_b))
    if min_rating is None:
        # Take min/max per rater so that lists and NumPy arrays both work
        min_rating = min(min(rater_a), min(rater_b))
    if max_rating is None:
        max_rating = max(max(rater_a), max(rater_b))
    num_ratings = int(max_rating - min_rating + 1)
    conf_mat = [[0 for i in range(num_ratings)]
                for j in range(num_ratings)]
    # conf_mat[i][j] counts items rated i by rater_a and j by rater_b
    for a, b in zip(rater_a, rater_b):
        conf_mat[a - min_rating][b - min_rating] += 1
    return conf_mat

def histogram(ratings, min_rating=None, max_rating=None):
    """
    Returns the counts of each type of rating that a rater made
    """
    if min_rating is None:
        min_rating = min(ratings)
    if max_rating is None:
        max_rating = max(ratings)
    num_ratings = int(max_rating - min_rating + 1)
    hist_ratings = [0 for x in range(num_ratings)]
    for r in ratings:
        hist_ratings[r - min_rating] += 1
    return hist_ratings

def quadratic_weighted_kappa(rater_a, rater_b, min_rating=None, max_rating=None):
    """
    Calculates the quadratic weighted kappa
    quadratic_weighted_kappa calculates the quadratic weighted kappa
    value, which is a measure of inter-rater agreement between two raters
    that provide discrete numeric ratings. Potential values range from -1
    (representing complete disagreement) to 1 (representing complete
    agreement). A kappa value of 0 is expected if all agreement is due to
    chance.
    quadratic_weighted_kappa(rater_a, rater_b), where rater_a and rater_b
    each correspond to a list of integer ratings. These lists must have the
    same length.
    The ratings should be integers, and it is assumed that they contain
    the complete range of possible ratings.
    quadratic_weighted_kappa(X, min_rating, max_rating), where min_rating
    is the minimum possible rating, and max_rating is the maximum possible
    rating
    """
    rater_a = np.array(rater_a, dtype=int)
    rater_b = np.array(rater_b, dtype=int)
    assert(len(rater_a) == len(rater_b))
    if min_rating is None:
        min_rating = min(min(rater_a), min(rater_b))
    if max_rating is None:
        max_rating = max(max(rater_a), max(rater_b))
    conf_mat = confusion_matrix(rater_a, rater_b,
                                min_rating, max_rating)
    num_ratings = len(conf_mat)
    num_scored_items = float(len(rater_a))
    hist_rater_a = histogram(rater_a, min_rating, max_rating)
    hist_rater_b = histogram(rater_b, min_rating, max_rating)
    numerator = 0.0
    denominator = 0.0
    for i in range(num_ratings):
        for j in range(num_ratings):
            # Count expected in cell (i, j) if the raters were independent
            expected_count = (hist_rater_a[i] * hist_rater_b[j]
                              / num_scored_items)
            # Quadratic penalty, scaled to [0, 1]
            d = pow(i - j, 2.0) / pow(num_ratings - 1, 2.0)
            numerator += d * conf_mat[i][j] / num_scored_items
            denominator += d * expected_count / num_scored_items
    return 1.0 - numerator / denominator

def linear_weighted_kappa(rater_a, rater_b, min_rating=None, max_rating=None):
    """
    Calculates the linear weighted kappa
    linear_weighted_kappa calculates the linear weighted kappa
    value, which is a measure of inter-rater agreement between two raters
    that provide discrete numeric ratings. Potential values range from -1
    (representing complete disagreement) to 1 (representing complete
    agreement). A kappa value of 0 is expected if all agreement is due to
    chance.
    linear_weighted_kappa(rater_a, rater_b), where rater_a and rater_b
    each correspond to a list of integer ratings. These lists must have the
    same length.
    The ratings should be integers, and it is assumed that they contain
    the complete range of possible ratings.
    linear_weighted_kappa(X, min_rating, max_rating), where min_rating
    is the minimum possible rating, and max_rating is the maximum possible
    rating
    """
    assert(len(rater_a) == len(rater_b))
    if min_rating is None:
        # Take min/max per rater so that lists and NumPy arrays both work
        min_rating = min(min(rater_a), min(rater_b))
    if max_rating is None:
        max_rating = max(max(rater_a), max(rater_b))
    conf_mat = confusion_matrix(rater_a, rater_b,
                                min_rating, max_rating)
    num_ratings = len(conf_mat)
    num_scored_items = float(len(rater_a))
    hist_rater_a = histogram(rater_a, min_rating, max_rating)
    hist_rater_b = histogram(rater_b, min_rating, max_rating)
    numerator = 0.0
    denominator = 0.0
    for i in range(num_ratings):
        for j in range(num_ratings):
            expected_count = (hist_rater_a[i] * hist_rater_b[j]
                              / num_scored_items)
            # Linear penalty, scaled to [0, 1]
            d = abs(i - j) / float(num_ratings - 1)
            numerator += d * conf_mat[i][j] / num_scored_items
            denominator += d * expected_count / num_scored_items
    return 1.0 - numerator / denominator

def kappa(rater_a, rater_b, min_rating=None, max_rating=None):
    """
    Calculates the kappa
    kappa calculates the kappa
    value, which is a measure of inter-rater agreement between two raters
    that provide discrete numeric ratings. Potential values range from -1
    (representing complete disagreement) to 1 (representing complete
    agreement). A kappa value of 0 is expected if all agreement is due to
    chance.
    kappa(rater_a, rater_b), where rater_a and rater_b
    each correspond to a list of integer ratings. These lists must have the
    same length.
    The ratings should be integers, and it is assumed that they contain
    the complete range of possible ratings.
    kappa(X, min_rating, max_rating), where min_rating
    is the minimum possible rating, and max_rating is the maximum possible
    rating
    """
    assert(len(rater_a) == len(rater_b))
    if min_rating is None:
        # Take min/max per rater so that lists and NumPy arrays both work
        min_rating = min(min(rater_a), min(rater_b))
    if max_rating is None:
        max_rating = max(max(rater_a), max(rater_b))
    conf_mat = confusion_matrix(rater_a, rater_b,
                                min_rating, max_rating)
    num_ratings = len(conf_mat)
    num_scored_items = float(len(rater_a))
    hist_rater_a = histogram(rater_a, min_rating, max_rating)
    hist_rater_b = histogram(rater_b, min_rating, max_rating)
    numerator = 0.0
    denominator = 0.0
    for i in range(num_ratings):
        for j in range(num_ratings):
            expected_count = (hist_rater_a[i] * hist_rater_b[j]
                              / num_scored_items)
            # Simple kappa: every disagreement costs the same
            if i == j:
                d = 0.0
            else:
                d = 1.0
            numerator += d * conf_mat[i][j] / num_scored_items
            denominator += d * expected_count / num_scored_items
    return 1.0 - numerator / denominator

def mean_quadratic_weighted_kappa(kappas, weights=None):
    """
    Calculates the mean of the quadratic
    weighted kappas after applying Fisher's r-to-z transform, which is
    approximately a variance-stabilizing transformation. This
    transformation is undefined if one of the kappas is 1.0, so all kappa
    values are capped in the range (-0.999, 0.999). The reverse
    transformation is then applied before returning the result.
    mean_quadratic_weighted_kappa(kappas), where kappas is a vector of
    kappa values
    mean_quadratic_weighted_kappa(kappas, weights), where weights is a vector
    of weights that is the same size as kappas. Weights are applied in the
    z-space
    """
    kappas = np.array(kappas, dtype=float)
    if weights is None:
        weights = np.ones(np.shape(kappas))
    else:
        # Accept plain lists as well as arrays
        weights = np.array(weights, dtype=float)
        weights = weights / np.mean(weights)
    # ensure that kappas are in the range [-.999, .999]
    kappas = np.array([min(x, .999) for x in kappas])
    kappas = np.array([max(x, -.999) for x in kappas])
    # Fisher transform, weighted mean in z-space, then inverse transform
    z = 0.5 * np.log((1 + kappas) / (1 - kappas)) * weights
    z = np.mean(z)
    return (np.exp(2 * z) - 1) / (np.exp(2 * z) + 1)

def weighted_mean_quadratic_weighted_kappa(solution, submission):
    # solution and submission are pandas DataFrames; the last column of
    # submission holds the predicted scores
    predicted_score = submission[submission.columns[-1]].copy()
    predicted_score.name = "predicted_score"
    if predicted_score.index[0] == 0:
        predicted_score = predicted_score[:len(solution)]
        predicted_score.index = solution.index
    combined = solution.join(predicted_score, how="left")
    groups = combined.groupby(by="essay_set")
    # Each group is a (key, DataFrame) pair
    kappas = [quadratic_weighted_kappa(group[1]["essay_score"], group[1]["predicted_score"]) for group in groups]
    # .iloc[0] replaces the removed pandas method .irow(0)
    weights = [group[1]["essay_weight"].iloc[0] for group in groups]
    return mean_quadratic_weighted_kappa(kappas, weights=weights)
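
The averaging step in mean_quadratic_weighted_kappa may be easier to follow in isolation. The sketch below is an unweighted, standard-library-only version, relying on the identity 0.5 * ln((1 + k) / (1 - k)) = atanh(k), whose inverse is tanh. The name `fisher_mean` and the sample kappa values are invented for this illustration.

```python
import math


def fisher_mean(kappas, cap=0.999):
    """Average kappa values in Fisher z-space (unweighted sketch)."""
    # atanh is undefined at +/-1, so cap the values first
    clipped = [max(-cap, min(cap, k)) for k in kappas]
    # Transform each kappa to z-space and take the plain mean there
    z_mean = sum(math.atanh(k) for k in clipped) / len(clipped)
    # Map the mean back to the kappa scale
    return math.tanh(z_mean)


sample = [0.0, 0.9]  # hypothetical per-group kappas
print(sum(sample) / len(sample))  # arithmetic mean: 0.45
print(fisher_mean(sample))        # larger than 0.45, because z-space stretches values near 1
```

Averaging in z-space gives more influence to kappas close to 1 or -1 than a plain arithmetic mean does, which is why the two results differ.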