Micro Average vs Macro Average Performance in a Multiclass Classification Setting
Micro- and macro-averages (for whatever metric) compute slightly different things, so their interpretation differs. A macro-average computes the metric independently for each class and then takes the average (hence treating all classes equally), whereas a micro-average aggregates the contributions of all classes to compute the average metric. In a multi-class classification setup, the micro-average is preferable if you suspect there might be class imbalance (i.e., you may have many more examples of one class than of the others).
To illustrate why, take precision as an example: Pr = TP / (TP + FP). Imagine you have a One-vs-All (there is only one correct class output per example) multi-class classification system with four classes and the following numbers when tested:
- Class A: 1 TP and 1 FP
- Class B: 10 TP and 90 FP
- Class C: 1 TP and 1 FP
- Class D: 1 TP and 1 FP
You can easily see that Pr_A = Pr_C = Pr_D = 0.5, whereas Pr_B = 0.1.
- A macro-average will then compute: Pr = (0.5 + 0.1 + 0.5 + 0.5) / 4 = 0.4
- A micro-average will compute: Pr = (1 + 10 + 1 + 1) / (2 + 100 + 2 + 2) = 13 / 106 ≈ 0.123
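Macro precision: whether as many classes as possible each reach as high a precision as possible. It emphasizes whether each individual class is predicted accurately.
Micro precision: across all of these classes, the share of correct predictions among all predictions made. It emphasizes the overall proportion of correctly predicted data.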
These are quite different values for precision. Intuitively, in the macro-average the "good" precision (0.5) of classes A, C and D helps maintain a "decent" overall precision (0.4). While this is technically true (across classes, the average precision is 0.4), it is a bit misleading, since a large number of examples are not properly classified. These examples predominantly correspond to class B, so they only contribute 1/4 towards the average in spite of constituting 94.3% of your test data. The micro-average adequately captures this class imbalance and brings the overall precision down to 0.123, more in line with the 0.1 precision of the dominating class B.
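The two averages above can be reproduced with a few lines of Python. This is only a minimal sketch using the TP/FP counts from the example; the `counts` dictionary and the helper names `precision`, `macro_precision` and `micro_precision` are illustrative assumptions, not part of any library.

```python
# Per-class (TP, FP) counts taken from the four-class example above.
counts = {"A": (1, 1), "B": (10, 90), "C": (1, 1), "D": (1, 1)}


def precision(tp, fp):
    """Precision of a single class: TP / (TP + FP)."""
    return tp / (tp + fp)


def macro_precision(counts):
    """Average the per-class precisions, giving every class equal weight."""
    per_class = [precision(tp, fp) for tp, fp in counts.values()]
    return sum(per_class) / len(per_class)


def micro_precision(counts):
    """Pool TP and FP over all classes first, then compute one precision."""
    total_tp = sum(tp for tp, _ in counts.values())
    total_fp = sum(fp for _, fp in counts.values())
    return total_tp / (total_tp + total_fp)


print(round(macro_precision(counts), 3))  # 0.4
print(round(micro_precision(counts), 3))  # 0.123
```

The only difference between the two functions is where the division happens: per class before averaging (macro), or once on the pooled counts (micro).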
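When class imbalance is known but you still want to use the macro-average, the measures to take are:
1. Report the macro-average together with its standard deviation (for classification tasks with 3 or more classes)
2. Use a weighted macro-average (which accounts for the number of samples per class)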
For computational reasons, it may sometimes be more convenient to compute class averages and then macro-average them. If class imbalance is known to be an issue, there are several ways around it. One is to report not only the macro-average, but also its standard deviation (for 3 or more classes). Another is to compute a weighted macro-average, in which each class's contribution to the average is weighted by the relative number of examples available for it. In the above scenario, we obtain:
1. Pr_macro-mean = 0.25·0.5 + 0.25·0.1 + 0.25·0.5 + 0.25·0.5 = 0.4
Pr_macro-stdev = 0.173
2. Pr_macro-weighted = 2/106·0.5 + 100/106·0.1 + 2/106·0.5 + 2/106·0.5
≈ 0.0189·0.5 + 0.943·0.1 + 0.0189·0.5 + 0.0189·0.5 ≈ 0.00943 + 0.0943 + 0.00943 + 0.00943 ≈ 0.123
The large standard deviation (0.173) already tells us that the 0.4 average does not stem from a uniform precision among classes, but it might be just easier to compute the weighted macro-average, which in essence is another way of computing the micro-average.
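The figures above can be checked with the short sketch below. It assumes the same TP/FP counts as the example, weights each class by its share of predictions (TP + FP), and uses the population standard deviation, which is the variant that reproduces the 0.173 quoted above. The variable names are illustrative only.

```python
from statistics import pstdev

# Per-class (TP, FP) counts taken from the four-class example above.
counts = {"A": (1, 1), "B": (10, 90), "C": (1, 1), "D": (1, 1)}

# Per-class precision and number of predictions (TP + FP) for each class.
precisions = {c: tp / (tp + fp) for c, (tp, fp) in counts.items()}
predicted = {c: tp + fp for c, (tp, fp) in counts.items()}
total = sum(predicted.values())

# Plain macro-average and its (population) standard deviation.
macro_mean = sum(precisions.values()) / len(precisions)
macro_std = pstdev(list(precisions.values()))

# Weighted macro-average: each class weighted by its share of the predictions.
macro_weighted = sum(predicted[c] / total * precisions[c] for c in counts)

print(round(macro_mean, 3))      # 0.4
print(round(macro_std, 3))       # 0.173
print(round(macro_weighted, 3))  # 0.123
```

With these weights, each term is (TP + FP)/total · TP/(TP + FP) = TP/total, so the weighted macro-average collapses to the pooled TP over the pooled TP + FP, which is exactly the micro-average.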