I noticed that that 'r2_score' and 'explained_variance_score' are both build-in sklearn.metrics methods for regression problems.

I was always under the impression that r2_score is the percent variance explained by the model. How is it different from 'explained_variance_score'?

When would you choose one over the other?

Thanks!

OK, look at this example:

In [123]:
#data
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
print metrics.explained_variance_score(y_true, y_pred)
print metrics.r2_score(y_true, y_pred)
0.957173447537
0.948608137045
In [124]:
#what explained_variance_score really is
1-np.cov(np.array(y_true)-np.array(y_pred))/np.cov(y_true)
Out[124]:
0.95717344753747324
In [125]:
#what r^2 really is
1-((np.array(y_true)-np.array(y_pred))**2).sum()/(4*np.array(y_true).std()**2)
Out[125]:
0.94860813704496794
In [126]:
#Notice that the mean residue is not 0
(np.array(y_true)-np.array(y_pred)).mean()
Out[126]:
-0.25
In [127]:
#if the predicted values are different, such that the mean residue IS 0:
y_pred=[2.5, 0.0, 2, 7]
(np.array(y_true)-np.array(y_pred)).mean()
Out[127]:
0.0
In [128]:
#They become the same stuff
print metrics.explained_variance_score(y_true, y_pred)
print metrics.r2_score(y_true, y_pred)
0.982869379015
0.982869379015

So, when the mean residue is 0, they are the same. Which one to choose dependents on your needs, that is, is the mean residue suppose to be 0?

Most of the answers I found (including here) emphasize on the difference between R2 and Explained Variance Score, that is: The Mean Residue (i.e. The Mean of Error).

However, there is an important question left behind, that is: Why on earth I need to consider The Mean of Error?


Refresher:

R2: is the Coefficient of Determination which measures the amount of variation explained by the (least-squares) Linear Regression.

You can look at it from a different angle for the purpose of evaluating the predicted values of y like this:

Varianceactual_y × R2actual_y = Variancepredicted_y

So intuitively, the more R2 is closer to 1, the more actual_y and predicted_y will have samevariance (i.e. same spread)


As previously mentioned, the main difference is the Mean of Error; and if we look at the formulas, we find that's true:

R2 = 1 - [(Sum of Squared Residuals / n) / Variancey_actual]

Explained Variance Score = 1 - [Variance(Ypredicted - Yactual) / Variancey_actual]

in which:

Variance(Ypredicted - Yactual) = (Sum of Squared Residuals - Mean Error) / n 

So, obviously the only difference is that we are subtracting the Mean Error from the first formula! ... But Why?


When we compare the R2 Score with the Explained Variance Score, we are basically checking the Mean Error; so if R2 = Explained Variance Score, that means: The Mean Error = Zero!

The Mean Error reflects the tendency of our estimator, that is: the Biased v.s Unbiased Estimation.


In Summary:

If you want to have unbiased estimator so our model is not underestimating or overestimating, you may consider taking Mean of Error into account.

参考链接:https://stackoverflow.com/questions/24378176/python-sci-kit-learn-metrics-difference-between-r2-score-and-explained-varian

Python scikit-learn (metrics): difference between r2_score and explained_variance_score?的更多相关文章

  1. scikit learn 模块 调参 pipeline+girdsearch 数据举例:文档分类 (python代码)

    scikit learn 模块 调参 pipeline+girdsearch 数据举例:文档分类数据集 fetch_20newsgroups #-*- coding: UTF-8 -*- import ...

  2. Scikit Learn: 在python中机器学习

    转自:http://my.oschina.net/u/175377/blog/84420#OSC_h2_23 Scikit Learn: 在python中机器学习 Warning 警告:有些没能理解的 ...

  3. (原创)(三)机器学习笔记之Scikit Learn的线性回归模型初探

    一.Scikit Learn中使用estimator三部曲 1. 构造estimator 2. 训练模型:fit 3. 利用模型进行预测:predict 二.模型评价 模型训练好后,度量模型拟合效果的 ...

  4. (原创)(四)机器学习笔记之Scikit Learn的Logistic回归初探

    目录 5.3 使用LogisticRegressionCV进行正则化的 Logistic Regression 参数调优 一.Scikit Learn中有关logistics回归函数的介绍 1. 交叉 ...

  5. Scikit Learn

    Scikit Learn Scikit-Learn简称sklearn,基于 Python 语言的,简单高效的数据挖掘和数据分析工具,建立在 NumPy,SciPy 和 matplotlib 上.

  6. 笨办法学 Python (Learn Python The Hard Way)

    最近在看:笨办法学 Python (Learn Python The Hard Way) Contents: 译者前言 前言:笨办法更简单 习题 0: 准备工作 习题 1: 第一个程序 习题 2: 注 ...

  7. 学 Python (Learn Python The Hard Way)

    学 Python (Learn Python The Hard Way) Contents: 译者前言 前言:笨办法更简单 习题 0: 准备工作 习题 1: 第一个程序 习题 2: 注释和井号 习题 ...

  8. Python第三方库(模块)"scikit learn"以及其他库的安装

    scikit-learn是一个用于机器学习的 Python 模块. 其主页:http://scikit-learn.org/stable/. GitHub地址: https://github.com/ ...

  9. Linear Regression with Scikit Learn

    Before you read  This is a demo or practice about how to use Simple-Linear-Regression in scikit-lear ...

随机推荐

  1. Redis主从同步原理-SYNC【转】

    和MySQL主从复制的原因一样,Redis虽然读取写入的速度都特别快,但是也会产生读压力特别大的情况.为了分担读压力,Redis支持主从复制,Redis的主从结构可以采用一主多从或者级联结构,下图为级 ...

  2. 关于JAVAweb的一些东西

    1.Servlet 1.Servlet访问URL映射配置 <servlet> <servlet-name>ServletDemo1</servlet-name> & ...

  3. 程序员进阶之算法练习:LeetCode专场

    欢迎大家前往腾讯云+社区,获取更多腾讯海量技术实践干货哦~ 本文由落影发表 前言 LeetCode上的题目是大公司面试常见的算法题,今天的目标是拿下5道算法题: 题目1是基于链表的大数加法,既考察基本 ...

  4. 如何做自己的服务监控?spring boot 1.x服务监控揭秘

    1.准备 下载可运行程序:http://www.mkyong.com/spring-boot/spring-boot-hello-world-example-jsp/ 2.添加服务监控依赖 <d ...

  5. Tomcat8源码笔记(三)Catalina加载过程

    之前介绍过 Catalina加载过程是Bootstrap的load调用的  Tomcat8源码笔记(二)Bootstrap启动 按照Catalina的load过程,大致如下: 接下来一步步分析加载过程 ...

  6. 初步掌握node的路由控制

    1.1.2:node.js的路由控制 1.运行原理 在1.1.1节中,提到过app.js中app.get("/",routes.index)可以用以下代码取代: app.get(& ...

  7. springboot使用遇到问题:Class “model.Address” is listed in the persistence.xml file but not mapped

    报错如下: 解决如下: 这是一个Eclipse的怪癖.我最近在创建一个禁用了JPA库配置的新JPA项目时遇到了这个问题,但是在我通过Eclipse New JPA Entity向导创建实体之前没有手动 ...

  8. C#语句 分支语句 if --- else ---

    语句是指程序命令,都是按照顺序执行的.语句在程序中的执行顺序称为“控制流”或“执行流”. 根据程序对运行时所收到的输入的响应,在程序每次运行时控制流可能有所不同. 注意,语句间的标点符号必须是英文标点 ...

  9. UDP服务器/客户端代码示例

    UDP服务器代码: #include <errno.h> #include <string.h> #include <stdlib.h> #include < ...

  10. That Nice Euler Circuit(LA3263+几何)

    That Nice Euler Circuit Time Limit:3000MS     Memory Limit:0KB     64bit IO Format:%lld & %llu D ...