I noticed that that 'r2_score' and 'explained_variance_score' are both build-in sklearn.metrics methods for regression problems.

I was always under the impression that r2_score is the percent variance explained by the model. How is it different from 'explained_variance_score'?

When would you choose one over the other?

Thanks!

OK, look at this example:

In [123]:
#data
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
print metrics.explained_variance_score(y_true, y_pred)
print metrics.r2_score(y_true, y_pred)
0.957173447537
0.948608137045
In [124]:
#what explained_variance_score really is
1-np.cov(np.array(y_true)-np.array(y_pred))/np.cov(y_true)
Out[124]:
0.95717344753747324
In [125]:
#what r^2 really is
1-((np.array(y_true)-np.array(y_pred))**2).sum()/(4*np.array(y_true).std()**2)
Out[125]:
0.94860813704496794
In [126]:
#Notice that the mean residue is not 0
(np.array(y_true)-np.array(y_pred)).mean()
Out[126]:
-0.25
In [127]:
#if the predicted values are different, such that the mean residue IS 0:
y_pred=[2.5, 0.0, 2, 7]
(np.array(y_true)-np.array(y_pred)).mean()
Out[127]:
0.0
In [128]:
#They become the same stuff
print metrics.explained_variance_score(y_true, y_pred)
print metrics.r2_score(y_true, y_pred)
0.982869379015
0.982869379015

So, when the mean residue is 0, they are the same. Which one to choose dependents on your needs, that is, is the mean residue suppose to be 0?

Most of the answers I found (including here) emphasize on the difference between R2 and Explained Variance Score, that is: The Mean Residue (i.e. The Mean of Error).

However, there is an important question left behind, that is: Why on earth I need to consider The Mean of Error?


Refresher:

R2: is the Coefficient of Determination which measures the amount of variation explained by the (least-squares) Linear Regression.

You can look at it from a different angle for the purpose of evaluating the predicted values of y like this:

Varianceactual_y × R2actual_y = Variancepredicted_y

So intuitively, the more R2 is closer to 1, the more actual_y and predicted_y will have samevariance (i.e. same spread)


As previously mentioned, the main difference is the Mean of Error; and if we look at the formulas, we find that's true:

R2 = 1 - [(Sum of Squared Residuals / n) / Variancey_actual]

Explained Variance Score = 1 - [Variance(Ypredicted - Yactual) / Variancey_actual]

in which:

Variance(Ypredicted - Yactual) = (Sum of Squared Residuals - Mean Error) / n 

So, obviously the only difference is that we are subtracting the Mean Error from the first formula! ... But Why?


When we compare the R2 Score with the Explained Variance Score, we are basically checking the Mean Error; so if R2 = Explained Variance Score, that means: The Mean Error = Zero!

The Mean Error reflects the tendency of our estimator, that is: the Biased v.s Unbiased Estimation.


In Summary:

If you want to have unbiased estimator so our model is not underestimating or overestimating, you may consider taking Mean of Error into account.

参考链接:https://stackoverflow.com/questions/24378176/python-sci-kit-learn-metrics-difference-between-r2-score-and-explained-varian

Python scikit-learn (metrics): difference between r2_score and explained_variance_score?的更多相关文章

  1. scikit learn 模块 调参 pipeline+girdsearch 数据举例:文档分类 (python代码)

    scikit learn 模块 调参 pipeline+girdsearch 数据举例:文档分类数据集 fetch_20newsgroups #-*- coding: UTF-8 -*- import ...

  2. Scikit Learn: 在python中机器学习

    转自:http://my.oschina.net/u/175377/blog/84420#OSC_h2_23 Scikit Learn: 在python中机器学习 Warning 警告:有些没能理解的 ...

  3. (原创)(三)机器学习笔记之Scikit Learn的线性回归模型初探

    一.Scikit Learn中使用estimator三部曲 1. 构造estimator 2. 训练模型:fit 3. 利用模型进行预测:predict 二.模型评价 模型训练好后,度量模型拟合效果的 ...

  4. (原创)(四)机器学习笔记之Scikit Learn的Logistic回归初探

    目录 5.3 使用LogisticRegressionCV进行正则化的 Logistic Regression 参数调优 一.Scikit Learn中有关logistics回归函数的介绍 1. 交叉 ...

  5. Scikit Learn

    Scikit Learn Scikit-Learn简称sklearn,基于 Python 语言的,简单高效的数据挖掘和数据分析工具,建立在 NumPy,SciPy 和 matplotlib 上.

  6. 笨办法学 Python (Learn Python The Hard Way)

    最近在看:笨办法学 Python (Learn Python The Hard Way) Contents: 译者前言 前言:笨办法更简单 习题 0: 准备工作 习题 1: 第一个程序 习题 2: 注 ...

  7. 学 Python (Learn Python The Hard Way)

    学 Python (Learn Python The Hard Way) Contents: 译者前言 前言:笨办法更简单 习题 0: 准备工作 习题 1: 第一个程序 习题 2: 注释和井号 习题 ...

  8. Python第三方库(模块)"scikit learn"以及其他库的安装

    scikit-learn是一个用于机器学习的 Python 模块. 其主页:http://scikit-learn.org/stable/. GitHub地址: https://github.com/ ...

  9. Linear Regression with Scikit Learn

    Before you read  This is a demo or practice about how to use Simple-Linear-Regression in scikit-lear ...

随机推荐

  1. Rpc简单入门

    RPC这个概念大家都应该很熟悉了,这里不在累述了:使用场景可以参考这篇,本篇主要分享下Thrift和Grpc在.Net Core环境下使用入门.Thirft或者Grps 都支持跨语言.跨平台的Rpc框 ...

  2. CentOS 6.5静态IP的设置(NAT和桥接联网方式都适用)

    不多说,直接上干货! 为了方便,用Xshell来.并将IP设置为静态的.因为,在CentOS里,若不对其IP进行静态设置的话,则每次开机,其IP都是动态变化的,这样会给后续工作带来麻烦.为此,我们需将 ...

  3. js精度溢出解决方案

    一般参数值不能超过16位.如果超出16都是用0替代,导致我们查询不到自己想要的结果. 遇到此问题我们做如下修改 自己写属性 原始的: <a href="javascript:void( ...

  4. Enumerable转换为DataTable

    今天在项目组公共类库中发现一个 Enumerable类型转换为DataTable,写的挺精简的,拿出来跟大家共享一下. using System; using System.Collections.G ...

  5. ES启动报错最大进程数太少

    [--16T18::,][INFO ][o.e.b.BootstrapChecks ] [node-] bound or publishing to a non-loopback address, e ...

  6. IdentityServer4 中文文档 -2- (简介)相关术语

    IdentityServer4 中文文档 -2- (简介)相关术语 原文:http://docs.identityserver.io/en/release/intro/terminology.html ...

  7. windows上使用tensorboard

    因为我的环境变量设置的不是python3.5,所以走了一些弯路. 启动tensorboard后,graphs里总是什么都没有 最后再stackoverflow里找到答案 https://stackov ...

  8. Git Extensions system.invalidoperationexception尚未提供文件名,因此无法启动进程

    根据别人的博客按照步骤安装,地址如下:http://www.cnblogs.com/sorex/archive/2011/08/10/2132359.html 但是安装Git Extensions后生 ...

  9. Java的XML解析

    XML:(eXtensible Markup Language) 可扩展标记语言 是一种数据格式,用于存储和传输数据 声明一个xml文件 <?xml version="1.0" ...

  10. Eclipse配置MyBatis的xml自动提示【转】

    如果使用eclipse中,再写mybatis的xml文件的时候,没有提示,用“Alt+/”,不能把代码用快捷键敲出来: 可以试试以下几种方法: 第一种方法: 1.1:打开配置文件,按住Ctrl键,并且 ...