Python scikit-learn (metrics): difference between r2_score and explained_variance_score?
I noticed that 'r2_score' and 'explained_variance_score' are both built-in sklearn.metrics methods for regression problems.
I was always under the impression that r2_score is the percent variance explained by the model. How is it different from 'explained_variance_score'?
When would you choose one over the other?
Thanks!
OK, look at this example:
In [123]:
# data
import numpy as np
from sklearn import metrics

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
print(metrics.explained_variance_score(y_true, y_pred))
print(metrics.r2_score(y_true, y_pred))
0.957173447537
0.948608137045
In [124]:
# what explained_variance_score really is: 1 - Var(y_true - y_pred) / Var(y_true)
1 - np.cov(np.array(y_true) - np.array(y_pred)) / np.cov(y_true)
Out[124]:
0.95717344753747324
In [125]:
# what r^2 really is: 1 - sum of squared residuals / (n * Var(y_true)), with n = 4 here
1 - ((np.array(y_true) - np.array(y_pred))**2).sum() / (4 * np.array(y_true).std()**2)
Out[125]:
0.94860813704496794
In [126]:
# notice that the mean residual is not 0
(np.array(y_true) - np.array(y_pred)).mean()
Out[126]:
-0.25
In [127]:
# if the predicted values are different, such that the mean residual IS 0:
y_pred = [2.5, 0.0, 2, 7]
(np.array(y_true) - np.array(y_pred)).mean()
Out[127]:
0.0
In [128]:
# they become the same thing
print(metrics.explained_variance_score(y_true, y_pred))
print(metrics.r2_score(y_true, y_pred))
0.982869379015
0.982869379015
So, when the mean residual is 0, they are the same. Which one to choose depends on your needs, that is: is the mean residual supposed to be 0?
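To make the relationship explicit, here is a minimal sketch reusing the toy data above (population variance, i.e. ddof=0, is assumed throughout): the gap between the two scores is exactly the squared mean residual divided by the variance of y_true, which is why they coincide when the mean residual is 0.

import numpy as np
from sklearn import metrics

y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

residuals = y_true - y_pred
mean_residual = residuals.mean()   # -0.25 for this data
var_true = y_true.var()            # population variance (ddof=0)

evs = metrics.explained_variance_score(y_true, y_pred)
r2 = metrics.r2_score(y_true, y_pred)

# the two metrics differ by exactly (mean residual)^2 / Var(y_true)
print(np.isclose(evs - r2, mean_residual**2 / var_true))   # True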
Most of the answers I found (including here) emphasize the difference between R2 and the Explained Variance Score, which comes down to the mean residual (i.e. the mean of the error).
However, there is an important question left unanswered: why on earth would I need to consider the mean of the error?
Refresher:
R2 is the Coefficient of Determination, which measures the amount of variation explained by the (least-squares) linear regression.
You can look at it from a different angle for the purpose of evaluating the predicted values of y like this:
Variance(y_actual) × R2 = Variance(y_predicted)
So intuitively, the closer R2 is to 1, the closer actual_y and predicted_y are to having the same variance (i.e. the same spread).
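A quick sketch of that identity on my own synthetic data (the data and model fit here are assumptions for illustration; the relation holds for an ordinary least-squares fit with an intercept, evaluated on the training data):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 2.0 * X.ravel() + rng.normal(scale=1.5, size=200)   # noisy linear data

y_fit = LinearRegression().fit(X, y).predict(X)
r2 = r2_score(y, y_fit)

# for an OLS fit: Var(predicted_y) == R^2 * Var(actual_y)
print(np.isclose(np.var(y_fit), r2 * np.var(y)))   # True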
As previously mentioned, the main difference is the Mean Error; and if we look at the formulas, we can see why:
R2 = 1 - [(Sum of Squared Residuals / n) / Variance(y_actual)]
Explained Variance Score = 1 - [Variance(y_predicted - y_actual) / Variance(y_actual)]
in which:
Variance(y_predicted - y_actual) = (Sum of Squared Residuals / n) - (Mean Error)^2
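A minimal numerical check of that identity, reusing the toy data from the first answer (population variance, ddof=0, assumed):

import numpy as np

y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

errors = y_true - y_pred
n = len(errors)

ssr_over_n = (errors**2).sum() / n   # Sum of Squared Residuals / n
mean_error = errors.mean()

# Var(errors) == SSR/n - (Mean Error)^2
print(np.isclose(np.var(errors), ssr_over_n - mean_error**2))   # True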
So the only real difference is that the Explained Variance Score effectively subtracts the squared Mean Error! ... But why?
When we compare the R2 Score with the Explained Variance Score, we are basically checking the Mean Error; if R2 = Explained Variance Score, that means the Mean Error is zero!
The Mean Error reflects the tendency of our estimator, that is: biased vs. unbiased estimation.
In Summary:
If you want an unbiased estimator, so that the model is neither underestimating nor overestimating, you may want to take the Mean Error into account.
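To illustrate why this matters, here is a hypothetical sketch (my own made-up data, not from the answers above): a prediction with a constant offset keeps a near-perfect Explained Variance Score, while R2 exposes the bias.

import numpy as np
from sklearn.metrics import explained_variance_score, r2_score

rng = np.random.RandomState(42)
y_true = rng.normal(loc=10.0, scale=2.0, size=500)
y_biased = y_true + 3.0 + rng.normal(scale=0.1, size=500)   # systematic +3 offset

# the constant offset barely affects the variance of the residuals...
print(explained_variance_score(y_true, y_biased))   # close to 1.0
# ...but R^2 heavily penalizes the constant bias
print(r2_score(y_true, y_biased))   # much lower (it can even go negative when the bias is large)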